Same models.
Real workspace.
The substrate for production LLM applications. Everything the runtime layer needs — call recording, memory, streaming, cost attribution, governance — included. No library patchwork required.
Every LLM app rebuilds
the same infrastructure.
Logging middleware. A context management layer. A token counter. A billing tracker. A streaming handler. A guardrail check. A vector database. A memory store. None of this is specific to your product — but every team building an LLM application builds it from scratch.
By the time the infrastructure is done, the application logic that actually differentiates your product is buried under library integrations, version conflicts, and maintenance surface you didn’t want to own.
All of the above — included in every node.
The runtime, complete
Everything below is live on a new node. No configuration required, no third-party services to connect, nothing to build.
Streaming SSE
Token-by-token server-sent events. Wire your UI directly to the node endpoint — no streaming handler to build.
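Wiring a UI to a token stream comes down to parsing SSE `data:` lines. A minimal sketch of that client-side step — the event payload shape (`{"token": ...}`) and the `[DONE]` terminator are illustrative assumptions, not Interlocute's documented wire format:

```python
import json

def accumulate_tokens(sse_lines):
    """Collect token fragments from an SSE stream into the full response."""
    tokens = []
    for line in sse_lines:
        if not line.startswith("data: "):
            continue  # skip comments, event names, and keep-alive blanks
        payload = line[len("data: "):]
        if payload == "[DONE]":  # assumed end-of-stream sentinel
            break
        tokens.append(json.loads(payload)["token"])
    return "".join(tokens)

# Simulated stream, as an HTTP client would yield it line by line.
stream = [
    'data: {"token": "Hel"}',
    'data: {"token": "lo"}',
    "data: [DONE]",
]
print(accumulate_tokens(stream))  # -> Hello
```

In a browser, `EventSource` does the line parsing for you; the sketch shows what any non-browser client has to do by hand.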
Thread management
Multi-turn conversations with isolated state per thread. Context windowing and history managed automatically.
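"Context windowing managed automatically" means trimming history to what fits the model's window. A sketch of the idea, using word counts as a stand-in for a real tokenizer — the logic is illustrative, not Interlocute's implementation:

```python
def window(history, budget):
    """Keep the most recent turns that fit within a token budget."""
    kept, used = [], 0
    for turn in reversed(history):          # newest turns first
        cost = len(turn.split())            # stand-in for real token counting
        if used + cost > budget:
            break                           # oldest turns fall out of the window
        kept.append(turn)
        used += cost
    return list(reversed(kept))             # restore chronological order

history = ["a b", "c d e", "f"]
print(window(history, budget=4))  # -> ['c d e', 'f']
```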
Long-term memory
Cross-thread persistent context. The node remembers users across sessions with no separate memory store to provision.
RAG
Upload documents, get grounded responses. Vector search, chunking, and context injection — no vector database to manage.
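The chunking step mentioned above typically splits documents into overlapping windows before embedding. A toy version of that idea — chunk sizes and overlap here are illustrative defaults, not Interlocute's settings:

```python
def chunk(text, size=50, overlap=10):
    """Split text into overlapping word windows for embedding."""
    words = text.split()
    step = size - overlap  # each window starts `step` words after the last
    return [
        " ".join(words[i:i + size])
        for i in range(0, max(len(words) - overlap, 1), step)
    ]

print(chunk("a b c d e f", size=4, overlap=2))  # -> ['a b c d', 'c d e f']
```

The overlap keeps a sentence that straddles a boundary retrievable from either side.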
Tool use
Pre-configured function calling. Define tools in your node configuration; the runtime handles invocation and result injection.
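Under the hood, "the runtime handles invocation" means mapping the model's requested call to a registered handler and returning the result for injection. A hypothetical sketch — the tool registry shape and call format are illustrative, not Interlocute's configuration schema:

```python
def get_weather(city: str) -> str:
    return f"Sunny in {city}"  # stand-in for a real lookup

# Hypothetical registry: tool name -> parameter schema plus handler.
TOOLS = {
    "get_weather": {
        "parameters": {"city": "string"},
        "handler": get_weather,
    },
}

def dispatch(tool_call):
    """Run the tool the model asked for; the runtime performs this step."""
    tool = TOOLS[tool_call["name"]]
    return tool["handler"](**tool_call["arguments"])

result = dispatch({"name": "get_weather", "arguments": {"city": "Oslo"}})
print(result)  # -> Sunny in Oslo
```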
Scheduling
Cron-style triggered execution. Your node runs on a schedule with no cron server, no queue, and no worker process to manage.
Not bolted on.
Built in.
Production LLM apps fail in ways that are invisible without the right instrumentation. A context window silently exceeded. A tool call that returned nothing. A latency spike in a specific capability. A user whose thread accumulated unexpected cost.
Every node interaction is automatically recorded with the full processing trace — not just the final response. You see what ran, in what order, with what inputs and outputs, at what cost.
Capability traces
Every preprocessing step — RAG retrieval, memory lookup, tool calls — logged with inputs and outputs.
Token-level accounting
Input, output, and computation tokens broken down per call — not just aggregate monthly totals.
Latency per call
Time to first token and total response time. Compare models, detect regressions, and investigate spikes.
Governance log
Guardrail refusals, quota hits, and access denials — logged with reason codes for every enforcement action.
Know exactly what
your AI costs.
Usage is metered per call and attributed simultaneously along three dimensions: the node it ran on, the thread it belongs to, and the API key that authorised it. Query any combination to get a complete picture.
For multi-tenant applications, issue a separate API key per customer. Every request is automatically tagged to that key. Chargeback becomes a query, not a spreadsheet exercise.
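"Chargeback becomes a query" can be pictured as a fold over exported call records. The field names (`api_key`, `node`, `cost`) are assumptions about the export shape, for illustration only:

```python
from collections import defaultdict

def cost_by_key(records):
    """Total metered cost per API key -- one key per customer."""
    totals = defaultdict(float)
    for r in records:
        totals[r["api_key"]] += r["cost"]
    return dict(totals)

# Sample records with illustrative values.
records = [
    {"api_key": "cust-a", "node": "support-bot", "cost": 0.012},
    {"api_key": "cust-b", "node": "support-bot", "cost": 0.007},
    {"api_key": "cust-a", "node": "summariser", "cost": 0.020},
]
print(cost_by_key(records))  # per-customer totals
```

Grouping by `node` or thread instead of `api_key` gives the other two attribution dimensions.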
Built for production, not prototypes
Interlocute is the right substrate when the app you're building has to actually work — reliably, observably, and at real usage volumes.
SaaS builders
Adding AI to a product you ship to customers. You need call records, cost attribution per customer, guardrails, and a runtime that scales without ops work.
Platform and agency builders
Running AI workloads on behalf of multiple clients. Per-key attribution and per-node isolation make chargeback and separation of concerns straightforward.
Internal tooling teams
Deploying AI assistants, automation, or knowledge tools inside an organisation. Governance, audit trails, and budget controls are non-negotiables from day one.
Agent builders
Building autonomous workflows that run on schedules or in response to events. Each node is addressable, configurable, and independently observable.
Integration developers
Connecting LLM capabilities to existing systems — CRMs, support tools, data pipelines. Inbound event triggers (email, SMS, webhook) route directly to a node.
Teams evaluating in production
Running A/B tests on models, prompts, or capability configurations against real traffic. Every call record is exportable for systematic evaluation.
Frequently Asked Questions
For Builders
Is Interlocute a framework?
Do I still write code?
Can I start with one use case and grow?
Do I need to manage infrastructure?
How does Interlocute relate to Azure or cloud infrastructure?
What if I already use LangChain, LlamaIndex, or a similar library?
How does cost work for a production application?
Is Interlocute production-ready today?
Deploy your first node.
Create a node, configure it for your use case, and call it from your application. The full runtime substrate is live from the first request.
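Calling a node amounts to one authenticated HTTP request. A sketch of assembling that request — the `.example` hostname, path, header, and payload fields are placeholders, not Interlocute's actual API:

```python
import json

def build_node_request(node_id, api_key, message):
    """Assemble the HTTP request an application would send to a node."""
    return {
        "url": f"https://api.interlocute.example/nodes/{node_id}/messages",
        "headers": {
            "Authorization": f"Bearer {api_key}",  # the key also tags usage
            "Content-Type": "application/json",
        },
        "body": json.dumps({"message": message, "thread": "demo-thread"}),
    }

req = build_node_request("support-bot", "sk-demo", "Hello!")
print(req["url"])
```

Send it with any HTTP client; the per-key cost attribution described above comes for free from the `Authorization` header.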