interlocute.ai beta
For Builders

Same models.
Real workspace.

The substrate for production LLM applications. Everything the runtime layer needs — call recording, memory, streaming, cost attribution, governance — included. No library patchwork required.

The problem with building today

Every LLM app rebuilds
the same infrastructure.

Logging middleware. A context management layer. A token counter. A billing tracker. A streaming handler. A guardrail check. A vector database. A memory store. None of this is specific to your product — but every team building an LLM application builds it from scratch.

By the time the infrastructure is done, the application logic that actually differentiates your product is buried under library integrations, version conflicts, and maintenance surface you didn’t want to own.

Call recording: full request + response
Token metering: per-call attribution
Streaming: SSE, token-by-token
Thread management: multi-turn context
Long-term memory: cross-session persistence
RAG: no vector DB required
Guardrails: governance + budget caps
Scheduling: cron-style triggers

All of the above — included in every node.

What you get out of the box

The runtime, complete

Everything below is live on a new node. No configuration required, no third-party services to connect, nothing to build.

Streaming SSE

Token-by-token server-sent events. Wire your UI directly to the node endpoint — no streaming handler to build.
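Consuming that stream is a loop over standard SSE `data:` lines. A minimal sketch, assuming each event carries a JSON body with a `token` field and the stream ends with `[DONE]` — an illustrative shape, not Interlocute's documented wire format:

```python
import json

def iter_tokens(sse_lines):
    """Yield token strings from an iterable of SSE lines.

    Assumes events of the form `data: {"token": "..."}` terminated by
    `data: [DONE]` -- field names here are assumptions, not the
    documented contract.
    """
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alives and comment lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        yield json.loads(payload)["token"]

# Reassemble a streamed response from raw SSE lines
stream = [
    'data: {"token": "Hel"}',
    'data: {"token": "lo"}',
    "data: [DONE]",
]
text = "".join(iter_tokens(stream))  # "Hello"
```

In a browser UI the same events can be consumed with the native EventSource / fetch-streaming APIs, so no server-side relay is needed.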

Thread management

Multi-turn conversations with isolated state per thread. Context windowing and history managed automatically.
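In practice, multi-turn context reduces to carrying one identifier. A sketch of the request shape, assuming the runtime assigns a thread id on the first call and accepts it back on later ones (field names are illustrative, not the documented schema):

```python
def chat_request(node_id, message, thread_id=None):
    """Build a chat request body for a node.

    Omit `thread_id` to start a fresh thread; pass one back to continue
    an existing conversation. The runtime keeps per-thread state, so the
    client never resends history.
    """
    body = {"node": node_id, "message": message}
    if thread_id is not None:
        body["thread_id"] = thread_id
    return body

first = chat_request("support-bot", "My invoice is wrong")
# ...the first response would return the thread id the runtime assigned...
followup = chat_request("support-bot", "It's invoice #4412", thread_id="th_123")
```

The point of the shape: your application stores one opaque id per conversation, and windowing, truncation, and history live on the node.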

Long-term memory

Cross-thread persistent context. The node remembers users across sessions with no separate memory store to provision.

RAG

Upload documents, get grounded responses. Vector search, chunking, and context injection — no vector database to manage.
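What the platform does on your behalf can be pictured as chunking plus retrieval plus context injection. A toy stand-in (Interlocute's actual chunking and ranking are internal to the runtime; this only illustrates the idea you no longer implement):

```python
def chunk(text, size=200, overlap=40):
    """Split a document into overlapping character windows --
    a toy stand-in for the chunking the platform performs internally."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def retrieve(chunks, query, k=2):
    """Rank chunks by naive keyword overlap with the query.

    Real retrieval uses vector similarity over embeddings; keyword
    overlap is used here only to keep the sketch self-contained.
    """
    terms = set(query.lower().split())
    return sorted(chunks, key=lambda c: -len(terms & set(c.lower().split())))[:k]
```

With the node, the equivalent is: upload the document, send the query, and the retrieved chunks arrive already injected into the model's context.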

Tool use

Pre-configured function calling. Define tools in your node configuration; the runtime handles invocation and result injection.
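A tool definition typically looks like the JSON-Schema style most function-calling runtimes share. The exact node-configuration format lives in the Interlocute docs; this sketch shows the shape of the idea, with every field name an assumption:

```python
# Illustrative tool definition: name, human-readable description,
# and a JSON-Schema description of the parameters the model may fill.
lookup_order = {
    "name": "lookup_order",
    "description": "Fetch an order's status by its id.",
    "parameters": {
        "type": "object",
        "properties": {
            "order_id": {"type": "string", "description": "e.g. ORD-1042"},
        },
        "required": ["order_id"],
    },
}

# Hypothetical node configuration: declare the tools once; the runtime
# handles invocation and feeds results back into the conversation.
node_config = {"tools": [lookup_order]}
```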

Scheduling

Cron-style triggered execution. Your node runs on a schedule with no cron server, no queue, and no worker process to manage.
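A schedule is a standard five-field cron expression plus the input to run. The configuration keys below are assumptions for illustration; the cron format itself (minute, hour, day-of-month, month, day-of-week) is the conventional one:

```python
# Illustrative schedule configuration for a node.
schedule = {
    "cron": "0 7 * * 1-5",   # 07:00 on weekdays
    "timezone": "Europe/London",
    "input": {"message": "Summarise yesterday's tickets"},
}

def valid_cron(expr):
    """Cheap sanity check: a standard cron expression has five fields."""
    return len(expr.split()) == 5

assert valid_cron(schedule["cron"])
```

There is nothing else to deploy: no cron daemon, no queue, no worker keeping the schedule alive.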

Observability by default

Not bolted on.
Built in.

Production LLM apps fail in ways that are invisible without the right instrumentation. A context window silently exceeded. A tool call that returned nothing. A latency spike in a specific capability. A user whose thread accumulated unexpected cost.

Every node interaction is automatically recorded with the full processing trace — not just the final response. You see what ran, in what order, with what inputs and outputs, at what cost.

Full audit trail

Capability traces

Every preprocessing step — RAG retrieval, memory lookup, tool calls — logged with inputs and outputs.

Token-level accounting

Input, output, and computation tokens broken down per call — not just aggregate monthly totals.

Latency per call

Time to first token and total response time. Compare models, detect regressions, and investigate spikes.

Governance log

Guardrail refusals, quota hits, and access denials — logged with reason codes for every enforcement action.
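Because every call record carries its trace, the silent failures described above become ordinary queries over records. A sketch, assuming records shaped roughly like the trace described here (field names such as `steps` and `latency_ms` are illustrative):

```python
def flag_anomalies(records, latency_ms_threshold=5000):
    """Scan call records for two of the failure modes named above:
    latency spikes and tool calls that returned nothing.

    The record shape is an assumption for illustration, not the
    documented export format.
    """
    flags = []
    for r in records:
        if r["latency_ms"] > latency_ms_threshold:
            flags.append((r["id"], "latency spike"))
        for step in r.get("steps", []):
            if step["type"] == "tool_call" and not step.get("output"):
                flags.append((r["id"], "empty tool result"))
    return flags
```

The same pattern extends to context-window overruns or per-thread cost drift: the instrumentation already exists, so detection is a filter, not a build project.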

Cost clarity

Know exactly what
your AI costs.

Usage is metered per call and attributed simultaneously along three dimensions: the node it ran on, the thread it belongs to, and the API key that authorised it. Query any combination to get a complete picture.

Per node: total spend by AI endpoint
Per thread: cost of each conversation
Per API key: usage by customer or integration
Over time: trends, spikes, and baselines

For multi-tenant applications, issue a separate API key per customer. Every request is automatically tagged to that key. Chargeback becomes a query, not a spreadsheet exercise.
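The "chargeback becomes a query" claim is literal: with per-call records tagged by node, thread, and key, attribution is a group-by. A sketch over illustrative records (the real export field names may differ):

```python
from collections import defaultdict

def spend_by(records, dimension):
    """Sum cost over call records along one attribution dimension:
    "node", "thread", or "api_key". Record shape is illustrative."""
    totals = defaultdict(float)
    for r in records:
        totals[r[dimension]] += r["cost"]
    return dict(totals)

calls = [
    {"node": "support-bot", "thread": "t1", "api_key": "acme", "cost": 0.04},
    {"node": "support-bot", "thread": "t2", "api_key": "globex", "cost": 0.11},
    {"node": "digest", "thread": "t3", "api_key": "acme", "cost": 0.02},
]
spend_by(calls, "api_key")  # per-customer totals, ready for invoicing
```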

Who this is for

Built for production, not prototypes

Interlocute is the right substrate when the app you're building has to actually work — reliably, observably, and at real usage volumes.

SaaS builders

Adding AI to a product you ship to customers. You need call records, cost attribution per customer, guardrails, and a runtime that scales without ops work.

Platform and agency builders

Running AI workloads on behalf of multiple clients. Per-key attribution and per-node isolation make chargeback and separation of concerns straightforward.

Internal tooling teams

Deploying AI assistants, automation, or knowledge tools inside an organisation. Governance, audit trails, and budget controls are non-negotiables from day one.

Agent builders

Building autonomous workflows that run on schedules or in response to events. Each node is addressable, configurable, and independently observable.

Integration developers

Connecting LLM capabilities to existing systems — CRMs, support tools, data pipelines. Inbound event triggers (email, SMS, webhook) route directly to a node.

Teams evaluating in production

Running A/B tests on models, prompts, or capability configurations against real traffic. Every call record is exportable for systematic evaluation.

Frequently Asked Questions

For Builders

Is Interlocute a framework?
No. Interlocute is a runtime — a deployment substrate you call into, not a library you build inside of. You keep your application code, your language, and your architecture. A node is an addressable endpoint: your app sends requests to it, the runtime handles the infrastructure layer (recording, streaming, memory, metering, governance), and the response comes back. There is no agent loop to implement, no graph to define, no paradigm to adopt.
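Concretely, "addressable endpoint" means an ordinary HTTP request. A sketch of building one (not sending it), with the URL path and body fields assumed for illustration; only the bearer-token auth is stated in this page:

```python
import json
import urllib.request

def node_request(base_url, node_id, api_key, message):
    """Build an HTTP request to a node endpoint.

    The `/v1/nodes/{id}/chat` path and the body field are hypothetical --
    check the docs for the real contract. Auth is a standard bearer token.
    """
    return urllib.request.Request(
        url=f"{base_url}/v1/nodes/{node_id}/chat",
        data=json.dumps({"message": message}).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = node_request("https://api.example.com", "support-bot", "sk-demo", "Hi")
```

Because it is plain HTTP, any language and any architecture can call a node without adopting an SDK.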
Do I still write code?
Yes — your product logic, your data models, your UI, your application flows all live in your codebase as they always have. Interlocute replaces the infrastructure plumbing: you stop writing logging middleware, billing trackers, context management code, and streaming handlers. The things that are the same for every LLM app are done. The things that are specific to your product are yours.
Can I start with one use case and grow?
Yes. Nodes are independent — you create one for the first use case that needs it, and it has no impact on the rest of your system. When a second use case needs the same substrate, you create another node. Each node has its own configuration, memory partition, API keys, and usage ledger. You scale out by adding nodes, not by re-architecting.
Do I need to manage infrastructure?
No servers to run, no vector databases to provision, no message queues to manage, no scaling policies to write. Interlocute is fully managed. The computation, storage, and orchestration that sits behind a node is the platform's concern. You manage nodes through the dashboard or API; the rest is handled.
How does Interlocute relate to Azure or cloud infrastructure?
Interlocute runs on cloud infrastructure and is designed to integrate cleanly with standard application architectures. If your app is already deployed on a cloud provider, Interlocute nodes are additional endpoints your application calls — the same way you call any external API. No cloud-specific SDK is required. Authentication uses standard bearer tokens.
What if I already use LangChain, LlamaIndex, or a similar library?
Those frameworks handle orchestration logic at the application layer. Interlocute operates at the infrastructure layer — below your orchestration code. You can continue using an orchestration library if you need it; the LLM calls those libraries make can be routed through Interlocute nodes instead of directly to the model provider, giving you the full substrate (recording, metering, streaming) without changing your orchestration approach.
How does cost work for a production application?
You pay a small platform premium on LLM tokens plus computation charges — there is no monthly platform fee, no per-seat pricing, and no minimum commitment. Every request is metered per call and attributed to the node, thread, and API key that produced it. For multi-tenant applications, issuing separate API keys per customer gives you a ready-made cost ledger for chargeback. When usage is zero, cost is zero.
Is Interlocute production-ready today?
Interlocute is in v1 beta. The core runtime — chat, streaming, memory, observability, guardrails, and cost attribution — is available and being used in production. Some capabilities are marked "coming soon". The API contract is stable for v1 features. Review the docs for the current feature status and any open limitations.

Deploy your first node.

Create a node, configure it for your use case, and call it from your application. The full runtime substrate is live from the first request.