Every AI call.
On the record.
Raw model APIs return a response and forget it happened. Interlocute turns every call into a durable, inspectable, attributable record — automatically, with nothing to configure.
Raw APIs give you a response.
Nothing else.
You call the model. It responds. That's it. There's no durable record of what was sent, no token count you can trust, no cost attribution, and no trace of the processing that happened in between. If something goes wrong — or if your finance team asks what the AI spend was last month — you have nothing to show.
Bolting observability on afterwards means integrating a separate logging layer, a billing system, a tracing tool, and a storage backend. Each adds maintenance surface and still doesn't give you the full picture.
No durable call record
The model API returns a response object. It's in memory. If you don't log it yourself, it's gone.
Token counts you can't attribute
Usage data from the provider is aggregate. You can't tell which thread, node, customer, or feature drove the cost.
No trail for debugging
When a user reports a bad response, there's no record of what prompt was actually sent, what context was included, or what tools ran.
Nothing audit-ready
Compliance, governance, and chargeback all require records that simply don't exist if you're calling the provider API directly.
The complete call record, every time
Everything below is recorded for every node interaction with no configuration and no logging code to write.
Full request & response
The exact prompt sent to the model, including composed context, system instructions, and the full response — not a summary.
Token counts
Input tokens, output tokens, and computation tokens — broken down per call, not just aggregate monthly totals from the provider.
Latency
Time to first token and total response time, per call. Identify slow patterns, compare models, and catch regressions.
Capability traces
Which capabilities ran on each call — RAG retrieval, memory lookup, tool invocations — with inputs and outputs for each step.
Per-call cost
Exact cost calculated per interaction — not estimated. Attributed to the node, thread, and API key simultaneously.
Attribution
Node ID, thread ID, and API key recorded on every call. Query any dimension to reconstruct usage for any customer, team, or feature.
Errors & refusals
Failed calls, provider errors, guardrail refusals, and quota hits — all logged with reason codes and the triggering request.
Model & provider
Which model and provider served each response. Compare costs and quality across providers over time from the same data.
Timestamps
Precise request and response timestamps. Reconstruct the exact sequence of events for incident review or compliance audits.
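Put together, a single call record might look like the sketch below. Every field name here is illustrative, not Interlocute's actual schema; it simply shows the dimensions listed above living on one object.

```python
# Hypothetical shape of one call record. Field names and values are
# illustrative only, not Interlocute's actual schema.
call_record = {
    "node_id": "node_7f3a",          # attribution: which node
    "thread_id": "thread_91c2",      # attribution: which conversation
    "api_key_id": "key_prod_04",     # attribution: which key/customer
    "model": "gpt-4o",               # model & provider that served it
    "provider": "openai",
    "request": {"system": "...", "messages": ["..."]},  # full prompt as sent
    "response": "...",                                  # full response, not a summary
    "usage": {"input_tokens": 1250, "output_tokens": 310},
    "latency_ms": {"first_token": 180, "total": 2400},
    "cost_usd": 0.0114,              # exact per-call cost, not an estimate
    "capability_trace": [            # what ran on this call, step by step
        {"step": "rag_retrieval", "input": "...", "output": "..."},
    ],
    "error": None,                   # or a reason code on failure/refusal
    "requested_at": "2025-01-15T09:30:00Z",
    "responded_at": "2025-01-15T09:30:02.4Z",
}
```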
What you can do with a complete record
Debug confidently
When a user reports a bad response, open the call record. See the exact prompt that was sent, the context that was injected, which tools ran, and the full response. Reproduce the issue in isolation. No guessing.
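With complete records, pulling up a reported conversation is a filter and a sort. A minimal sketch, assuming records are exported as dicts with the hypothetical fields shown:

```python
def thread_history(records, thread_id):
    """Return every call record for one thread, oldest first, so the
    exact sequence of prompts and responses can be replayed.
    The record shape here is hypothetical."""
    hits = [r for r in records if r["thread_id"] == thread_id]
    # ISO 8601 timestamps in the same format sort correctly as strings.
    return sorted(hits, key=lambda r: r["requested_at"])

records = [
    {"thread_id": "t1", "requested_at": "2025-01-15T09:31:00Z", "request": "follow-up"},
    {"thread_id": "t2", "requested_at": "2025-01-15T09:30:30Z", "request": "other"},
    {"thread_id": "t1", "requested_at": "2025-01-15T09:30:00Z", "request": "first prompt"},
]
history = thread_history(records, "t1")
```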
Charge back accurately
Multi-tenant apps, agencies, and platform builders can attribute exact costs to customers or teams using per-key and per-node records. No more estimated allocations — real numbers per customer.
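Because every record carries its API key, chargeback reduces to a simple aggregation. A sketch, assuming exported records with hypothetical `api_key_id` and `cost_usd` fields and a key-to-customer mapping you maintain:

```python
from collections import defaultdict

def cost_by_customer(records, key_to_customer):
    """Sum exact per-call costs per customer. The record fields and
    the key-to-customer mapping are hypothetical shapes."""
    totals = defaultdict(float)
    for rec in records:
        customer = key_to_customer[rec["api_key_id"]]
        totals[customer] += rec["cost_usd"]
    return dict(totals)

records = [
    {"api_key_id": "key_a", "cost_usd": 0.012},
    {"api_key_id": "key_b", "cost_usd": 0.030},
    {"api_key_id": "key_a", "cost_usd": 0.008},
]
totals = cost_by_customer(records, {"key_a": "Acme", "key_b": "Globex"})
```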
Satisfy compliance
Every response has a durable, timestamped record of the request, the model, and the governance policies applied. Show auditors and regulators exactly what happened, when, and under what controls.
Evaluate systematically
Export call records to your evaluation pipeline. Compare quality across model versions, prompt changes, or RAG configurations using real production data — not synthetic test sets.
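A comparison over exported records can be as small as a group-by. This sketch averages total latency per model; the record shape is hypothetical, and the same pattern applies to cost or any quality score your pipeline assigns:

```python
from statistics import mean

def compare_models(records):
    """Group exported call records (hypothetical shape) by model and
    report average total latency per model."""
    by_model = {}
    for rec in records:
        by_model.setdefault(rec["model"], []).append(rec["latency_ms"])
    return {model: mean(vals) for model, vals in by_model.items()}

records = [
    {"model": "gpt-4o", "latency_ms": 2400},
    {"model": "gpt-4o-mini", "latency_ms": 900},
    {"model": "gpt-4o", "latency_ms": 2000},
]
averages = compare_models(records)
```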
You bring the meaning.
We keep the record.
Interlocute doesn’t know what a “good” response looks like for your product. That’s application logic — it belongs in your codebase.
What Interlocute provides is the durable substrate: every call is recorded before it has meaning, so you have the raw material to evaluate, attribute, and govern it once your application assigns that meaning. The record is always there; what you do with it is up to you.
This is the right separation of concerns. The runtime handles persistence, metering, and traceability. Your code handles product logic.
You don’t rewrite your app.
You route your calls differently.
Interlocute is a runtime, not a framework. There is no new SDK to learn, no graph API to define, no agent loop to implement. Your application sends requests to a node endpoint instead of directly to the model provider. The substrate handles recording, metering, streaming, memory, and governance. The rest of your codebase is unchanged.
No framework lock-in
Your application logic stays in your language, your stack, your repository. No paradigm shift required.
Drop-in endpoint swap
Point your LLM call at a node endpoint. Everything you had before still works; now it also has a record.
Substrate, not scaffolding
Traceability, metering, and governance are infrastructure concerns. They belong in the runtime layer, not your application code.
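The swap can be sketched as follows. Both URLs below are made up for illustration, and the payload shape is a generic chat-completion body, not Interlocute's documented API; the point is that only the destination changes:

```python
# Hypothetical: the only change is where the request is sent.
PROVIDER_URL = "https://api.openai.com/v1/chat/completions"
NODE_URL = "https://example-interlocute.dev/v1/nodes/node_7f3a/chat"  # illustrative

def build_request(url, api_key, messages, model="gpt-4o"):
    """Assemble an HTTP request description. The same payload works
    either way; only the URL and key differ."""
    return {
        "url": url,
        "headers": {"Authorization": f"Bearer {api_key}"},
        "json": {"model": model, "messages": messages},
    }

msgs = [{"role": "user", "content": "hi"}]
before = build_request(PROVIDER_URL, "sk-provider-key", msgs)
after = build_request(NODE_URL, "node-scoped-key", msgs)
# Identical payload; the node endpoint adds the record.
```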
Frequently Asked Questions
Traceability & Cost Attribution
What exactly is logged for each call?
Can I attribute cost by thread, node, or API key?
Is this a replacement for LangChain, LlamaIndex, or similar frameworks?
Can I export the call data?
What about data retention and privacy?
Does this add latency to my requests?
How does traceability work with streaming responses?
What is the difference between observability and chat history?
Can I use this for compliance or audit requirements?
How does cost attribution work with multi-tenant apps?
Start with a record.
Create a node, route your LLM calls through it, and every call has a complete, durable record from day one. No logging pipeline to build.