interlocute.ai beta
Production Substrate

Every AI call.
On the record.

Raw model APIs return a response and forget it happened. Interlocute turns every call into a durable, inspectable, attributable record — automatically, with nothing to configure.

The problem

Raw APIs give you a response.
Nothing else.

You call the model. It responds. That's it. There's no durable record of what was sent, no token count you can trust, no cost attribution, and no trace of the processing that happened in between. If something goes wrong — or if your finance team asks what the AI spend was last month — you have nothing to show.

Bolting observability on afterwards means integrating a separate logging layer, a billing system, a tracing tool, and a storage backend. Each adds maintenance surface and still doesn't give you the full picture.

No durable call record

The model API returns a response object. It's in memory. If you don't log it yourself, it's gone.

Token counts you can't attribute

Usage data from the provider is aggregate. You can't tell which thread, node, customer, or feature drove the cost.

No trail for debugging

When a user reports a bad response, there's no record of what prompt was actually sent, what context was included, or what tools ran.

Nothing audit-ready

Compliance, governance, and chargeback all require records that simply don't exist if you're calling the provider API directly.

Captured automatically

The complete call record, every time

Everything below is recorded for every node interaction with no configuration and no logging code to write.

Full request & response

The exact prompt sent to the model, including composed context, system instructions, and the full response — not a summary.

Token counts

Input tokens, output tokens, and computation tokens — broken down per call, not just aggregate monthly totals from the provider.

Latency

Time to first token and total response time, per call. Identify slow patterns, compare models, and catch regressions.

Capability traces

Which capabilities ran on each call — RAG retrieval, memory lookup, tool invocations — with inputs and outputs for each step.

Per-call cost

Exact cost calculated per interaction — not estimated. Attributed to the node, thread, and API key simultaneously.

Attribution

Node ID, thread ID, and API key recorded on every call. Query any dimension to reconstruct usage for any customer, team, or feature.

Errors & refusals

Failed calls, provider errors, guardrail refusals, and quota hits — all logged with reason codes and the triggering request.

Model & provider

Which model and provider served each response. Compare costs and quality across providers over time from the same data.

Timestamps

Precise request and response timestamps. Reconstruct the exact sequence of events for incident review or compliance audits.

Why it matters

What you can do with a complete record

01

Debug confidently

When a user reports a bad response, open the call record. See the exact prompt that was sent, the context that was injected, which tools ran, and the full response. Reproduce the issue in isolation. No guessing.

02

Charge back accurately

Multi-tenant apps, agencies, and platform builders can attribute exact costs to customers or teams using per-key and per-node records. No more estimated allocations — real numbers per customer.

03

Satisfy compliance

Every response has a durable, timestamped record of the request, the model, and the governance policies applied. Auditors and regulators can be shown exactly what happened, when, and under what controls.

04

Evaluate systematically

Export call records to your evaluation pipeline. Compare quality across model versions, prompt changes, or RAG configurations using real production data — not synthetic test sets.

How it fits

You bring the meaning.
We keep the record.

Interlocute doesn’t know what a “good” response looks like for your product. That’s application logic — it belongs in your codebase.

What Interlocute provides is the durable substrate: every call is recorded before it has meaning, so you have the raw material to evaluate, attribute, and govern it once your application assigns that meaning. The record is always there; what you do with it is up to you.

This is the right separation of concerns. The runtime handles persistence, metering, and traceability. Your code handles product logic.

call record (simplified)
"callId": "a3f8...",
"nodeId": "support-bot",
"threadId": "thread-uuid",
"apiKey": "key_customer_acme",
"model": "gpt-4o",
"inputTokens": 1842,
"outputTokens": 312,
"latencyMs": 1204,
"cost": 0.00218,
"capabilities": ["rag", "memory"],
"guardrailsApplied": true,
"timestamp": "2025-06-15T09:12:04Z"
Not another framework

You don’t rewrite your app.
You route your calls differently.

Interlocute is a runtime, not a framework. There is no new SDK to learn, no graph API to define, no agent loop to implement. Your application sends requests to a node endpoint instead of directly to the model provider. The substrate handles recording, metering, streaming, memory, and governance. The rest of your codebase is unchanged.

No framework lock-in

Your application logic stays in your language, your stack, your repository. No paradigm shift required.

Drop-in endpoint swap

Point your LLM call at a node endpoint. Everything you had before still works; now it also has a record.
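To illustrate the swap, here is a minimal Python sketch. The endpoint URL, node name, and environment variable are hypothetical placeholders, not the actual Interlocute API; the point is that the request body is the same one you already send to the provider, only the URL changes:

```python
import json
import os
import urllib.request

# Hypothetical node endpoint -- the real URL shape may differ.
NODE_ENDPOINT = "https://api.interlocute.example/v1/nodes/support-bot"

def build_chat_request(prompt: str) -> urllib.request.Request:
    """Build the same chat-completion request you would send to the
    provider, pointed at the node endpoint instead."""
    body = json.dumps({
        "model": "gpt-4o",
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        NODE_ENDPOINT,
        data=body,
        headers={
            # Hypothetical environment variable for the node's API key.
            "Authorization": f"Bearer {os.environ.get('INTERLOCUTE_API_KEY', '')}",
            "Content-Type": "application/json",
        },
    )

req = build_chat_request("How do I reset my password?")
# urllib.request.urlopen(req) would send it; the substrate records the call.
print(req.full_url)
```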

Substrate, not scaffolding

Traceability, metering, and governance are infrastructure concerns. They belong in the runtime layer, not your application code.

Frequently Asked Questions

Traceability & Cost Attribution

What exactly is logged for each call?
Every node interaction is recorded with: the full request payload, the full response, any tool invocations and their results, input token count, output token count, computation tokens, response latency, the model and provider used, the API key that made the call, the node ID, and the thread ID. Errors and guardrail enforcement actions are also logged with reason codes. You get the complete record — nothing is summarised away.
Can I attribute cost by thread, node, or API key?
Yes. Every request is metered and attributed along three dimensions simultaneously: the node it ran on, the thread it belongs to, and the API key that authorised it. You can query usage by any of these axes — total spend per node, cost per conversation thread, or usage by integration key. This makes chargeback to teams, customers, or cost centres straightforward.
Is this a replacement for LangChain, LlamaIndex, or similar frameworks?
No — and intentionally so. Interlocute is a runtime substrate, not an orchestration framework. You do not rewrite your application logic in a new paradigm. Interlocute handles the durable infrastructure layer: call recording, state management, token metering, streaming, and governance. Your application sends requests to a node endpoint; the substrate handles everything around the call. If you are using LangChain or LlamaIndex today, you would point the LLM provider call to an Interlocute node instead.
Can I export the call data?
Call records are accessible through the dashboard and the API. You can query by node, thread, time range, API key, or model. Programmatic export lets you feed records into your own data warehouse, observability stack, or billing system. Retention and export formats are covered in the docs.
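As a sketch of what exported records enable: given call records shaped like the simplified example above, a per-key spend ledger is a small fold. The record values here are illustrative, and `cost_by_api_key` is our own helper, not an Interlocute API:

```python
from collections import defaultdict

def cost_by_api_key(records):
    """Sum per-call cost into a ledger keyed by the API key that made the call."""
    ledger = defaultdict(float)
    for rec in records:
        ledger[rec["apiKey"]] += rec["cost"]
    return dict(ledger)

# Illustrative exported records (fields trimmed to what the ledger needs).
records = [
    {"apiKey": "key_customer_acme", "cost": 0.00218},
    {"apiKey": "key_customer_acme", "cost": 0.00145},
    {"apiKey": "key_customer_globex", "cost": 0.00090},
]
print(cost_by_api_key(records))
```

The same fold over `nodeId` or `threadId` gives per-node or per-conversation spend from the same export.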
What about data retention and privacy?
Call records are stored securely and scoped to your account — no other tenant can access your data. Interlocute does not use your request or response content to train models. Retention periods are documented per plan. If your use case requires custom retention or geographic data residency, contact us.
Does this add latency to my requests?
No measurable user-facing latency is added. The recording pipeline is asynchronous: the response is returned to your caller as soon as the model finishes generating, while instrumentation runs on the processing side, outside the critical path.
How does traceability work with streaming responses?
Streaming responses are fully supported. The runtime records the complete token stream after it completes, so you get the full request and response in the log even when the user received it token-by-token. Token counts, latency, and cost attribution are accurate for streamed and non-streamed calls alike.
What is the difference between observability and chat history?
Chat history is the conversation content a user sees — the messages in a thread. Observability is the infrastructure record of what happened at the runtime level: exact token counts, model call metadata, capability traces (which preprocessing steps ran, in what order), cost, latency, and any errors. Chat history is for the user experience. Observability is for you as the operator — debugging, auditing, and cost management.
Can I use this for compliance or audit requirements?
Yes. Every interaction has a durable, timestamped record including the full request, response, and the governance policies that were applied. If your organisation needs to demonstrate what the model was asked, what it responded, and what guardrails were in place — that record exists and is queryable. For regulated industries or specific compliance frameworks, review the security and data docs.
How does cost attribution work with multi-tenant apps?
If you issue separate API keys per customer or team, usage is automatically attributed to those keys. Querying by API key gives you a usage ledger for each key holder. Combined with per-thread and per-node attribution, you can reconstruct the complete cost breakdown for any customer's usage across any time window.

Start with a record.

Create a node, route your LLM calls through it, and every call has a complete, durable record from day one. No logging pipeline to build.