interlocute.ai beta
Production Substrate

Every AI call.
On the record.

Raw model APIs return a response and forget it happened. Interlocute turns every call into a durable, inspectable, attributable record — automatically, with nothing to configure.

The problem

Raw APIs give you a response.
Nothing else.

You call the model. It responds. That's it. There's no durable record of what was sent, no token count you can trust, no cost attribution, and no trace of the processing that happened in between. If something goes wrong — or if your finance team asks what the AI spend was last month — you have nothing to show.

Bolting observability on afterwards means integrating a separate logging layer, a billing system, a tracing tool, and a storage backend. Each adds maintenance surface and still doesn't give you the full picture.

No durable call record

The model API returns a response object. It's in memory. If you don't log it yourself, it's gone.

Token counts you can't attribute

Usage data from the provider is aggregate. You can't tell which thread, node, customer, or feature drove the cost.

No trail for debugging

When a user reports a bad response, there's no record of what prompt was actually sent, what context was included, or what tools ran.

Nothing audit-ready

Compliance, governance, and chargeback all require records that simply don't exist if you're calling the provider API directly.

Captured automatically

The complete call record, every time

Everything below is recorded for every node interaction with no configuration and no logging code to write.

Full request & response

The exact prompt sent to the model, including composed context, system instructions, and the full response — not a summary.

Token counts

Input tokens, output tokens, and computation tokens — broken down per call, not just aggregate monthly totals from the provider.

Latency

Time to first token and total response time, per call. Identify slow patterns, compare models, and catch regressions.

Capability traces

Which capabilities ran on each call — RAG retrieval, memory lookup, tool invocations — with inputs and outputs for each step.

Per-call cost

Exact cost calculated per interaction — not estimated. Attributed to the node, thread, and API key simultaneously.

Attribution

Node ID, thread ID, and API key recorded on every call. Query any dimension to reconstruct usage for any customer, team, or feature.

Errors & refusals

Failed calls, provider errors, guardrail refusals, and quota hits — all logged with reason codes and the triggering request.

Model & provider

Which model and provider served each response. Compare costs and quality across providers over time from the same data.

Timestamps

Precise request and response timestamps. Reconstruct the exact sequence of events for incident review or compliance audits.

Why it matters

What you can do with a complete record

01

Debug confidently

When a user reports a bad response, open the call record. See the exact prompt that was sent, the context that was injected, which tools ran, and the full response. Reproduce the issue in isolation. No guessing.

02

Charge back accurately

Multi-tenant apps, agencies, and platform builders can attribute exact costs to customers or teams using per-key and per-node records. No more estimated allocations — real numbers per customer.

03

Satisfy compliance

Every response has a durable, timestamped record of the request, the model, and the governance policies applied. Auditors and regulators can be shown exactly what happened, when, and under what controls.

04

Evaluate systematically

Export call records to your evaluation pipeline. Compare quality across model versions, prompt changes, or RAG configurations using real production data — not synthetic test sets.

How it fits

You bring the meaning.
We keep the record.

Interlocute doesn’t know what a “good” response looks like for your product. That’s application logic — it belongs in your codebase.

What Interlocute provides is the durable substrate: every call is recorded before it has meaning, so you have the raw material to evaluate, attribute, and govern it once your application assigns that meaning. The record is always there; what you do with it is up to you.

This is the right separation of concerns. The runtime handles persistence, metering, and traceability. Your code handles product logic.

call record (simplified)
"callId": "a3f8...",
"nodeId": "support-bot",
"threadId": "thread-uuid",
"apiKey": "key_customer_acme",
"model": "gpt-4o",
"inputTokens": 1842,
"outputTokens": 312,
"latencyMs": 1204,
"cost": 0.00218,
"capabilities": ["rag", "memory"],
"guardrailsApplied": true,
"timestamp": "2025-06-15T09:12:04Z"
Not another framework

You don’t rewrite your app.
You route your calls differently.

Interlocute is a runtime, not a framework. There is no new SDK to learn, no graph API to define, no agent loop to implement. Your application sends requests to a node endpoint instead of directly to the model provider. The substrate handles recording, metering, streaming, memory, and governance. The rest of your codebase is unchanged.

No framework lock-in

Your application logic stays in your language, your stack, your repository. No paradigm shift required.

Drop-in endpoint swap

Point your LLM call at a node endpoint. Everything you had before still works; now it also has a record.
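To illustrate the swap, here is a minimal Python sketch. The endpoint URL, node name, and environment variable are hypothetical placeholders, not the actual Interlocute API; the point is that the request body is the same one you already send to the provider, only the URL changes:

```python
import json
import os
import urllib.request

# Hypothetical node endpoint -- the real URL shape may differ.
NODE_ENDPOINT = "https://api.interlocute.example/v1/nodes/support-bot"

def build_chat_request(prompt: str) -> urllib.request.Request:
    """Build the same chat-completion request you would send to the
    provider, pointed at the node endpoint instead."""
    body = json.dumps({
        "model": "gpt-4o",
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        NODE_ENDPOINT,
        data=body,
        headers={
            # Hypothetical environment variable for the node's API key.
            "Authorization": f"Bearer {os.environ.get('INTERLOCUTE_API_KEY', '')}",
            "Content-Type": "application/json",
        },
    )

req = build_chat_request("How do I reset my password?")
# urllib.request.urlopen(req) would send it; the substrate records the call.
print(req.full_url)
```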

Substrate, not scaffolding

Traceability, metering, and governance are infrastructure concerns. They belong in the runtime layer, not your application code.

Frequently Asked Questions

Traceability & Cost Attribution

What exactly is logged for each call?
Every node interaction is recorded with: the full request payload, the full response, any tool invocations and their results, input token count, output token count, computation tokens, response latency, the model and provider used, the API key that made the call, the node ID, and the thread ID. Errors and guardrail enforcement actions are also logged with reason codes. You get the complete record — nothing is summarised away.
Can I attribute cost by thread, node, or API key?
Yes. Every request is metered and attributed along three dimensions simultaneously: the node it ran on, the thread it belongs to, and the API key that authorised it. You can query usage by any of these axes — total spend per node, cost per conversation thread, or usage by integration key. This makes chargeback to teams, customers, or cost centres straightforward.
Is this a replacement for LangChain, LlamaIndex, or similar frameworks?
No — and intentionally so. Interlocute is a runtime substrate, not an orchestration framework. You do not rewrite your application logic in a new paradigm. Interlocute handles the durable infrastructure layer: call recording, state management, token metering, streaming, and governance. Your application sends requests to a node endpoint; the substrate handles everything around the call. If you are using LangChain or LlamaIndex today, you would point the LLM provider call to an Interlocute node instead.
Can I export the call data?
Call records are accessible through the dashboard and the API. You can query by node, thread, time range, API key, or model. Programmatic export lets you feed records into your own data warehouse, observability stack, or billing system. Retention and export formats are covered in the docs.
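As a sketch of what exported records enable: given call records shaped like the simplified example above, a per-key spend ledger is a small fold. The record values here are illustrative, and `cost_by_api_key` is our own helper, not an Interlocute API:

```python
from collections import defaultdict

def cost_by_api_key(records):
    """Sum per-call cost into a ledger keyed by the API key that made the call."""
    ledger = defaultdict(float)
    for rec in records:
        ledger[rec["apiKey"]] += rec["cost"]
    return dict(ledger)

# Illustrative exported records (fields trimmed to what the ledger needs).
records = [
    {"apiKey": "key_customer_acme", "cost": 0.00218},
    {"apiKey": "key_customer_acme", "cost": 0.00145},
    {"apiKey": "key_customer_globex", "cost": 0.00090},
]
print(cost_by_api_key(records))
```

The same fold over `nodeId` or `threadId` gives per-node or per-conversation spend from the same export.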
What about data retention and privacy?
Call records are stored securely and scoped to your account — no other tenant can access your data. Interlocute does not use your request or response content to train models. Retention periods are documented per plan. If your use case requires custom retention or geographic data residency, contact us.
Does this add latency to my requests?
No measurable user-facing latency is added. The recording pipeline is asynchronous: the response is returned to your caller as soon as the model finishes generating, while instrumentation runs on the processing side, outside the critical path.
How does traceability work with streaming responses?
Streaming responses are fully supported. The runtime records the complete token stream after it completes, so you get the full request and response in the log even when the user received it token-by-token. Token counts, latency, and cost attribution are accurate for streamed and non-streamed calls alike.
What is the difference between observability and chat history?
Chat history is the conversation content a user sees — the messages in a thread. Observability is the infrastructure record of what happened at the runtime level: exact token counts, model call metadata, capability traces (which preprocessing steps ran, in what order), cost, latency, and any errors. Chat history is for the user experience. Observability is for you as the operator — debugging, auditing, and cost management.
Can I use this for compliance or audit requirements?
Yes. Every interaction has a durable, timestamped record including the full request, response, and the governance policies that were applied. If your organisation needs to demonstrate what the model was asked, what it responded, and what guardrails were in place — that record exists and is queryable. For regulated industries or specific compliance frameworks, review the security and data docs.
How does cost attribution work with multi-tenant apps?
If you issue separate API keys per customer or team, usage is automatically attributed to those keys. Querying by API key gives you a usage ledger for each key holder. Combined with per-thread and per-node attribution, you can reconstruct the complete cost breakdown for any customer's usage across any time window.

Start with a record.

Create a node, route your LLM calls through it, and every call has a complete, durable record from day one. No logging pipeline to build.