Chat API Reference | Docs | Interlocute.ai

Overview

The POST /chat endpoint is the primary interface for sending messages to an Interlocute node and receiving responses. It supports four response modes: buffered JSON, real-time streaming (SSE), async (immediate background), and scheduled.

Every node is addressable in two ways:

POST https://YOUR-NODE-ALIAS.interlocute.ai/chat

Subdomain-based routing. The node is resolved from the subdomain automatically.

POST https://api.interlocute.ai/nodes/{nodeId}/chat

Path-based routing. Pass the node ID directly in the URL.

Both routes accept the same request body and return the same response format.
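As a rough illustration of the two addressing schemes, the helper below builds either URL form. The function name `chat_url` and its parameters are hypothetical; only the URL shapes come from this page.

```python
def chat_url(node_alias=None, node_id=None):
    """Return the POST /chat URL for a node.

    node_alias -> subdomain-based routing
    node_id    -> path-based routing
    """
    if node_alias:
        return f"https://{node_alias}.interlocute.ai/chat"
    if node_id:
        return f"https://api.interlocute.ai/nodes/{node_id}/chat"
    raise ValueError("Provide either node_alias or node_id")
```

Either URL can then be used interchangeably with the same request body.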

Authentication

The /chat endpoint supports three authentication modes:

Tenant API Key

Pass your API key as a Bearer token: Authorization: Bearer YOUR_API_KEY. Tenant-scoped keys can access any node owned by the tenant. Node-scoped keys are restricted to a single node.

JWT Token

For browser-based or first-party integrations. The JWT is validated by middleware before the request reaches the node.

Anonymous

If the node operator has enabled anonymous chat, no credentials are required. Anonymous access is disabled by default and must be explicitly enabled per node.

Node-scoped API keys add a layer of isolation: even if a key is compromised, it can only access the single node it was issued for. A 403 Forbidden is returned if a node-scoped key is used against a different node. See Auth & Keys for key management details.
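A minimal sketch of how a client might attach credentials, using only Python's standard library. The helper name `build_chat_request` is an assumption for illustration; it builds the request without sending it, and passing no key models anonymous access (which only works when the node operator has enabled it).

```python
import json
import urllib.request


def build_chat_request(url, content, api_key=None):
    """Build (but do not send) a POST /chat request.

    api_key=None omits the Authorization header entirely, which is only
    accepted by nodes that have anonymous chat enabled.
    """
    headers = {"Content-Type": "application/json"}
    if api_key is not None:
        headers["Authorization"] = f"Bearer {api_key}"
    body = json.dumps({"content": content}).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")
```

The returned object can be passed to `urllib.request.urlopen` to perform the call; a JWT would presumably travel in the same header, but this page does not specify that.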

Request

Send a JSON body with Content-Type: application/json.

Field	Type	Required	Description
content	string	Yes	The user's message text.
threadId	string \| null	No	Omit or pass `null` to start a new thread. Pass an existing thread ID to continue a conversation.
externalCorrelationId	string \| null	No	Client-supplied alias for thread resolution, scoped per node. When provided (and `threadId` is null), the system looks for an existing thread with this alias. If found it is reused; if not, a new thread is created and stamped with the alias. Enables sticky sessions without persisting Interlocute thread GUIDs on your side. Max 256 chars, alphanumeric + `-_.:/`.
clientMessageId	string \| null	No	Client-supplied identifier for the assistant reply. Useful for hydrating prompt metadata without fetching the full thread.
quotedContexts	array \| null	No	Quoted context references scoped to this turn. See Quoted Contexts below.
attachments	array \| null	No	File attachments (images, documents) as base64 data URLs. See Attachments below.
options	object \| null	No	Per-request options: response mode, web search, geolocation, and reasoning effort. See Options below.
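The `externalCorrelationId` constraints in the table above (at most 256 characters, alphanumeric plus `-_.:/`) can be checked client-side before sending. This sketch assumes "alphanumeric" means ASCII letters and digits and that an empty string is not a valid alias; neither detail is confirmed by this page.

```python
import re

# Assumption: ASCII letters/digits only, length 1 to 256.
_CORRELATION_ID = re.compile(r"[A-Za-z0-9\-_.:/]{1,256}")


def is_valid_correlation_id(value):
    """Return True if value looks like an acceptable externalCorrelationId."""
    return isinstance(value, str) and _CORRELATION_ID.fullmatch(value) is not None
```

For example, `session-user-42` passes, while a value containing spaces or exceeding 256 characters does not.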

Minimal request

{
  "content": "Hello! How can you help me today?"
}

Full request

{
  "content": "Summarize the highlighted paragraph.",
  "threadId": "thr_abc123",
  "clientMessageId": "msg_client_456",
  "externalCorrelationId": "session-user-42",
  "quotedContexts": [
    {
      "sourceMessageId": "msg_789",
      "sourceRole": "assistant",
      "quotedText": "The deployment completed successfully at 14:32 UTC.",
      "sourceTimestamp": "2025-01-15T14:32:00Z"
    }
  ],
  "attachments": [
    {
      "name": "screenshot.png",
      "contentType": "image/png",
      "dataUrl": "data:image/png;base64,iVBORw0KGgo...",
      "sizeBytes": 24576
    }
  ],
  "options": {
    "webSearchEnabled": true,
    "searchContextSize": "medium"
  }
}

Options

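The options object provides per-request control over response delivery, model tools, and execution behaviour. All fields are optional. Boolean fields default to false and most other fields default to null; the exception is searchContextSize, which defaults to "medium".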

Response mode

Field	Type	Default	Description
responseMode	string \| null	`null`	Controls how the response is delivered. All non-sync modes return `202 Accepted` with an `invocationId` for polling. See Async modes. `"sync"` or `null`: default synchronous. `"async"`: immediate background processing. `"scheduled"`: process at `scheduledAtUtc`. `"economy"`: batched, low-cost (coming soon).
scheduledAtUtc	string (ISO 8601) \| null	`null`	UTC time to process the message. Required when `responseMode` is `"scheduled"`. Must be in the future and no more than 7 days ahead. Ignored for other modes.
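The scheduling rules above (future time, at most 7 days ahead, UTC) are easy to enforce before the request leaves your system. The sketch below is one way to do it; `scheduled_options` is a hypothetical helper, and the server remains the authority on what it accepts.

```python
from datetime import datetime, timedelta, timezone

MAX_LEAD = timedelta(days=7)  # limit stated in the table above


def scheduled_options(run_at, now=None):
    """Return an options dict for scheduled mode, or raise ValueError."""
    now = now or datetime.now(timezone.utc)
    if run_at.tzinfo is None:
        raise ValueError("run_at must be timezone-aware")
    run_at = run_at.astimezone(timezone.utc)
    if run_at <= now:
        raise ValueError("scheduledAtUtc must be in the future")
    if run_at - now > MAX_LEAD:
        raise ValueError("scheduledAtUtc must be within 7 days")
    return {
        "responseMode": "scheduled",
        "scheduledAtUtc": run_at.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
```

The result can be placed directly under the `options` key of the request body.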

Web search

When webSearchEnabled is true, the web_search_preview built-in tool is attached to the request. The model may search the web and cite results inline in its response. Only effective when the node's provider supports built-in tools (currently OpenAIResponses).

Field	Type	Default	Description
webSearchEnabled	boolean	`false`	Enables the `web_search_preview` tool for this message. The model may search the web and include citations in its response.
searchContextSize	string \| null	`"medium"`	Controls how much web context the model retrieves per search. `"low"` is faster and cheaper; `"high"` pulls more results for comprehensive answers. Only applies when `webSearchEnabled` is `true`.
useGeolocation	boolean	`false`	When `true`, the `user_location` parameter is set on the web search tool, geo-biasing results toward the user's location. Pair with the `userCity`/`userRegion`/`userCountry` fields for accuracy.
userCity	string \| null	null	Approximate city for geo-biasing (e.g., `"New York"`). Typically derived from the user's browser timezone. Only used when `useGeolocation` is `true`.
userRegion	string \| null	null	Approximate region or state (e.g., `"New York"`, `"California"`). Only used when `useGeolocation` is `true`.
userCountry	string \| null	null	ISO 3166-1 alpha-2 country code (e.g., `"US"`, `"GB"`). Only used when `useGeolocation` is `true`.

Web search availability depends on the node's execution profile. Nodes using providers that don't support built-in tools (such as AzureOpenAI) will silently ignore webSearchEnabled.

// Minimal web search
{
  "content": "What is the current status of the SpaceX Starship program?",
  "options": {
    "webSearchEnabled": true
  }
}

// With geo-biasing (city/region/country from browser timezone)
{
  "content": "What are the best coffee shops near me?",
  "options": {
    "webSearchEnabled": true,
    "searchContextSize": "low",
    "useGeolocation": true,
    "userCity": "Seattle",
    "userRegion": "Washington",
    "userCountry": "US"
  }
}

Quoted Contexts

Quoted contexts let you reference specific content from earlier in the conversation (or from another source) to give the node precise context for the current turn. This is especially useful for "reply to this" or "explain this paragraph" interactions.

Field	Type	Description
sourceMessageId	string	The ID of the message being quoted.
sourceRole	string	Role of the quoted message: `user` or `assistant`.
quotedText	string	The highlighted or selected text being quoted.
sourceTimestamp	string (ISO 8601)	When the original message was created.

Quoted contexts only take effect when the node's prompt configuration includes quoted context support. If the node does not have this capability enabled, quoted contexts are still accepted but may not influence the response.

Attachments

Attachments let you send files (images, documents, etc.) alongside your message. Files are sent inline as base64-encoded data URLs.

Field	Type	Description
name	string	The file name (e.g., `report.pdf`).
contentType	string	MIME type (e.g., `image/png`, `application/pdf`).
dataUrl	string	Base64-encoded data URI: `data:{contentType};base64,...`
sizeBytes	integer	File size in bytes (before base64 encoding).

Attachments are transmitted inline. Keep file sizes reasonable — large files increase request latency and may hit request body limits. Supported file types depend on the underlying model's capabilities.
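To make the attachment shape concrete, here is a small sketch that turns raw bytes into an attachment object with the four fields described above. The helper name `make_attachment` is an assumption; `sizeBytes` is computed from the raw bytes, matching the "before base64 encoding" note.

```python
import base64


def make_attachment(name, content_type, data):
    """Build one entry for the attachments array from raw bytes."""
    encoded = base64.b64encode(data).decode("ascii")
    return {
        "name": name,
        "contentType": content_type,
        "dataUrl": f"data:{content_type};base64,{encoded}",
        "sizeBytes": len(data),  # size before base64 encoding
    }
```

Base64 inflates the payload by roughly a third, which is worth keeping in mind when estimating request body size.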

Response modes

The /chat endpoint supports four response modes. Sync and streaming are selected by the Accept header; async modes are selected via options.responseMode.

Buffered JSON (default)

The default mode. The server waits for the complete response, then returns a single JSON object. No special headers are required.

curl -X POST https://my-node.interlocute.ai/chat \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"content": "Hello!"}'

Streaming (SSE)

Set Accept: text/event-stream to receive tokens as they are generated. The response is a Server-Sent Events stream; its event sequence is described after the example request.

curl -N -X POST https://my-node.interlocute.ai/chat \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -H "Accept: text/event-stream" \
  -d '{"content": "Write a haiku about AI."}'

1. [META] event

Sent first. Contains a JSON payload with requestId, nodeId, threadId, inputMessageId, and outputMessageId. The content field is empty in this event.

2. data: token events

Each generated token is sent as a data: line. Concatenate all tokens, in order, to build the complete response.

3. [DONE] event

Sent last. Indicates the stream is complete. Close your connection after receiving this event.

On error: [ERROR] event

Sent if an error occurs during streaming. Contains a JSON payload with an error field. The stream ends after this event.

Example SSE stream

data: [META]{"requestId":"req_abc","nodeId":"nd_123","threadId":"thr_456","inputMessageId":"msg_in_1","outputMessageId":"msg_out_1","content":""}

data: Silicon
data:  dreams
data:  awake
data: ,
data:  thoughts
data:  bloom
data:  like
data:  spring

data: [DONE]
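The stream above can be consumed with a few lines of parsing. The following is a sketch rather than a complete SSE client: it assumes `[ERROR]` is framed like `[META]` (a marker followed by JSON on a `data:` line), and it does not handle tokens that themselves contain newlines, since this page does not say how those are encoded.

```python
import json


def parse_sse_lines(lines):
    """Return (meta, text, error) from an iterable of SSE lines."""
    meta, error, tokens = None, None, []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.startswith("data:"):
            continue  # blank separators, comments, other SSE fields
        value = line[len("data:"):]
        if value.startswith(" "):
            value = value[1:]  # SSE strips exactly one leading space
        if value.startswith("[META]"):
            meta = json.loads(value[len("[META]"):])
        elif value.startswith("[ERROR]"):
            error = json.loads(value[len("[ERROR]"):])  # assumed framing
            break
        elif value == "[DONE]":
            break
        else:
            tokens.append(value)
    return meta, "".join(tokens), error
```

Only one space after `data:` is removed, so a line such as `data:  dreams` yields the token ` dreams` with its leading space intact, which is what keeps words separated when tokens are concatenated.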

Async modes (Async, Scheduled, Economy)

All non-sync modes are deferred in protocol terms: the server validates the request, resolves (or creates) the thread, then returns 202 Accepted immediately with an invocationId you can poll for status and output. What differs is when and how processing happens.

Mode	When it runs	Use case	Status
Async	Immediately (background worker)	Avoid timeouts, fire-and-forget, batch	Available
Scheduled	At `scheduledAtUtc`	"Send this at 9 AM", delayed campaigns	Available
Economy	Batched (up to 24 h)	Cost-optimized bulk processing (~50% savings)	Coming soon

Async — immediate background processing

curl -X POST https://my-node.interlocute.ai/chat \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "content": "Analyze this quarter's revenue trends in detail.",
    "options": { "responseMode": "async" }
  }'

Scheduled — process at a future time

curl -X POST https://my-node.interlocute.ai/chat \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "content": "Good morning! Here is your daily briefing.",
    "options": {
      "responseMode": "scheduled",
      "scheduledAtUtc": "2025-07-16T09:00:00Z"
    }
  }'

202 response (all async modes)

{
  "disposition": "deferred",
  "invocationId": "01JX4K7M2N...",
  "nodeId": "nd_abc123",
  "threadId": "thr_xyz789",
  "status": "queued",
  "responseMode": "scheduled",
  "scheduledAtUtc": "2025-07-16T09:00:00Z",
  "pollUrl": "nodes/nd_abc123/chit/01JX4K7M2N..."
}

The responseMode and scheduledAtUtc fields are echoed in the 202 response. For Async mode, scheduledAtUtc is omitted.

Polling for results

Use the pollUrl from the 202 response to check status. The polling endpoint is GET /chit/{invocationId} — part of the node's discovery surface — so no separate polling URL scheme is needed.

# JSON (default) — full invocation receipt
curl https://my-node.interlocute.ai/chit/01JX4K7M2N... \
  -H "Authorization: Bearer $API_KEY"

# Plain text — single human-readable sentence
curl "https://my-node.interlocute.ai/chit/01JX4K7M2N...?format=plain" \
  -H "Authorization: Bearer $API_KEY"

# Markdown — compact status table
curl "https://my-node.interlocute.ai/chit/01JX4K7M2N...?format=markdown" \
  -H "Authorization: Bearer $API_KEY"

The JSON response is an invocation receipt with status (queued, running, completed, failed), timing fields, and an outputs[] array containing message IDs and thread references once processing completes.

// Completed receipt (abbreviated)
{
  "invocationId": "01JX4K7M2N...",
  "status": "completed",
  "threadId": "thr_xyz789",
  "startedAtUtc": "2025-07-15T10:00:01Z",
  "completedAtUtc": "2025-07-15T10:00:08Z",
  "durationMs": 7200,
  "outputs": [
    { "kind": "userMessage", "outputId": "msg_in_001", "threadId": "thr_xyz789" },
    { "kind": "assistantMessage", "outputId": "msg_out_001", "threadId": "thr_xyz789" }
  ]
}

Poll with a reasonable interval — we recommend starting at 1 second, backing off to 3–5 seconds. Most single-turn requests complete in under 15 seconds. For Scheduled mode, don't start polling until after scheduledAtUtc.
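One way to implement the recommended polling cadence is sketched below. The `fetch` callable is a placeholder for whatever performs the GET against the poll URL and returns the parsed JSON receipt; the backoff factor and overall timeout are illustrative assumptions, not documented values.

```python
import time


def poll_receipt(fetch, initial_delay=1.0, max_delay=5.0, factor=1.5,
                 timeout=120.0, sleep=time.sleep):
    """Call fetch() until the receipt status is terminal.

    Starts at initial_delay seconds between polls and backs off
    towards max_delay. Raises TimeoutError if the next wait would
    exceed the overall timeout.
    """
    delay, waited = initial_delay, 0.0
    while True:
        receipt = fetch()
        if receipt.get("status") in ("completed", "failed"):
            return receipt
        if waited + delay > timeout:
            raise TimeoutError("invocation did not finish in time")
        sleep(delay)
        waited += delay
        delay = min(delay * factor, max_delay)
```

Injecting `sleep` keeps the function easy to test; in production code the default `time.sleep` is used.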

When to use async modes

Non-streaming JSON callers behind proxies → Async

Buffered JSON holds the connection open until the full response is ready. Behind Azure Front Door (240 s timeout) or typical HTTP clients (30–60 s), complex prompts risk a gateway timeout. Async mode returns in <1 second.

Webhook and serverless integrations → Async

Zapier, n8n, and cloud functions often enforce short timeouts. Submit work via Async, then poll for the result in a later step.

Delayed messages and campaigns → Scheduled

"Send this daily briefing at 9 AM", "Process this report Monday morning". Set scheduledAtUtc and the message is held until that time. Maximum 7 days ahead.

Batch and fire-and-forget workflows → Async

Submit multiple requests, collect invocation IDs, and poll for results later — ideal for background processing pipelines.

Cost-optimized bulk work → Economy (coming soon)

Routes to the OpenAI Batch API at ~50% cost reduction. Higher latency (up to 24 h), significantly lower cost. Ideal for offline analysis, bulk summarization, non-urgent work.

Timeout guidance: If you are calling the buffered JSON mode (no streaming) from a server-side integration, consider using Async mode by default. The Interlocute Runtime API sits behind Azure Front Door, which enforces a 240-second idle timeout. Complex prompts with web search, large context windows, or reasoning models can approach this limit. Async mode eliminates the risk entirely.

Streaming (SSE) mode is not affected — it sends tokens continuously, keeping the connection alive.

Response schema

The buffered JSON response (and the [META] event in streaming mode) follows this structure:

Field	Type	Description
requestId	string	Server-generated correlation ID for tracing and support.
nodeId	string	The node that processed this request.
threadId	string	The thread ID for this conversation. Save this to continue the thread in subsequent requests.
inputMessageId	string	The ID assigned to your user message.
outputMessageId	string	The ID assigned to the assistant's reply.
content	string	The assistant's full response text. Empty in streaming `[META]` events.
usage	object \| null	Token usage breakdown (when available). Contains `inputTokens` and `outputTokens`.

Example response (buffered)

{
  "requestId": "req_a1b2c3d4",
  "nodeId": "nd_abc123",
  "threadId": "thr_xyz789",
  "inputMessageId": "msg_in_001",
  "outputMessageId": "msg_out_001",
  "content": "Hello! I'm your support assistant. I can help you with order lookups, account questions, and troubleshooting.",
  "usage": {
    "inputTokens": 12,
    "outputTokens": 28
  }
}

Thread lifecycle

Threads are the unit of conversation state. Understanding how they work helps you build multi-turn integrations.

New thread

Omit threadId (or pass null). A new thread is created automatically. The response includes the new threadId — save it to continue the conversation.

Continue a thread

Pass an existing threadId. The node resumes the conversation with full history context. The thread must belong to the same tenant and node.

Validation

If the provided threadId doesn't exist, or belongs to a different tenant or node, the request fails with 404 Not Found.

Store the threadId from the first response and pass it in all subsequent messages. This is the standard pattern for multi-turn conversations.
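The store-and-reuse pattern can be wrapped in a small class. In this sketch, `send` is a hypothetical transport callable that takes the request body as a dict and returns the parsed buffered JSON response; the class only manages `threadId`.

```python
class Conversation:
    """Tracks threadId across turns so callers only pass message text."""

    def __init__(self, send):
        self._send = send  # hypothetical transport: dict -> dict
        self.thread_id = None

    def say(self, content):
        body = {"content": content}
        if self.thread_id is not None:
            body["threadId"] = self.thread_id  # continue existing thread
        response = self._send(body)
        self.thread_id = response["threadId"]  # save for the next turn
        return response["content"]
```

The first call omits `threadId`, so a new thread is created; every later call reuses the ID returned by the server.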

Error responses

Errors are returned in the RFC 7807 Problem Details format:

{
  "type": "https://tools.ietf.org/html/rfc9110#section-15.5.1",
  "title": "Bad Request",
  "detail": "Message is required.",
  "status": 400
}
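A client can turn a Problem Details body like the one above into a readable message for logs. This is a minimal sketch with a hypothetical helper name; it uses `.get` so that a missing member does not raise.

```python
import json


def describe_problem(body_text):
    """Summarize a Problem Details JSON body as 'status title: detail'."""
    problem = json.loads(body_text)
    return f'{problem.get("status")} {problem.get("title")}: {problem.get("detail")}'
```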

Status	Meaning	Common causes
400	Bad Request	Missing `content`/`message`, empty body, or invalid JSON.
401	Unauthorized	Missing or invalid API key / JWT token (and node does not allow anonymous access).
403	Forbidden	Node-scoped API key used against a different node than the one it was issued for.
404	Not Found	Node not found, chat not enabled on this node, thread not found, or thread tenant/node mismatch.
429	Too Many Requests	Prepaid credit balance exhausted. Top up your account to resume usage.
500	Internal Server Error	Provider error or unexpected runtime failure. Include the `requestId` in support requests.

Retry on 5xx errors with exponential backoff. Do not retry 4xx errors without fixing the request. During streaming, errors are delivered as [ERROR] SSE events instead of HTTP status codes (since headers have already been sent).
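The retry guidance can be sketched as follows. `do_request` is a placeholder callable that performs one buffered (non-streaming) call and returns `(status, body)`; the attempt count and base delay are illustrative assumptions. Errors delivered mid-stream as `[ERROR]` events are outside the scope of this helper.

```python
import time


def call_with_retry(do_request, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Retry only 5xx responses, doubling the wait after each failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = do_request()
        is_last = attempt == max_attempts - 1
        if status < 500 or is_last:
            return status, body  # success, 4xx, or retries exhausted
        sleep(base_delay * 2 ** attempt)
```

A 4xx response is returned immediately so the caller can fix the request (or top up credits, in the case of 429) instead of resending it unchanged.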

Credit enforcement

Interlocute uses a prepaid credit model. Each chat request passes through three phases:

Reserve — credits are estimated and reserved based on message length and expected output.
Execute — the node processes the request.
Finalize — actual token usage is reconciled. Overestimates are refunded; underestimates are adjusted.

If your credit balance is insufficient, the request is rejected with 429 Too Many Requests before any processing occurs. Credits are automatically refunded if the request fails or the client disconnects mid-stream.

Examples

Minimal: new thread

curl -X POST https://my-node.interlocute.ai/chat \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "content": "Hello! What can you do?"
  }'

Continue a thread with streaming

curl -N -X POST https://my-node.interlocute.ai/chat \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -H "Accept: text/event-stream" \
  -d '{
    "content": "Tell me more about that.",
    "threadId": "thr_abc123"
  }'

Full request: attachments + quoted contexts

curl -X POST https://my-node.interlocute.ai/chat \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "content": "What does the highlighted section of this document mean?",
    "threadId": "thr_abc123",
    "clientMessageId": "my-client-id-001",
    "quotedContexts": [
      {
        "sourceMessageId": "msg_prev_reply",
        "sourceRole": "assistant",
        "quotedText": "Revenue grew 15% quarter-over-quarter.",
        "sourceTimestamp": "2025-01-10T09:00:00Z"
      }
    ],
    "attachments": [
      {
        "name": "q4-report.pdf",
        "contentType": "application/pdf",
        "dataUrl": "data:application/pdf;base64,JVBERi0xLjQK...",
        "sizeBytes": 102400
      }
    ]
  }'

Path-based routing (explicit node ID)

curl -X POST https://api.interlocute.ai/nodes/nd_abc123/chat \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "content": "Hello from path-based routing!"
  }'

Async: submit and poll

# 1. Submit (returns 202 immediately)
RESPONSE=$(curl -s -X POST https://my-node.interlocute.ai/chat \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "content": "Generate a detailed market analysis for Q3.",
    "externalCorrelationId": "batch-job-42",
    "options": { "responseMode": "async" }
  }')

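# 2. Extract invocation ID (quote the variable so the JSON is passed intact)
INVOCATION_ID=$(echo "$RESPONSE" | jq -r '.invocationId')

# 3. Poll via /chit until the invocation completes or fails (up to 60 tries)
for _ in $(seq 1 60); do
  STATUS=$(curl -s "https://my-node.interlocute.ai/chit/$INVOCATION_ID" \
    -H "Authorization: Bearer $API_KEY" | jq -r '.status')
  case "$STATUS" in completed|failed) break ;; esac
  sleep 2
done
echo "Final status: $STATUS"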

Scheduled: send a message at a future time

curl -X POST https://my-node.interlocute.ai/chat \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "content": "Good morning! Here is your daily briefing.",
    "threadId": "thr_abc123",
    "options": {
      "responseMode": "scheduled",
      "scheduledAtUtc": "2025-07-16T09:00:00Z"
    }
  }'

Triggers & automated invocations

The same chat processing pipeline powers scheduled and event-driven triggers. When a trigger fires, it invokes the node's chat capability with a system-generated message. Triggers support several thread modes that control how conversations are organized:

New thread per run — each trigger execution creates a fresh thread
Singleton per trigger — all executions of a trigger share a single, long-lived thread
Fixed thread ID — the trigger always targets a specific, pre-existing thread

Deferred chat, triggers, and synchronous API calls all share the same invocation receipt infrastructure. Every execution — regardless of mode — appears in the invocations log with a source label (api-chat, schedule, event, etc.) and full attribution (API key name, IP, user agent). Use the invocationId or requestId to correlate any execution back to its source.

Which response mode should I use?

Scenario	Recommended mode	Why
Interactive chat UI (browser)	Streaming (SSE)	Users see tokens appear in real-time. Connection stays alive via continuous data flow.
Simple server-to-server, fast prompts	Buffered JSON	Single JSON response, easy to parse. Fine when prompts complete in <30 s.
Server-to-server, complex/long prompts	Async	Avoids proxy timeouts (Azure Front Door: 240 s). Returns in <1 s, poll for result.
Webhook / Zapier / n8n	Async	Short platform timeouts. Submit, get ID, process result asynchronously.
Batch processing	Async	Submit N requests, collect IDs, poll all. No connections held open.
Delayed messages ("send at 9 AM")	Scheduled	Message is held until `scheduledAtUtc`. Max 7 days.
Mobile / unreliable network	Async	Resilient to network drops. Poll when reconnected.
Non-urgent bulk summarization	Economy (soon)	~50% cost reduction via OpenAI Batch API. Up to 24 h latency.

Next steps

Chit API Reference — deterministic node information sheets
API Examples — copy-paste starters in cURL, C#, and JavaScript
Auth & Keys — credential setup and key scoping
Triggers — scheduled and event-driven execution