Video Intelligence
Submit a video and receive a structured, searchable index. Three composable profiles — Speech, Visual, and Insights — let you pay only for the signals you need. Combine them freely for full-spectrum analysis.
Speech profile — what was said
The Speech profile extracts a time-aligned transcript with speaker diarization, confidence scores, and automatic language detection. Export captions in VTT, SRT, TTML, or plain text. Each segment includes a speaker ID, so your node knows who said what and when.
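As a rough illustration of caption export, the sketch below renders diarized segments as WebVTT and uses the format's `<v>` voice span to carry each speaker ID. The `Segment` shape and its field names are assumptions made for this example, not the documented response schema.

```python
import html
from dataclasses import dataclass


@dataclass
class Segment:
    """One diarized transcript segment (hypothetical shape, not the documented schema)."""
    start: float   # seconds from the start of the video
    end: float     # seconds from the start of the video
    speaker: str   # speaker ID from diarization
    text: str


def vtt_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def to_webvtt(segments: list[Segment]) -> str:
    """Render segments as WebVTT cues, labeling each cue with a <v> voice span."""
    lines = ["WEBVTT", ""]
    for seg in segments:
        lines.append(f"{vtt_timestamp(seg.start)} --> {vtt_timestamp(seg.end)}")
        # Escape &, <, > so transcript text cannot be parsed as cue markup.
        lines.append(f"<v {seg.speaker}>{html.escape(seg.text, quote=False)}")
        lines.append("")  # a blank line ends each cue
    return "\n".join(lines)
```

SRT differs mainly in numbering its cues and using a comma as the millisecond separator, so the same segment list could drive both exporters.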
Visual profile — what was seen
The Visual profile detects shot and scene boundaries, extracts keyframes, reads on-screen text via OCR, and identifies labels and objects in video frames. The result is a structured timeline your node can query, cite, and reason over.
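A minimal sketch of querying a visual timeline, assuming shots arrive as non-overlapping [start, end) spans in seconds and OCR results as timestamped text lines. The `Shot`, `OcrLine`, and `VisualTimeline` names are hypothetical; the point is that a sorted timeline answers "what was on screen at time t" with a binary search.

```python
import bisect
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Shot:
    """A detected shot (hypothetical shape): [start, end) in seconds plus a keyframe time."""
    start: float
    end: float
    keyframe: float


@dataclass(frozen=True)
class OcrLine:
    """On-screen text observed at a timestamp (hypothetical shape)."""
    time: float
    text: str


class VisualTimeline:
    def __init__(self, shots: list[Shot], ocr: list[OcrLine]):
        # Sort once so lookups can binary-search on shot start times.
        self.shots = sorted(shots, key=lambda s: s.start)
        self._starts = [s.start for s in self.shots]
        self.ocr = sorted(ocr, key=lambda o: o.time)

    def shot_at(self, t: float) -> Optional[Shot]:
        """Return the shot containing time t, or None if t falls outside every shot."""
        i = bisect.bisect_right(self._starts, t) - 1
        if i >= 0 and t < self.shots[i].end:
            return self.shots[i]
        return None

    def text_in_shot(self, shot: Shot) -> list[str]:
        """All OCR text observed within a shot's time span."""
        return [o.text for o in self.ocr if shot.start <= o.time < shot.end]
```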
Insights profile — what it means
The Insights profile extracts named entities, topics, keywords, and people. It adds time-bounded sentiment and emotion segments, flags audio events (applause, silence, music), and runs content-safety analysis — giving your node a semantic understanding of the entire video.
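Because sentiment, emotion, and audio events arrive as time-bounded segments, a node can slice them by time window or total them per label. The sketch below assumes a hypothetical `TimedSegment` shape with half-open [start, end) spans in seconds; it is not the platform's actual schema.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class TimedSegment:
    """A time-bounded insight (hypothetical shape); spans are [start, end) in seconds."""
    start: float
    end: float
    kind: str   # e.g. "sentiment", "emotion", "audio_event"
    label: str  # e.g. "positive", "joy", "applause"


def overlapping(segments: Iterable[TimedSegment], start: float, end: float) -> list[TimedSegment]:
    """Segments that overlap the half-open window [start, end)."""
    return [s for s in segments if s.start < end and s.end > start]


def seconds_per_label(segments: Iterable[TimedSegment], kind: str) -> dict[str, float]:
    """Total duration covered by each label of one kind, e.g. seconds of negative sentiment."""
    totals: dict[str, float] = {}
    for s in segments:
        if s.kind == kind:
            totals[s.label] = totals.get(s.label, 0.0) + (s.end - s.start)
    return totals
```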
Composable profiles
Profiles combine freely. Request Speech + Insights for a meeting transcription with sentiment. Request Visual + Insights for a marketing video analysis. Or combine all three for full-spectrum indexing. The platform merges signal sets automatically — you never pay for duplicate work.
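One way to picture "merging signal sets" is as a set union over per-profile signals. The mapping below is hypothetical: the signal names paraphrase the profile descriptions, and the assumption that Insights also depends on the transcript is made purely to show an overlap being counted once.

```python
from enum import Enum
from typing import Iterable


class Profile(Enum):
    SPEECH = "speech"
    VISUAL = "visual"
    INSIGHTS = "insights"


# Hypothetical per-profile signal sets, paraphrased from the profile descriptions.
# ASSUMPTION: Insights reuses the transcript, so Speech + Insights overlap on it.
SIGNALS: dict[Profile, frozenset[str]] = {
    Profile.SPEECH: frozenset({"transcript", "diarization", "language_detection"}),
    Profile.VISUAL: frozenset({"shots", "keyframes", "ocr", "labels", "objects"}),
    Profile.INSIGHTS: frozenset({
        "transcript", "entities", "topics", "keywords", "people",
        "sentiment", "emotion", "audio_events", "content_safety",
    }),
}


def plan(profiles: Iterable[Profile]) -> frozenset[str]:
    """Union of the requested profiles' signals; a shared signal appears only once."""
    merged: set[str] = set()
    for profile in profiles:
        merged |= SIGNALS[profile]
    return frozenset(merged)


def duplicate_savings(profiles: Iterable[Profile]) -> int:
    """Signals that would be computed twice if each profile ran independently."""
    unique = set(profiles)  # requesting the same profile twice adds nothing
    naive = sum(len(SIGNALS[p]) for p in unique)
    return naive - len(plan(unique))
```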
AI-generated summaries
Request a text summary with configurable length (Short, Medium, Long), style (Neutral, Casual, Formal), and custom instructions. When a vision-capable model is connected, summaries can also draw on keyframe images. Summarization is fully asynchronous: submit the request, poll for completion, then retrieve the result.
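The submit, poll, retrieve flow can be sketched as a polling helper with exponential backoff. `SummaryRequest`, its payload field names, and its defaults are illustrative assumptions; only the length and style values come from the options listed above. Wire `check` to whatever status call your client exposes, returning `None` while the job is still running.

```python
import time
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Length(Enum):
    SHORT = "Short"
    MEDIUM = "Medium"
    LONG = "Long"


class Style(Enum):
    NEUTRAL = "Neutral"
    CASUAL = "Casual"
    FORMAL = "Formal"


@dataclass
class SummaryRequest:
    """Summary options; defaults and payload keys are illustrative, not the documented API."""
    video_id: str
    length: Length = Length.MEDIUM
    style: Style = Style.NEUTRAL
    instructions: Optional[str] = None  # free-form custom instructions
    include_keyframes: bool = False     # only meaningful with a vision-capable model

    def to_payload(self) -> dict:
        # Hypothetical JSON field names.
        payload = {"videoId": self.video_id, "length": self.length.value,
                   "style": self.style.value, "includeKeyframes": self.include_keyframes}
        if self.instructions:
            payload["instructions"] = self.instructions
        return payload


def wait_for(check: Callable[[], Optional[str]], timeout: float = 300.0,
             initial_delay: float = 1.0, max_delay: float = 30.0,
             sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll `check` until it returns a result, doubling the delay between polls.

    `check` should return None while the job is still running.
    Raises TimeoutError if the next wait would pass the deadline.
    """
    deadline = time.monotonic() + timeout
    delay = initial_delay
    while True:
        result = check()
        if result is not None:
            return result
        if time.monotonic() + delay > deadline:
            raise TimeoutError("summary not ready before timeout")
        sleep(delay)
        delay = min(delay * 2, max_delay)
```

Injecting `sleep` keeps the helper testable without real delays, and capping the wait at `max_delay` keeps long jobs from being polled too rarely while still backing off from short, repeated checks.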
Streaming & playback
Indexed videos produce embeddable player URLs, streaming endpoints, and thumbnail base URLs. Serve preview players to end users or pull keyframe thumbnails for UI cards, all from a single API call.
Frequently Asked Questions
What video formats are supported?
What are the three indexing profiles?
How does profile combination work?
Can I get sentiment or emotion analysis?
How do AI-generated video summaries work?
Is video processing metered?
Can I re-index a video with a different profile?
Related Features
Document Processing
Upload PDFs, forms, and scanned documents. Three composable profiles extract text, structural layout, and form fields — combine them to pay only for what you need.
Image Intelligence
Upload an image and get layered AI analysis: a structural fingerprint with instant local metrics, semantic understanding from a multimodal LLM, and full forensic verification with manipulation detection.
RAG (Knowledge Retrieval)
Give your AI nodes access to your own documents and data. Interlocute handles the vector search, chunking, and context injection automatically.
Ready to build with Video Intelligence?
Deploy your node in seconds and start using Video Intelligence today.