How to Use the Video Summarization API
Send a video URL or file to our REST endpoint and receive a JSON response with transcription, summary, and timestamps.
curl -X POST https://api.screenapp.io/v1/summarize \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"video_url": "https://youtube.com/watch?v=..."}'
The API returns structured JSON with speaker labels, timestamped highlights, and a concise summary in under 2 seconds per minute of video. You can also batch process up to 100 videos in a single request.
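For teams not calling the API from a shell, the same request can be sketched in Python using only the standard library. The endpoint, header, and `video_url` field come from the curl example above; the `Content-Type` header and the helper names `build_summarize_request` and `summarize` are assumptions made for illustration, and the response is returned as plain parsed JSON because its exact schema is not shown here.

```python
import json
import urllib.request

API_URL = "https://api.screenapp.io/v1/summarize"  # endpoint from the curl example


def build_summarize_request(video_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request from the curl example."""
    body = json.dumps({"video_url": video_url}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            # Assumption: the endpoint expects a JSON content type.
            "Content-Type": "application/json",
        },
    )


def summarize(video_url: str, api_key: str, timeout: float = 30.0) -> dict:
    """Send the request and return the parsed JSON body as a dict."""
    request = build_summarize_request(video_url, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `summarize("https://example.com/video.mp4", "YOUR_API_KEY")` performs the network call; building the request separately makes it easy to inspect headers and body in tests without a real key.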
YouTube Summarizer API Integration
Process YouTube videos without downloading them. The API accepts YouTube URLs and extracts transcripts with timestamps automatically.
{
  "video_url": "https://youtube.com/watch?v=dQw4w9WgXcQ",
  "include_timestamps": true,
  "summary_length": "medium"
}
The response includes speaker-labeled segments, key moments, and a structured summary ready for display in your app. The free tier supports videos up to 60 minutes long.
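As an illustration of what "ready for display" might look like, the sketch below renders speaker-labeled segments as timestamped lines. The segment keys `start` (seconds), `speaker`, and `text` are hypothetical, since the response schema is not spelled out on this page; adjust them to the fields the API actually returns.

```python
from typing import Dict, List


def format_timestamp(seconds: float) -> str:
    """Convert a number of seconds to HH:MM:SS, dropping fractions."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def render_segments(segments: List[Dict]) -> List[str]:
    """Turn segments into display lines like '[00:01:05] Speaker 2: ...'."""
    # Assumed keys: "start", "speaker", "text" (hypothetical schema).
    return [
        f"[{format_timestamp(seg['start'])}] {seg['speaker']}: {seg['text']}"
        for seg in segments
    ]


# Made-up sample data, only to show the shape of the output.
sample = [
    {"start": 0.0, "speaker": "Speaker 1", "text": "Welcome to the webinar."},
    {"start": 65.4, "speaker": "Speaker 2", "text": "Thanks for having me."},
]
for line in render_segments(sample):
    print(line)
```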
Who This Video Summary API Is For
SaaS developers building meeting intelligence, podcast platforms, or learning management systems that need automatic transcription and summarization.
Media monitoring teams processing hundreds of webinars, news clips, or social videos daily who need scalable batch processing.
Content operations managers creating searchable video archives with metadata, timestamps, and summaries for internal knowledge bases.
Customer support leads analyzing support call recordings to identify common pain points and training opportunities without manual review.
Benefits of the Video Summary API
Reduce video processing time by 95%. A 30-minute webinar is summarized in about 60 seconds, with no manual watching or note-taking.
Get structured output ready for your database. JSON responses include confidence scores, speaker IDs, timestamps, and segment-level summaries that map directly to your data models.
Scale to thousands of videos without infrastructure changes. Batch processing handles 100 videos per request with automatic retries and webhook notifications when complete.
Save on LLM costs. Pre-processed transcripts with speaker diarization reduce token usage by 40% compared to sending raw transcripts to ChatGPT or Claude.
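Because batch requests are capped at 100 videos, a larger backlog has to be split on the client side before submission. The sketch below shows one way to do that chunking. The `video_urls` payload key is hypothetical, since the batch request format is not shown on this page.

```python
from typing import Dict, Iterator, List

MAX_VIDEOS_PER_REQUEST = 100  # batch limit stated on this page


def chunk_video_urls(
    urls: List[str], size: int = MAX_VIDEOS_PER_REQUEST
) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` URLs."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def build_batch_payloads(urls: List[str]) -> List[Dict[str, List[str]]]:
    """Build one request body per batch."""
    # Hypothetical payload shape: {"video_urls": [...]}.
    return [{"video_urls": batch} for batch in chunk_video_urls(urls)]
```

With 250 URLs this produces three payloads of 100, 100, and 50 videos, each of which can be posted as its own request.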
Video Summarization API vs ChatGPT Integration
| Feature | ScreenApp API | Raw Transcript to ChatGPT |
|---|---|---|
| Speaker diarization | Automatic with labels | Manual preprocessing required |
| Timestamp accuracy | Frame-level precision | Approximate or missing |
| Batch processing | 100 videos per request | One at a time |
| Cost per 30-min video | $0.60 (transcription + summary) | $2.40 (raw transcript tokens) |
| Processing time | 60 seconds | 3-5 minutes |
| Output format | Structured JSON with metadata | Plain text requiring parsing |
| Video frame analysis | Included (OCR, slide detection) | Not available |
| API integration | Single endpoint | Multiple services to orchestrate |
ChatGPT and Claude work well for short, clean transcripts. For production video processing with speaker labels, timestamps, and cost efficiency, a dedicated API saves 60% on token costs and eliminates chunking complexity.
API Pricing Comparison
| Provider | Price per Minute | Free Tier | Batch Processing | Speaker Diarization | Timestamp Precision |
|---|---|---|---|---|---|
| ScreenApp | $0.020 | 60 min/month | ✓ 100 videos/request | Included | Frame-level |
| Twelve Labs | $0.033 | 10 min trial | ✗ | Included | Segment-level |
| AssemblyAI | $0.025 | None | ✗ | +$0.005/min extra | Segment-level |
| Deepgram | $0.022 | 45 min trial | ✗ | +$0.004/min extra | Word-level |
| YouTLDR | $4/month flat | None | ✗ | Not available | Not available |
| Google Video Intelligence | $0.030 | $300 credit | Via Cloud Tasks | Separate service | Shot-level |
| AWS Transcribe + Bedrock | $0.024 | 60 min/month | Via Lambda | Included | Word-level |
ScreenApp includes speaker diarization, timestamped highlights, and batch processing in the base price. Other providers charge extra for these features or require combining multiple services.
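To compare like for like, the per-minute prices in the table can be multiplied out for a given video length with the diarization add-on included. The sketch below does this for the providers that list both figures. YouTLDR (flat monthly rate) and Google Video Intelligence (diarization is a separate service with no price listed here) are left out, and free-tier allowances are ignored.

```python
from decimal import Decimal

# (base price per minute, diarization add-on per minute), copied from the table.
PRICING = {
    "ScreenApp": (Decimal("0.020"), Decimal("0")),
    "Twelve Labs": (Decimal("0.033"), Decimal("0")),
    "AssemblyAI": (Decimal("0.025"), Decimal("0.005")),
    "Deepgram": (Decimal("0.022"), Decimal("0.004")),
    "AWS Transcribe + Bedrock": (Decimal("0.024"), Decimal("0")),
}


def cost_with_diarization(provider: str, minutes: int) -> Decimal:
    """Return the cost in dollars for `minutes` of video, diarization included."""
    base, addon = PRICING[provider]
    return (base + addon) * minutes


for name in PRICING:
    print(f"{name}: ${cost_with_diarization(name, 30):.2f}")
```

For a 30-minute video this gives $0.60 for ScreenApp, matching the figure in the ChatGPT comparison table, against $0.90 for AssemblyAI and $0.78 for Deepgram once their add-ons are counted.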
FAQ
What video formats does the API accept?
MP4, MOV, AVI, WMV, WEBM, and direct YouTube/Vimeo URLs. Files up to 2GB are processed in the free tier, 10GB in the Pro tier.
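A client can check these limits before uploading to avoid a rejected request. The following is a minimal sketch; the function name `check_upload` is illustrative, and it assumes "GB" means decimal gigabytes (10^9 bytes), which this page does not specify.

```python
from pathlib import PurePath
from typing import List

SUPPORTED_EXTENSIONS = {".mp4", ".mov", ".avi", ".wmv", ".webm"}
GB = 10**9  # assumption: decimal gigabytes; the page does not say GB vs GiB
TIER_LIMITS = {"free": 2 * GB, "pro": 10 * GB}


def check_upload(filename: str, size_bytes: int, tier: str = "free") -> List[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    if tier not in TIER_LIMITS:
        raise ValueError(f"unknown tier: {tier!r}")
    problems = []
    extension = PurePath(filename).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {extension or 'none'}")
    if size_bytes > TIER_LIMITS[tier]:
        limit_gb = TIER_LIMITS[tier] // GB
        problems.append(f"file exceeds the {limit_gb} GB {tier} limit")
    return problems
```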
How accurate is the speaker diarization?
90-95% accuracy for videos with clear audio and 2-4 speakers. Accuracy decreases with background noise or more than 6 speakers.
Can I customize the summary length and format?
Yes. Set summary_length to "short" (2-3 sentences), "medium" (1 paragraph), or "detailed" (bullet points with timestamps). You can also provide custom prompt instructions.
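One way to guard against typos in these options is to validate them before sending the request. In the sketch below, `video_url`, `include_timestamps`, and `summary_length` are the fields shown in the request example on this page; the `custom_prompt` key is a hypothetical name for the custom prompt instructions, whose actual field is not documented here.

```python
from typing import Any, Dict, Optional

ALLOWED_SUMMARY_LENGTHS = ("short", "medium", "detailed")


def build_payload(
    video_url: str,
    summary_length: str = "medium",
    include_timestamps: bool = True,
    custom_prompt: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a request body, rejecting unknown summary_length values early."""
    if summary_length not in ALLOWED_SUMMARY_LENGTHS:
        raise ValueError(
            f"summary_length must be one of {ALLOWED_SUMMARY_LENGTHS}, "
            f"got {summary_length!r}"
        )
    payload: Dict[str, Any] = {
        "video_url": video_url,
        "include_timestamps": include_timestamps,
        "summary_length": summary_length,
    }
    if custom_prompt is not None:
        # Hypothetical field name for custom prompt instructions.
        payload["custom_prompt"] = custom_prompt
    return payload
```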
Is the API safe for confidential video content?
All videos are processed with end-to-end encryption. Enterprise plans include on-premise Docker deployment and VPC-private endpoints. Videos are deleted from our servers within 24 hours unless you enable archive mode.
What happens if transcription quality is poor?
The API returns confidence scores per segment. Segments below 70% confidence are flagged. You can enable “manual review mode” which holds low-confidence summaries for human verification before returning results.
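On the client side, the same 70% threshold can be applied to route weak segments to a reviewer. This sketch assumes each segment carries a `confidence` value between 0 and 1; that field name and scale are hypothetical, so adapt them to the actual response.

```python
from typing import Dict, List, Tuple

CONFIDENCE_THRESHOLD = 0.70  # flagging threshold stated in the answer above


def split_by_confidence(
    segments: List[Dict], threshold: float = CONFIDENCE_THRESHOLD
) -> Tuple[List[Dict], List[Dict]]:
    """Return (accepted, flagged); segments below the threshold are flagged."""
    accepted = [seg for seg in segments if seg["confidence"] >= threshold]
    flagged = [seg for seg in segments if seg["confidence"] < threshold]
    return accepted, flagged


# Made-up sample data with the assumed "confidence" field.
sample_segments = [
    {"text": "Clear opening remarks.", "confidence": 0.93},
    {"text": "Crosstalk near the microphone.", "confidence": 0.55},
    {"text": "Exactly at the threshold.", "confidence": 0.70},
]
accepted, flagged = split_by_confidence(sample_segments)
print(f"{len(accepted)} accepted, {len(flagged)} flagged for review")
```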
How fast is the processing time?
Real-time processing for videos under 10 minutes. Longer videos process at approximately 2 seconds per minute of video, so a 30-minute video takes about 60 seconds. Batch requests run in parallel across multiple workers.
Does the API work with live streams?
Yes. Enable streaming mode to receive partial summaries every 5 minutes as the video plays. Useful for webinar monitoring and live event coverage.
Can I integrate this with ChatGPT or Claude?
Yes. The API returns structured summaries that fit within LLM context windows. You can send the summary to ChatGPT/Claude for follow-up questions while avoiding the token cost of raw transcripts.
What languages are supported?
40+ languages with automatic detection. English, Spanish, French, German, Portuguese, Italian, Japanese, Korean, Chinese, and Russian have the highest transcription accuracy.
Where can I find API documentation and SDKs?
Visit screenapp.io/developers for REST API docs, Python and Node.js SDKs, code examples, and an interactive API playground.