Three-phase AI extraction API for VC pitch decks with Claude Sonnet 4.5
Two Bear Capital API is a multi-service backend with async, job-based PDF processing via ARQ workers. Built with FastAPI, PostgreSQL, and Redis, and designed for deployment on Railway.
The V3 API implements a three-phase workflow: upload the PDF (synchronous), convert it to images (asynchronous), and extract data (asynchronous). Splitting the pipeline this way enables granular failure tracking, retries of only the failed phase, and reuse of converted images across multiple extractions of the same PDF.
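The per-phase retry idea can be sketched with a small job model. The phase names and status values below are illustrative assumptions, not the API's actual schema:

```python
from dataclasses import dataclass, field
from enum import Enum

class PhaseStatus(str, Enum):
    PENDING = "pending"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"

# Hypothetical phase names mirroring upload → convert → extract
PHASES = ("upload", "convert", "extract")

@dataclass
class PdfJob:
    """Tracks each phase independently so only failed phases are re-run."""
    phases: dict = field(
        default_factory=lambda: {p: PhaseStatus.PENDING for p in PHASES}
    )

    def mark(self, phase: str, status: PhaseStatus) -> None:
        self.phases[phase] = status

    def retryable(self) -> list:
        # Succeeded phases (e.g. converted images) are reused, not redone.
        return [p for p in PHASES if self.phases[p] is PhaseStatus.FAILED]

job = PdfJob()
job.mark("upload", PhaseStatus.SUCCEEDED)
job.mark("convert", PhaseStatus.FAILED)
print(job.retryable())  # → ['convert']
```

Because conversion output is kept per PDF, a second extraction request can skip straight to the extract phase.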
AI extraction uses Claude Sonnet 4.5 to analyze pitch decks and extract comprehensive data: company info, team, traction, market analysis, and fundraising details, with dedicated life-sciences fields for indication, modality, and development stage. Results include evidence-based citations with page numbers and verbatim quotes, anti-hallucination safeguards (fields return null rather than guesses when data is missing), and per-section confidence scoring.
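The shape of a cited, confidence-scored section can be sketched as below; the field names are assumptions for illustration, not the API's real response schema:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Citation:
    page: int    # page number in the pitch deck
    quote: str   # verbatim quote supporting the extracted value

@dataclass
class SectionResult:
    # None means the deck never states it — no guessing (anti-hallucination)
    value: Optional[str]
    citations: List[Citation]
    confidence: float  # per-section score in [0, 1]

team = SectionResult(
    value="3 co-founders, all former biotech operators",
    citations=[Citation(page=4, quote="Our founding team of three...")],
    confidence=0.92,
)
# A section with no supporting evidence stays null with zero confidence.
market = SectionResult(value=None, citations=[], confidence=0.0)
```

Tying every non-null value to a page number and verbatim quote makes each claim auditable against the source deck.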
Supports both real-time extraction and batch processing at a 50% cost savings, plus webhook notifications on job completion.
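A completion webhook is typically authenticated with an HMAC signature over the request body. The signing scheme and secret below are assumptions for illustration, not the API's documented mechanism:

```python
import hashlib
import hmac
import json

SECRET = b"whsec_example"  # hypothetical shared webhook secret

def sign(body: bytes) -> str:
    """Hex HMAC-SHA256 of the raw request body."""
    return hmac.new(SECRET, body, hashlib.sha256).hexdigest()

def verify(body: bytes, signature: str) -> bool:
    # Constant-time comparison to avoid timing side channels.
    return hmac.compare_digest(sign(body), signature)

# Simulated job-completion payload a receiver would validate
payload = json.dumps({"job_id": "abc123", "status": "succeeded"}).encode()
sig = sign(payload)
print(verify(payload, sig))          # → True
print(verify(payload + b"x", sig))   # → False (tampered body rejected)
```

Verifying against the raw bytes (before JSON parsing) is important, since re-serialization can change whitespace and break the signature.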
Tech: Python, FastAPI, Redis, PostgreSQL, Claude AI, ARQ, Railway, Supabase
Features
- Three-phase extraction workflow: Upload → Convert → Extract
- AI-powered pitch deck analysis with Claude Sonnet 4.5
- Evidence-based citations with page numbers and confidence scoring
- Batch extractions with 50% cost savings (up to 24h latency)
- Life sciences support: indication, modality, development stage
- Webhook notifications for job completion
- Multi-layer PDF validation with hyperlink sanitization
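The multi-layer validation in the last bullet can be sketched as a series of cheap structural checks run before any expensive processing. The size cap and the specific checks below are assumptions, not the service's actual rules:

```python
MAX_BYTES = 25 * 1024 * 1024  # assumed upload size limit

def validate_pdf(data: bytes) -> list:
    """Return a list of validation errors; an empty list means the file passed."""
    errors = []
    # Layer 1: structural markers — PDF header and trailer must be present.
    if not data.startswith(b"%PDF-"):
        errors.append("missing %PDF- header")
    if b"%%EOF" not in data[-1024:]:
        errors.append("missing %%EOF trailer")
    # Layer 2: size limit before any parsing work is done.
    if len(data) > MAX_BYTES:
        errors.append("file exceeds size limit")
    # Layer 3: reject active content often abused in malicious PDFs.
    if b"/JavaScript" in data or b"/OpenAction" in data:
        errors.append("active content not allowed")
    return errors

good = b"%PDF-1.7\n...pages...\n%%EOF"
print(validate_pdf(good))        # → []
print(validate_pdf(b"junk"))     # header and trailer errors
```

Hyperlink sanitization would be a further layer on the parsed document (stripping link annotations from each page) rather than a byte-level check like these.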
More from the portfolio
This project is part of tvsguide.io, the personal portfolio of Tim Veil — software engineer and CIO at Two Bear Capital, previously at StarTree, Cockroach Labs, and Hortonworks. The full collection covers distributed systems, data infrastructure, JDBC drivers, AI services, build pipelines, real-time analytics, and a couple of personal apps. Each project ships with source code, tech notes, and links to live deployments where applicable.