A system for scraping, processing, and serving Tulsa Government meeting videos and documents.
This application is structured as a set of microservices, each with its own responsibility:
- Scrapes Tulsa Government meeting information
- Stores committee and meeting data
- Extracts video URLs from viewer pages
- Downloads and processes videos
- Extracts audio from videos
- Manages batch processing of videos
- Handles document storage and retrieval
- Links documents to meeting records
- Converts audio files to text using the OpenAI Whisper API
- Stores and retrieves transcriptions with time-aligned segments
- Manages transcription jobs
For more details, see the architecture documentation.
- Node.js LTS and npm
- Encore CLI
- ffmpeg (for video processing)
- OpenAI API key (for transcription)
- Clone the repository:
git clone <repository-url>
cd tulsa-transcribe- Install dependencies:
npm install- Run the setup script to configure your environment:
npx ts-node setup.ts- Update the .envfile with your database credentials and API keys:
TGOV_DATABASE_URL="postgresql://username:password@localhost:5432/tgov?sslmode=disable"
MEDIA_DATABASE_URL="postgresql://username:password@localhost:5432/media?sslmode=disable"
DOCUMENTS_DATABASE_URL="postgresql://username:password@localhost:5432/documents?sslmode=disable"
TRANSCRIPTION_DATABASE_URL="postgresql://username:password@localhost:5432/transcription?sslmode=disable"
OPENAI_API_KEY="your-openai-api-key"
- Run the application using Encore CLI:
encore run| Endpoint | Method | Description | 
|---|---|---|
| /scrape/tgov | GET | Trigger a scrape of the TGov website | 
| /tgov/meetings | GET | List meetings with filtering options | 
| /tgov/committees | GET | List all committees | 
| /tgov/extract-video-url | POST | Extract a video URL from a viewer page | 
| Endpoint | Method | Description | 
|---|---|---|
| /api/videos/download | POST | Download videos from URLs | 
| /api/media/:blobId/info | GET | Get information about a media file | 
| /api/videos | GET | List all stored videos | 
| /api/audio | GET | List all stored audio files | 
| /api/videos/batch/queue | POST | Queue a batch of videos for processing | 
| /api/videos/batch/:batchId | GET | Get the status of a batch | 
| /api/videos/batch/process | POST | Process the next batch of videos | 
| Endpoint | Method | Description | 
|---|---|---|
| /api/documents/download | POST | Download and store a document | 
| /api/documents | GET | List documents with filtering options | 
| /api/documents/:id | GET | Get a specific document | 
| /api/documents/:id | PATCH | Update document metadata | 
| /api/meeting-documents | POST | Download and link meeting agenda documents | 
| Endpoint | Method | Description | 
|---|---|---|
| /transcribe | POST | Request transcription for an audio file | 
| /jobs/:jobId | GET | Get the status of a transcription job | 
| /transcriptions/:transcriptionId | GET | Get a transcription by ID | 
| /meetings/:meetingId/transcriptions | GET | Get all transcriptions for a meeting | 
- daily-tgov-scrape: Daily scrape of the TGov website (12:01 AM)
- process-video-batches: Process video batches every 5 minutes
Each service has its own database migration files in its data/migrations directory. These are applied automatically when running the application.
- Determine which service the feature belongs to
- Add the necessary endpoint(s) to the appropriate service
- Update any cross-service dependencies as needed
- Test the feature locally
encore testThe application is deployed using Encore. Refer to the Encore deployment documentation for details.