Generate enriched search phrases and natural questions for each hit in the dataset JSON files.
- Install dependencies:

      npm install

- Set the OpenAI key (required):

      export OPENAI_API_KEY=sk-...

- Run generation:

      npm run generate:2025

This outputs queries-2025.json with a searchQueries object per hit. An OpenAI API key is required; the script exits if it is missing.
You can control batching, concurrency, and incremental saving:

    tsx scripts/generateQueries.ts 2025.json queries-2025.json --limit=100 --batchSize=10 --concurrency=5 --saveEvery=2

Flags:
- --limit=N: Process only the first N hits.
- --batchSize=K: Number of hits grouped per batch (default 5).
- --concurrency=C: Parallel requests inside a batch (default = batchSize).
- --saveEvery=B: Persist to disk every B batches (default 1).
- --resume: Resume from an existing output file (matched by id).
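
The flag defaults and the id-based resume can be sketched roughly as follows. This is an illustration under assumptions, not the code in scripts/generateQueries.ts: `parseFlags`, `chunk`, and `pendingHits` are hypothetical names, and a hit is assumed to carry a string `id`.

```typescript
// Hypothetical sketch of the flag handling described above; the real
// script may be organized differently.

interface Flags {
  limit?: number;      // --limit=N
  batchSize: number;   // --batchSize=K (default 5)
  concurrency: number; // --concurrency=C (default = batchSize)
  saveEvery: number;   // --saveEvery=B (default 1)
  resume: boolean;     // --resume
}

// Assumed minimal hit shape: only the id matters for resuming.
interface Hit {
  id: string;
}

function parseFlags(argv: string[]): Flags {
  const values = new Map<string, string>();
  for (const arg of argv) {
    if (!arg.startsWith("--")) continue; // skip positional file arguments
    const [name, value = "true"] = arg.slice(2).split("=", 2);
    values.set(name, value);
  }
  const num = (name: string): number | undefined => {
    const raw = values.get(name);
    return raw === undefined ? undefined : Number(raw);
  };
  const batchSize = num("batchSize") ?? 5;
  return {
    limit: num("limit"),
    batchSize,
    concurrency: num("concurrency") ?? batchSize,
    saveEvery: num("saveEvery") ?? 1,
    resume: values.has("resume"),
  };
}

// Group items into consecutive batches of at most `size` elements.
function chunk<T>(items: T[], size: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// With --resume, skip hits whose id already appears in the existing output.
function pendingHits(hits: Hit[], alreadyDone: Hit[]): Hit[] {
  const doneIds = new Set(alreadyDone.map((h) => h.id));
  return hits.filter((h) => !doneIds.has(h.id));
}
```

With `--batchSize=10` and no `--concurrency`, this sketch yields a concurrency of 10, matching the "default = batchSize" rule in the flag list.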
Progress is streamed in place with elapsed time, ETA, and error count. On each save, the output is written to a .tmp file that is then atomically swapped into place for durability.
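
A common way to get this kind of atomic swap is to write the full payload to a sibling .tmp file and then rename it over the target. The sketch below assumes that approach; `saveAtomically` is a hypothetical name, and the rename is only atomic when both paths sit on the same filesystem.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical helper: write JSON to "<target>.tmp", then rename it over
// the target so readers never observe a half-written file.
function saveAtomically(targetPath: string, data: unknown): void {
  const tmpPath = `${targetPath}.tmp`;
  fs.writeFileSync(tmpPath, JSON.stringify(data, null, 2), "utf8");
  fs.renameSync(tmpPath, targetPath);
}

// Usage: save into a scratch directory and read the result back.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), "queries-"));
const demoFile = path.join(demoDir, "queries-demo.json");
saveAtomically(demoFile, [{ id: "hit-1" }]);
console.log(fs.readFileSync(demoFile, "utf8"));
```

If the process is interrupted mid-write, only the .tmp file is affected and the previous output file stays intact.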
After generation, you can flatten all generated phrases and questions into a CSV (one row per anchor, with its associated positive content snippet):

    npm run export:2025
    # or custom
    tsx scripts/exportAnchors.ts queries-2025.json anchors-2025.csv --maxContentChars=3000

CSV columns:
- hit_id: original section id
- rule: section name/number
- anchor: generated phrase or question
- content: associated markdown snippet (truncated)
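
As a rough illustration of how one such row could be produced, the sketch below truncates the content and quotes fields in the usual CSV way. `AnchorRow`, `csvField`, and `toCsvLine` are hypothetical names, and truncating by a plain character slice is an assumption about what --maxContentChars does, not a description of scripts/exportAnchors.ts.

```typescript
// Hypothetical row shape mirroring the four CSV columns listed above.
interface AnchorRow {
  hit_id: string;
  rule: string;
  anchor: string;
  content: string;
}

// Quote a field only when it contains a comma, quote, or line break;
// embedded double quotes are doubled, as is conventional for CSV.
function csvField(value: string): string {
  return /[",\r\n]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value;
}

// Build one CSV line, truncating content to maxContentChars characters
// (assumed behavior of --maxContentChars).
function toCsvLine(row: AnchorRow, maxContentChars: number): string {
  const content = row.content.slice(0, maxContentChars);
  return [row.hit_id, row.rule, row.anchor, content].map(csvField).join(",");
}

const example: AnchorRow = {
  hit_id: "1",
  rule: "R 1.2",
  anchor: 'what is "x"?',
  content: "abc,def",
};
console.log(toCsvLine(example, 5));
```

Quoting matters here because markdown snippets routinely contain commas and line breaks, which would otherwise split a row across columns or lines.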