Socratic: Automated Knowledge Synthesis for Vertical LLM Agents

Transform unstructured domain data into structured, agent-ready knowledge - automatically.

Overview

Socratic is a tool that automates knowledge synthesis for vertical LLM agents - agents specialized in specific domains.

Socratic ingests sparse, unstructured source documents (docs, code, logs, etc.) and synthesizes them into compact, structured knowledge bases ready to plug into agents.

Why Socratic?

Building effective domain agents requires high-quality, domain-specific knowledge. Today, this knowledge is:

Manually curated by experts 🧠
Costly to maintain 💸
Quickly outdated as source documents change ⚠️

The goal of Socratic is to automate this process, enabling accurate and cost effective domain knowledge management.

Demo

Using Socratic to build knowledge base for Google Analytics SQL agent: https://youtu.be/L20vOB3whMs

Install

From Pypi:

pip install socratic-cli

From Source:

git clone https://github.com/kevins981/Socratic.git
cd socratic

# optional
conda create -n socratic python=3.10 -y
conda activate socratic

# install
pip install -e .

Install OpenAI Codex:

# might need sudo
npm install -g @openai/codex

For Web-UI:

# if npm is not installed
sudo apt install nodejs npm
cd web
npm install

Running

Assume that the project name is airline_demo and relevant source files are located in examples/repos/tau_airline.

Web UI (recommended):

# 0. Create project
socratic-cli create --name airline_demo --input_dir examples/repos/tau_airline 

# 1. Launch web UI
cd web
npm run dev:project -- --project airline_demo

Command line interface:

# 0. Create project
socratic-cli create --name airline_demo --input_dir examples/repos/tau_airline 

# 1. Synthesis
# Source documents are stored in examples/repos/tau_airline
socratic-cli synth --model gpt-5.1 --project airline_demo

# Add a concept
socratic-cli synth --model gpt-5.1 --project airline_demo --add_concept

# Modify a concept
socratic-cli synth --model gpt-5.1 --project airline_demo --modify_concept --concept_id 1

# Delete a concept
socratic-cli synth --project airline_demo --delete_concept 1

# 2. Compose agent knowledge prompt
socratic-cli compose --project airline_demo --model gpt-5.1

OpenAI API

Currently only OpenAI models are supported, and an OpenAI API key is required.

export OPENAI_API_KEY="your_api_key_here"

How It Works

Socratic uses a combination of LLM and LLM agents. Socratic contains 3 stages: ingest, synthesis, and compose.

1. Ingest

Given a directory containing documents relevant to the vertical task, Socratic extracts a list of candidate concepts to research. This is done collaboratively between the user and a terminal agent.

User provides high-level research directions.
A terminal agent (codex) quickly scans the source documents to gain context and proposes concepts to research.
User further refines and finalizes the list of concepts.

The ingest stage generates the final set of concepts to research (concepts.txt).

2. Synthesis

For each concept to research generated in the ingest stage, Socratic launches a terminal agent (codex) that explores the source documents to synthesize knowledge related to the specific concept.

For each concept, the synthesis stores the synthesized knowledge in both plain text (concept{i}-synth.txt) and JSON format (concept{i}-synth.json).

3. Compose

Convert synthesized knowledge into prompts that are ready to be dropped directly into your LLM agent’s context.

Privacy & Security

Local storage: All files and outputs are stored entirely on your own machine. Socratic does not upload, transfer, index, or store your data anywhere else.
Local processing: All analysis and processing happen locally, except when data is sent to an external LLM provider (e.g., OpenAI) using your own API key.
Sandboxed terminal agent: Socratic uses Codex as its terminal agent to read and analyze source documents. Socratic runs Codex in read-only mode, preventing the agent from editing files or running commands that require network access. See the Codex sandbox documentations for more details.

Name		Name	Last commit message	Last commit date
Latest commit History 44 Commits
examples/repos		examples/repos
socratic		socratic
web		web
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
package.json		package.json
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Socratic: Automated Knowledge Synthesis for Vertical LLM Agents

Overview

Why Socratic?

Demo

Install

Running

OpenAI API

How It Works

1. Ingest

2. Synthesis

3. Compose

Privacy & Security

About

Uh oh!

Releases

Packages

Languages

License

kevins981/Socratic

Folders and files

Latest commit

History

Repository files navigation

Socratic: Automated Knowledge Synthesis for Vertical LLM Agents

Overview

Why Socratic?

Demo

Install

Running

OpenAI API

How It Works

1. Ingest

2. Synthesis

3. Compose

Privacy & Security

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages