$\varphi$-Code (phi-code) is an open-source, agent-based system designed to tackle competitive programming problems. Inspired by projects like AlphaCode, $\varphi$-Code's core philosophy is accessibility: it aims to achieve strong performance using small-to-medium ("tiny") language models (LLMs), making powerful coding agents runnable even on a consumer-level laptop or desktop PC.
The results $\varphi$-Code has obtained on real contest problems highlight the effectiveness of this lightweight approach. Using the underlying gpt-oss-20B model at Q4 quantization, the system produced the solutions collected in the `contests_results/` directory, where you can check the outputs generated by $\varphi$-Code for each contest.
- Tiny LLM Focus: The system leverages compact models like Gemma 3-4B for solution generation, which are manageable on standard consumer GPUs or even modern CPUs via quantization.
- LLaMA Server Integration: By using the `llama.cpp` LLaMA server, $\varphi$-Code can efficiently offload the computationally intensive LLM inference to the best available local hardware with optimized performance.
- Efficient Ranker: The ranking component, built on the `sentence-transformers` library, uses highly efficient embedding models that require minimal resources compared to the generative LLMs.
- Accessible & Open-Source: Built with a focus on running powerful agents using less computational resources.
- Web-Based Interface: A user-friendly Gradio web application for submitting problem statements and viewing generated solutions.
- Curses-Based Interface: A user-friendly curses interface for generating and viewing solutions from the terminal with vim-mode support.
- Terminal Mode: Run the tool as a shell command.
- Remote Solution Generation: Connects to a remote LLM API (like a `llama.cpp` server) to generate multiple candidate Python solutions.
- Intelligent Ranking (Ranker Agent): Utilizes an embedding model from the `sentence-transformers` ecosystem to evaluate the feasibility of generated solutions (samples) against the problem statement (anchor); see the sketch after this list.
- Automated Testing: Parses example tests from the problem statement, runs the candidate solutions, and sorts them by tests passed and the ranker's confidence score.
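To make the ranking step concrete, here is a minimal sketch of how an embedding model from `sentence-transformers` can score candidate solutions against a problem statement via cosine similarity. It is an illustration only, not the project's actual implementation: the problem text and candidate snippets are placeholders, and the model is the `coldchair16/CPRetriever-Code` checkpoint mentioned later in the ranker section.

```python
# A minimal sketch of embedding-based ranking with sentence-transformers.
# Problem text and candidate snippets are placeholders for illustration.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("coldchair16/CPRetriever-Code")

problem_statement = (
    "Given an array of integers and a target, return the indices of the "
    "two numbers that add up to the target."
)
candidates = [
    # A plausible solution.
    "def two_sum(nums, target):\n"
    "    seen = {}\n"
    "    for i, x in enumerate(nums):\n"
    "        if target - x in seen:\n"
    "            return [seen[target - x], i]\n"
    "        seen[x] = i\n",
    # An unrelated snippet, expected to score lower.
    "def factorial(n):\n"
    "    return 1 if n <= 1 else n * factorial(n - 1)\n",
]

# Encode the problem statement (anchor) and the candidate solutions (samples).
anchor_emb = model.encode(problem_statement, convert_to_tensor=True)
sample_embs = model.encode(candidates, convert_to_tensor=True)

# Cosine similarity gives one confidence score per candidate; higher is better.
scores = util.cos_sim(anchor_emb, sample_embs)[0]
for idx, score in sorted(enumerate(scores.tolist()), key=lambda p: -p[1]):
    print(f"candidate {idx}: score {score:.3f}")
```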
.
├── contests_results/      # Results of phi-code for different contests.
├── datasets/               # Competitive programming datasets for ranker training.
│   ├── atcoder.jsonl
│   ├── codechef.jsonl
│   └── ...
├── LICENSE
├── ranker/                  # Code for training and managing the sentence-transformers ranker.
│   ├── check_datasets.py
│   ├── filter_datasets.py
│   └── train.py             # Main ranker training script.
├── solver/                  # Solver coding agent.
│   ├── general_prompt.txt
│   ├── leetcode_prompt.txt
│   ├── leetcode.py          # LeetCode module.
│   ├── utils.py
│   ├── main.py              # The main module.
│   ├── web_ui.py            # Web Interface.
│   ├── curses_ui.py         # Curses Interface.
│   └── terminal.py          # Run the tool as a shell command.
└── requirements.txt
This project is structured into two main components: the solver for running the coding agent and the ranker for training the model that sorts the solutions.
The core agent functionality is available through the `solver/main.py` application (including the Gradio web UI).
- A Python environment.
- A running LLaMA server (e.g., using `llama.cpp`'s `llama-server` tool) hosting a non-reasoning LLM.
- Recommended Model: Gemma 3-4B or a similar compact, code-centric model.
- Recommended Settings: Temperature of 0.95 and Top-K of 300 (see the request sketch after this list).
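For reference, the solver's calls to the server boil down to plain HTTP requests. Below is a minimal sketch, assuming the `llama.cpp` server's native /completion endpoint, that applies the recommended sampling settings; the host, port, prompt, and token limit are placeholders, and the exact requests $\varphi$-Code sends may differ.

```python
# A minimal sketch of requesting one candidate solution from a running
# llama.cpp server via its native /completion endpoint. Host, port, prompt,
# and token limit are placeholders; the requests phi-code sends may differ.
import requests

SERVER = "http://127.0.0.1"
PORT = 8080

payload = {
    "prompt": "Write a Python function that returns the n-th Fibonacci number.",
    "temperature": 0.95,  # recommended sampling temperature
    "top_k": 300,         # recommended Top-K
    "n_predict": 512,     # cap on the number of generated tokens
}

resp = requests.post(f"{SERVER}:{PORT}/completion", json=payload, timeout=120)
resp.raise_for_status()
print(resp.json().get("content", ""))
```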
- Navigate to the `solver` directory.
- Install dependencies:
  pip install -r requirements.txt
- Run the main application, providing the necessary server details:
  python solver/main.py \
      --server <YOUR_SERVER_ADDRESS> \
      --port <YOUR_SERVER_PORT> \
      --site leetcode
Note: The current version is primarily focused on LeetCode problems, utilizing `leetcode_prompt.txt`; problem statements from other sites won't work yet. This is expected to improve in future updates.
| Option | Description | Example |
|---|---|---|
| `-r, --ranker` | Path or Hugging Face link to the ranker model (a `sentence-transformers` model). | `Salesforce/SFR-Embedding-Code-2B_R` |
| `-s, --server` | Address of the `llama.cpp` server hosting the LLM. | `http://127.0.0.1` |
| `-p, --port` | Port of the `llama.cpp` server. | `8080` |
| `-m, --site` | Site the problem statements come from. | `leetcode` |
| `-i, --interface` | Which interface to use (`terminal`, `web`, or `curses`). | `web` |
| `-f, --statement` | Text file with the problem statement to use. | `statement.txt` |
| `-n, --number` | Number of solutions to generate. | `10` |
| `-o, --output_file` | File to store the generated solutions, in JSONL format. | `solutions.jsonl` |
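For example, a full run that reads a statement from a file in terminal mode and saves the candidates could be invoked as follows (all values are the illustrative ones from the table; adjust them to your setup):

python solver/main.py \
    --ranker Salesforce/SFR-Embedding-Code-2B_R \
    --server http://127.0.0.1 \
    --port 8080 \
    --site leetcode \
    --interface terminal \
    --statement statement.txt \
    --number 10 \
    --output_file solutions.jsonl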
The ranker is a crucial component that scores candidate solutions. It is trained as an embedding model to determine how relevant a generated solution is to a given problem statement.
⚠️ Work in Progress: Training a state-of-the-art ranker still requires significant resources. The current `solver` uses a pre-trained model, but the tools below are provided for those who wish, and have the means, to train their own.
The ranker folder contains the code for fine-tuning the ranker model using the sentence-transformers library, typically leveraging a Siamese-network architecture for contrastive learning (e.g., Multiple Negative Ranking Loss or Triplet Loss).
- `train.py`: The main script for fine-tuning a ranker model.
- `check_datasets.py`, `filter_datasets.py`, `sample_dataset.py`: Utilities for preparing and managing the training data in `datasets/`.
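For orientation, here is a minimal sketch of the kind of contrastive fine-tuning described above, using `MultipleNegativesRankingLoss` from `sentence-transformers` on (problem statement, accepted solution) pairs. The JSONL field names (`statement`, `solution`) are assumptions about the dataset layout, so adapt them to the actual files under `datasets/`; for real training, use `ranker/train.py`.

```python
# A minimal sketch of contrastive fine-tuning with sentence-transformers.
# The JSONL field names ("statement", "solution") are assumptions about the
# dataset layout; adapt them to the actual files under datasets/.
import json

from torch.utils.data import DataLoader
from sentence_transformers import InputExample, SentenceTransformer, losses

model = SentenceTransformer("coldchair16/CPRetriever-Code")

# Each training example is an (anchor, positive) pair: a problem statement and
# an accepted solution. Other solutions in the batch act as in-batch negatives.
examples = []
with open("datasets/leetcode.jsonl") as f:
    for line in f:
        row = json.loads(line)
        examples.append(InputExample(texts=[row["statement"], row["solution"]]))

loader = DataLoader(examples, shuffle=True, batch_size=8)
loss = losses.MultipleNegativesRankingLoss(model)

model.fit(train_objectives=[(loader, loss)], epochs=2, warmup_steps=100)
model.save("my_trained_ranker")
```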
The train.py script allows you to fine-tune an embedding model on competitive programming datasets.
python ranker/train.py \
--model coldchair16/CPRetriever-Code \
--epochs 2 \
--batch-size 8 \
--leetcode datasets/leetcode.jsonl \
--codeforces datasets/codeforces.jsonl \
--output-dir my_trained_ranker

- Expand the datasets for ranker training.
- Improve the prompt templates (e.g., creating one for Codeforces).
- Enhance the problem parsing to extract tests more reliably.
Feel free to open an issue or submit a pull request!