
πŸ’‘ $\varphi$-Code: A Python Agentic Competitive Programmer Fueled by (tiny) LLMs

$\varphi$-Code (phi-code) is an open-source, agent-based system designed to tackle competitive programming problems. Inspired by projects like AlphaCode, $\varphi$-Code's core philosophy is accessibility: it aims to achieve strong performance using small-to-medium (tiny) Language Models (LLMs), making powerful coding agents runnable even on a consumer-level laptop or desktop PC.

πŸ“° News

The $\varphi$-Code agentic system hit a significant milestone: it answered every question in LeetCode's Weekly Contest 476 (the latest contest as of publication, 11-17-2025) correctly on the first attempt!

This remarkable feat highlights the effectiveness of $\varphi$-Code's full pipeline, where the Remote Solution Generation produces candidates, the Intelligent Ranker Agent selects the most promising one, and Automated Testing ensures the final submission is robust.

Using gpt-oss-20B quantized to Q4 as the underlying model, $\varphi$-Code achieved a final contest score of 18, tying the Python scores obtained by much larger models such as Gemini 2.5 Pro, GPT-5, DeepSeek v3.2, Qwen 3, and Grok 4 in the same competition.

You can check the outputs generated by $\varphi$-Code here and view the official LeetCode contest rankings for the LLMs here.

Web Interface

Web-Based Screenshot

Curses Interface

Curses Screenshot

Terminal Mode

Terminal Screenshot

πŸ’» Designed for Consumer Hardware

$\varphi$-Code is built around the principle of resource efficiency.

  • Tiny LLM Focus: The system leverages compact models like Gemma 3-4B for solution generation, which are manageable on standard consumer GPUs or even modern CPUs via quantization.
  • LLaMA Server Integration: By using llama.cpp's llama-server, $\varphi$-Code can efficiently offload the computationally intensive LLM inference to the best available local hardware (a minimal request sketch follows this list).
  • Efficient Ranker: The ranking component, built on the sentence-transformers library, uses highly efficient embedding models that require minimal resources compared to the generative LLMs.
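
To make the server integration concrete, the sketch below asks a llama.cpp llama-server for one candidate solution through its OpenAI-compatible /v1/chat/completions endpoint. This is a minimal sketch, not the actual request code in solver/: the server address, prompt, and timeout are assumptions, and the sampling values mirror the settings recommended later in this README.

    import requests

    SERVER = "http://127.0.0.1:8080"  # assumed llama-server address

    def generate_candidate(statement: str) -> str:
        """Request one candidate Python solution from a llama.cpp server."""
        response = requests.post(
            f"{SERVER}/v1/chat/completions",
            json={
                "messages": [
                    {"role": "system",
                     "content": "You are a competitive programmer. Reply with Python code only."},
                    {"role": "user", "content": statement},
                ],
                "temperature": 0.95,  # recommended setting from this README
                "top_k": 300,         # llama.cpp accepts extra sampling keys
            },
            timeout=600,
        )
        response.raise_for_status()
        return response.json()["choices"][0]["message"]["content"]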

✨ Features

  • Accessible & Open-Source: Built with a focus on running powerful agents with fewer computational resources.
  • Web-Based Interface: A user-friendly Gradio web application for submitting problem statements and viewing generated solutions.
  • Curses-Based Interface: A user-friendly curses interface for generating and viewing solutions from the terminal with vim-mode support.
  • Terminal Mode: Run the tool as a shell command.
  • Remote Solution Generation: Connects to a remote LLM API (like a llama.cpp server) to generate multiple candidate Python solutions.
  • Intelligent Ranking (Ranker Agent): Utilizes an embedding model from the sentence-transformers ecosystem to evaluate the feasibility of generated solutions (samples) against the problem statement (anchor); a minimal sketch follows this list.
  • Automated Testing: Parses example tests from the problem statement, runs the candidate solutions, and sorts them by tests passed and the ranker's confidence score.
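
To illustrate how the last two features fit together, here is a minimal sketch of the anchor/sample ranking idea combined with the final sort: candidates are scored by cosine similarity to the problem statement, then ordered by tests passed first and ranker score second. The model name and function signature are illustrative, not the actual code in solver/.

    from sentence_transformers import SentenceTransformer, util

    ranker = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder embedding model

    def sort_candidates(statement: str, candidates: list[str],
                        tests_passed: list[int]) -> list[str]:
        """Order candidates by tests passed, breaking ties with ranker similarity."""
        anchor = ranker.encode(statement, convert_to_tensor=True)
        samples = ranker.encode(candidates, convert_to_tensor=True)
        scores = util.cos_sim(anchor, samples)[0]  # one score per candidate
        order = sorted(
            range(len(candidates)),
            key=lambda i: (tests_passed[i], float(scores[i])),
            reverse=True,
        )
        return [candidates[i] for i in order]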

πŸ“‚ Project Structure

.
β”œβ”€β”€ contests_results/                     # Results of phi-code for different contests.
β”œβ”€β”€ datasets/                             # Competitive programming datasets for ranker training.
β”‚   β”œβ”€β”€ atcoder.jsonl
β”‚   β”œβ”€β”€ codechef.jsonl
β”‚   └── ...
β”œβ”€β”€ LICENSE
β”œβ”€β”€ ranker/                               # Code for training and managing the sentence-transformers ranker.
β”‚   β”œβ”€β”€ check_datasets.py
β”‚   β”œβ”€β”€ filter_datasets.py
β”‚   β”œβ”€β”€ sample_dataset.py
β”‚   └── train.py                          # Main ranker training script.
└── solver/                               # Solver coding agent.
    β”œβ”€β”€ general_prompt.txt
    β”œβ”€β”€ leetcode_prompt.txt
    β”œβ”€β”€ leetcode.py                       # LeetCode module.
    β”œβ”€β”€ utils.py
    β”œβ”€β”€ main.py                           # The main module.
    β”œβ”€β”€ web_ui.py                         # Web Interface.
    β”œβ”€β”€ curses_ui.py                      # Curses Interface.
    β”œβ”€β”€ terminal.py                       # Run the tool as a shell command.
    └── requirements.txt

πŸš€ Getting Started

This project is structured into two main components: the solver for running the coding agent and the ranker for training the model that sorts the solutions.

1. Running the Web UI

The core agent functionality lives in solver/main.py; the web interface it exposes is a Gradio application.

Prerequisites

  • A Python environment.
  • A running LLaMA server (e.g., using llama.cpp's llama-server tool) hosting a non-reasoning LLM.
    • Recommended Model: Gemma 3-4B or a similar compact, code-centric model.
    • Recommended Settings: Temperature of 0.95 and Top-K of 300 (see the example launch command below).
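
For reference, a llama-server launch along these lines matches the settings above; the GGUF file name is a placeholder, and the flag names follow llama.cpp's common options (check llama-server --help for your build):

    llama-server -m gemma-3-4b-it-Q4_K_M.gguf --port 8080 --temp 0.95 --top-k 300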

Installation and Execution

  1. Navigate to the solver directory.

  2. Install dependencies:

    pip install -r requirements.txt
  3. From the repository root, run the main application, providing the necessary server details:

    python solver/main.py \
      --server <YOUR_SERVER_ADDRESS> \
      --port <YOUR_SERVER_PORT> \
      --site leetcode

    Note: The current version focuses on LeetCode problems, using leetcode_prompt.txt; problem statements from other sites are not yet supported. Broader site coverage is planned for future updates.

main.py Command Line Options

| Option | Description | Example |
| --- | --- | --- |
| -r, --ranker | Path or Hugging Face link to the ranker model (a sentence-transformers model). | Salesforce/SFR-Embedding-Code-2B_R |
| -s, --server | Address of the llama.cpp server hosting the LLM. | http://127.0.0.1 |
| -p, --port | Port of the llama.cpp server. | 8080 |
| -m, --site | Site the problem statements come from. | leetcode |
| -i, --interface | Interface to use (terminal, web, or curses). | web |
| -f, --statement | Text file containing the problem statement. | statement.txt |
| -n, --number | Number of solutions to generate. | 10 |
| -o, --output_file | File to store the generated solutions in JSONL format. | solutions.jsonl |
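
For example, a full invocation combining these options might look like this (values are illustrative):

    python solver/main.py \
      --ranker Salesforce/SFR-Embedding-Code-2B_R \
      --server http://127.0.0.1 \
      --port 8080 \
      --site leetcode \
      --interface web \
      --number 10 \
      --output_file solutions.jsonl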

🧠 Training the Ranker Agent with sentence-transformers

The ranker is a crucial component that scores candidate solutions. It is trained as an embedding model to determine how relevant a generated solution is to a given problem statement.

⚠️ Work in Progress: Training a state-of-the-art ranker still requires significant resources. The current solver uses a pre-trained model, but the tools below are provided for those who wish, and have the means, to train their own.

The ranker Folder

The ranker folder contains the code for fine-tuning the ranker model with the sentence-transformers library, typically using a Siamese-network architecture for contrastive learning (e.g., Multiple Negatives Ranking Loss or Triplet Loss).

  • train.py: The main script for fine-tuning a ranker model.
  • check_datasets.py, filter_datasets.py, sample_dataset.py: Utilities for preparing and managing the training data in datasets/.

Training the Ranker

The train.py script allows you to fine-tune an embedding model on competitive programming datasets.

python ranker/train.py \
  --model coldchair16/CPRetriever-Code \
  --epochs 2 \
  --batch-size 8 \
  --leetcode datasets/leetcode.jsonl \
  --codeforces datasets/codeforces.jsonl \
  --output-dir my_trained_ranker
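
Under the hood, this kind of fine-tuning can be expressed in a few lines of sentence-transformers code. The sketch below is a rough illustration of the contrastive setup described above, not necessarily what train.py does internally; the base model and the data pairs are placeholders:

    from sentence_transformers import SentenceTransformer, InputExample, losses
    from torch.utils.data import DataLoader

    # Placeholder (statement, solution) pairs; in practice these come from
    # the JSONL files in datasets/. In-batch solutions act as negatives.
    pairs = [
        ("Given an array of integers, return the indices of two numbers "
         "that add up to a target.",
         "def two_sum(nums, target):\n    seen = {}\n    ..."),
    ]

    model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder base model
    examples = [InputExample(texts=[stmt, sol]) for stmt, sol in pairs]
    loader = DataLoader(examples, shuffle=True, batch_size=8)
    loss = losses.MultipleNegativesRankingLoss(model)
    model.fit(train_objectives=[(loader, loss)], epochs=2,
              output_path="my_trained_ranker")

Once training finishes, the directory passed as --output-dir can be handed to the solver through the -r/--ranker option.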

🀝 Contributing

$\varphi$-Code is an open-source effort. We welcome contributions to:

  • Expand the datasets for ranker training.
  • Improve the prompt templates (e.g., creating one for Codeforces).
  • Enhance the problem parsing to extract tests more reliably.

Feel free to open an issue or submit a pull request!
