This repository contains the implementation for the NeurIPS 2025 paper "Composing Linear Layers from Irreducibles" by Travis Pence, Daisuke Yamada, and Vikas Singh.
Paper: arXiv:2507.11688
This work demonstrates how linear layers in large language models can be decomposed into geometric primitives (bivectors) using Clifford algebra, achieving exponential parameter reduction from O(d²) to O(log²d) while maintaining competitive performance. We replace key, query, and value projections in LLM attention layers with rotor-based transformations that compose simple geometric rotations. The bivector-to-rotor mapping via invariant decomposition is visualized below; the paper describes this process, and the differentiable algorithm we present for it, in much more detail.
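As intuition for why composing rotations is so parameter-efficient, here is a minimal sketch in plain linear algebra (Givens rotations, not the paper's Clifford-algebra rotor construction): k planar rotations are described by k angles, yet compose into a full orthogonal d × d transform, whereas a dense linear layer stores d² entries.

```python
import numpy as np

def givens(d, i, j, theta):
    """Dense d x d planar rotation by `theta` in the (i, j) coordinate plane."""
    G = np.eye(d)
    c, s = np.cos(theta), np.sin(theta)
    G[i, i] = c; G[j, j] = c
    G[i, j] = -s; G[j, i] = s
    return G

d = 8
rng = np.random.default_rng(0)
# A handful of planar rotations, each parameterized by a single angle...
planes = [(0, 1), (2, 3), (4, 5), (1, 6)]
angles = rng.uniform(0, 2 * np.pi, size=len(planes))

# ...compose into one orthogonal d x d transform.
R = np.eye(d)
for (i, j), theta in zip(planes, angles):
    R = givens(d, i, j, theta) @ R

print(np.allclose(R.T @ R, np.eye(d)))  # composition stays orthogonal -> True
print(f"{len(angles)} angles vs {d * d} dense matrix entries")
```

The rotor-based layers in this repository build on the same composition idea, with bivectors as the underlying geometric primitives.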
- Clone this repo, the `submission` branch of torch_ga_fix (https://github.com/TravisNP/torch_ga_fix/tree/submission), and fast-hadamard-transform (https://github.com/Dao-AILab/fast-hadamard-transform.git):

```bash
git clone git@github.com:vsingh-group/ComposingLinearLayers.git
git clone git@github.com:TravisNP/torch_ga_fix.git
git clone git@github.com:Dao-AILab/fast-hadamard-transform.git
cd torch_ga_fix
git checkout submission
cd ../
```

- Pull the PyTorch docker image and create/start/attach to a container:
```bash
docker pull pytorch/pytorch:2.5.1-cuda12.1-cudnn9-devel
docker run -it \
    --name ComposingLinearLayers \
    --gpus all \
    -v "$(pwd):/workspace" \
    pytorch/pytorch:2.5.1-cuda12.1-cudnn9-devel \
    bash
```

- Install torch_ga_fix and fast-hadamard-transform:
```bash
cd torch_ga_fix/
pip install .
cd ../fast-hadamard-transform/
pip install .
```

- Install the requirements:

```bash
cd ../ComposingLinearLayers/
pip install -r requirements.txt
```

- Set the HUGGINGTOKEN environment variable:

```bash
export HUGGINGTOKEN=<yourtokenhere>
```

To stop the container, type `exit`. To start/attach to the container later, use `docker start -ai ComposingLinearLayers`.
To replace attention layers in different LLMs, navigate to the ComposingLinearLayers directory and run the corresponding script:
```bash
./run/run_<model_name>.sh
```

Available models:
- `run_llama.sh` - LLaMA-3.2 1B / LLaMA-3.2 3B
- `run_qwen.sh` - Qwen-2.5 1.5B
- `run_fox.sh` - Fox-1.0 1.6B
Below are the average PPL (perplexity) values for replacing up to three transformer layers of LLaMA and Qwen.
To reproduce the projection convergence analysis:
```bash
python -m run.test_projection_convergence
```

Below are the results, showing that while larger rotors initially require more iterations to converge, they eventually converge just as quickly as smaller ones.
The main script `main.py` accepts the following arguments:
- `--layers`: Comma-separated layer indices to replace (e.g., `"12,13,14"`)
- `--root`: Root directory for storing data and model outputs
- `--config`: Path to YAML configuration file (without the `.yaml` extension)
- `--dataset`: Dataset for evaluation. Options: `arc_challenge`, `hellaswag`, `wikitext`, `c4`, `ptb`
- `--model`: LLM model to use. Options: `llama1B`, `llama3B`, `Qwen2.5-1.5B`, `fox`
- `--replacement_type`: Type of layer replacement. Options: `rotor`, `lowrank_linear`, `bh_linear`
- `--train_projo`: If set, trains the output projection layer (`o_proj`) after replacing attention layers
- `--eval_datatype`: Data type for evaluation. Options: `float32` (default), `bfloat16`. Note: it is not certain that `bfloat16` works.
- `--rank`: Rank for the low-rank linear approximation (required when `--replacement_type=lowrank_linear`)
- `--llm_batch_size`: Number of prompts to process simultaneously during data extraction
- `--remove`: If set, deletes extracted training data after processing to save disk space
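For example, a hypothetical invocation might look like the following (the config path, layer indices, and other argument values here are illustrative, not values used in the paper's experiments):

```shell
python main.py \
  --layers "12,13,14" \
  --root ./data \
  --config configs/rotor \
  --dataset wikitext \
  --model llama1B \
  --replacement_type rotor \
  --train_projo
```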
See the scripts in the `run` directory; they recreate the experiments for the Fox, LLaMA, and Qwen models.
Configuration files (.yaml) contain hyperparameters specific to each replacement type. See the paper (Appendix C) for details on hyperparameter settings used in experiments.
The pipeline consists of three main steps for each layer:

- **Data Extraction**: Extract hidden states (inputs) and projection outputs (targets) from the original model using the specified dataset
- **Training**: Train the replacement layer (rotor, low-rank, or block-Hadamard) to minimize MSE between predicted and true projection outputs
  - Optionally retrain the output projection (`o_proj`) if `--train_projo` is set
- **Evaluation**: Evaluate perplexity (for language modeling datasets) or accuracy (for multiple-choice benchmarks)
When replacing multiple layers, they are processed sequentially: each new layer is trained with all previously replaced layers active.
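As a concrete analogue of the extract/train/evaluate loop, here is a minimal NumPy sketch (not the repository's code) of fitting a low-rank replacement for a frozen linear projection by minimizing MSE on extracted (input, output) pairs; the matrix names and hyperparameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, rank = 64, 512, 8

# Step 1 (data-extraction stand-in): inputs X and targets Y = X @ W.T,
# where W plays the role of a frozen projection from the original model.
W = rng.normal(size=(d, d)) / np.sqrt(d)
X = rng.normal(size=(n, d))
Y = X @ W.T

# Step 2 (training): fit a rank-8 factorization B @ A by gradient
# descent on the MSE between predicted and true projection outputs.
A = rng.normal(size=(rank, d)) * 0.1
B = rng.normal(size=(d, rank)) * 0.1
init_mse = float(np.mean((X @ (B @ A).T - Y) ** 2))
lr = 0.05
for _ in range(500):
    err = X @ (B @ A).T - Y             # (n, d) residuals
    grad = (err.T @ X) / n              # gradient wrt B @ A (up to a constant)
    B -= lr * grad @ A.T
    A -= lr * B.T @ grad
mse = float(np.mean((X @ (B @ A).T - Y) ** 2))

# Step 3 (evaluation stand-in): the factored layer uses 2*d*rank
# parameters instead of d*d for the dense projection.
print(f"MSE {init_mse:.3f} -> {mse:.3f}; params {2 * d * rank} vs {d * d}")
```

The repository's rotor replacement trains against the same MSE objective but composes geometric rotations instead of factoring the weight matrix.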
If you use this code, please cite:
```bibtex
@inproceedings{pence2025composing,
  title={Composing Linear Layers from Irreducibles},
  author={Pence, Travis and Yamada, Daisuke and Singh, Vikas},
  booktitle={Advances in Neural Information Processing Systems},
  year={2025}
}
```

For questions or issues, please open a GitHub issue or contact Travis Pence at tnpence at wisc dot edu.


