Command-line utilities for querying large language models
This repo is built around making it easy to run a set of queries via CLI on a large language model (LM) and get back a set of completions formatted nicely into a single document. It also has a basic Python API.
Typical workflow:
- Create a CSV/.xlsx/etc. file with model queries as rows
- Run lm-api with -i /path/to/my/queries.csv, and use -kc to specify the column name with the queries
- Get completions compiled into a single markdown file!
Queries are expected to be in a pandas-compatible format, and results are written to a text file with markdown formatting for easy viewing/sharing.
An example output file is provided in data/lm-api-output.
Directly install via pip+git:
# create a virtual environment (optional): pyenv virtualenv 3.8.5 lm-api
pip install git+https://github.com/pszemraj/lm-api.git
Alternatively, clone the repo, cd into the lm-api directory, and install from source:
git clone https://github.com/pszemraj/lm-api.git
cd lm-api
# create a virtual environment (optional): pyenv virtualenv 3.8.5 lm-api
pip install -e .
A quick test can be run with the src/lm_api/test_goose_api.py script.
You will need an API key for each provider you want to query. Currently, the following providers are supported:
API keys can be set in the environment variables GOOSE and OPENAI:
export OPENAI=api_key11111114234234etc
# or
export GOOSE=api_key11111114234234etc
Alternatively, pass the key as an argument when calling lm-api with the -k switch.
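For scripts that wrap lm-api, the same two key sources can be checked from Python. The following is a minimal sketch, not the package's own code: `resolve_api_key` and the provider labels used as dictionary keys are hypothetical, and it assumes an explicitly passed key takes precedence over the environment, which this README does not state.

```python
import os
from typing import Optional

# Environment variable names come from this README; the lowercase provider
# labels used as keys are hypothetical and chosen only for illustration.
ENV_VARS = {"openai": "OPENAI", "goose": "GOOSE"}


def resolve_api_key(provider: str, cli_key: Optional[str] = None) -> str:
    """Return an explicit key if given, else the provider's environment variable."""
    if cli_key:
        # Assumption: a key passed explicitly (as with -k) wins over the environment.
        return cli_key
    env_name = ENV_VARS[provider.lower()]
    key = os.environ.get(env_name)
    if not key:
        raise RuntimeError(f"No API key found: set {env_name} or pass one with -k")
    return key
```

Failing early with a clear message avoids sending unauthenticated requests for every row of a query file.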
Command line scripts are located in src/lm_api/ and are installed as CLI commands that can be run from anywhere. Currently, the only command is lm-api (more to come).
Use lm-api to run the queries in a file, passing the API key with the -k flag if it is not set in the environment:
lm-api -i data/test_queries.xlsx -o ./my-test-folder
This will run the queries in data/test_queries.xlsx and write the results to a .md file in my-test-folder/ in your current working directory.
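When the command is launched from another Python script, the argument list can be assembled programmatically. This is a sketch only: `build_lm_api_command` is a hypothetical helper, and it uses just the -i, -o, -kc, and -t flags that appear in the usage text of this README.

```python
import shlex
from typing import List, Optional


def build_lm_api_command(
    input_file: str,
    output_dir: str,
    key_column: Optional[str] = None,
    temperature: Optional[float] = None,
) -> List[str]:
    """Build an argv list for the lm-api CLI (hypothetical helper)."""
    cmd = ["lm-api", "-i", str(input_file), "-o", str(output_dir)]
    if key_column is not None:
        cmd += ["-kc", key_column]  # column holding the queries
    if temperature is not None:
        cmd += ["-t", str(temperature)]
    return cmd


cmd = build_lm_api_command("data/test_queries.xlsx", "./my-test-folder")
print(shlex.join(cmd))
# prints: lm-api -i data/test_queries.xlsx -o ./my-test-folder
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`; keeping the arguments as a list avoids shell quoting problems with paths that contain spaces.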
There are many options for the script, which can be viewed with the -h flag (e.g., lm-api -h).
usage: lm-api [-h] [-i INPUT_FILE] [-o OUTPUT_DIR] [-provider PROVIDER_ID] [-k KEY] [-p PREFIX] [-s SUFFIX] [-simple]
[-kc KEY_COLUMN] [-m MODEL_ID] [-n N_TOKENS] [-t TEMPERATURE] [-f2 FREQUENCY_PENALTY]
[-p2 PRESENCE_PENALTY] [-v]
The input file should be in a pandas-compatible format (e.g., .csv or .xlsx). The default column name for the queries is query, which can be changed with the -kc flag.
An example input file is provided in data/test_queries.xlsx.
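To make the expected layout concrete, a query file can be generated with the standard library. This is a minimal sketch: `write_query_file` is a hypothetical helper, and the only detail taken from this README is the default column name, query.

```python
import csv
from pathlib import Path


def write_query_file(path, queries, column="query"):
    """Write one query per row under a single header column."""
    path = Path(path)
    # newline="" lets the csv module control line endings, as its docs recommend.
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow([column])  # header row: the query column name
        writer.writerows([q] for q in queries)  # one query per row
    return path
```

For example, calling `write_query_file("my_queries.csv", ["What is a transformer?", "Explain beam search."])` produces a file that can be passed to lm-api with -i. The csv module quotes queries that contain commas or line breaks, so each one stays in a single cell.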
Note: this is a work in progress, and the following is a running list of things that need to be done. The list will likely change.
- adjust the --prefix and --suffix flags to a "prompt engine" switch that can augment/update the prompt with a variety of options (e.g., --prompt-engine=prefix or --prompt-engine=prefix+suffix)
- add a simple CLI command that does not require a query file
- add support for other providers (e.g., textsynth)
- validate performance as package / adjust as needed (i.e., import lm_api should work and have full functionality w.r.t. CLI)
- set up tests
We are compiling/discussing a list of potential features in the discussions section, so please feel free to add your thoughts there!