This script helps to automate the process of preparing data for finetuning on OpenAI models, specifically GPT-3.5 and Babbage. It also provides utilities to validate the data, transform the data to the required JSONL format, and estimate the cost of the finetuning process.
- Validate API Key
- Validate and select the appropriate model (GPT-3.5 or Babbage)
- Check input data file (JSONL)
- Estimate finetuning cost
- Create and manage finetuning jobs on OpenAI
- Python 3
- External libraries:
pyfiglet,openai,tiktoken,dotenv,argparse,json,re,os,sys,time,clint
To install the required libraries:
pip install pyfiglet openai tiktoken python-dotenv argparse clintor
pip install requirements.txtpython ftup.py [-k <API_KEY>] -m <MODEL_NAME> -f <INPUT_FILE> [-s <SUFFIX>] [-e <EPOCHS>]
Arguments:
-k, --key: Optional. API key. Optional argument, but required in default env to have an API key in enviroment. OPENAI_API_KEY-m, --model: Required. Model to use. Options:gptforgpt-3.5-turbo-0613orbabforbabbage-002.-f, --file: Required. Input data file (JSONL format).-s, --suffix: Optional. Add a suffix for your finetuned model. E.g., 'my-suffix-title-v-1'.-e, --epoch: Optional. Number of epochs for training. Default is 3.
Store your API key in a .env file in the format:
OPENAI_API_KEY=your_api_key_here
The script will load by default this key if not -k / --key passed as an argument.
check_key(key): Validates format for OpenAI API key.check_model(model): Validates the model name.check_jsonl_file(file): Checks if the provided file has a valid JSONL name and if it exists.create_update_jsonl_file(model, file): Check if JSONL have a correct format and uploads file to OpenAI.update_ft_job(file_id_name, model, suffix, epoch): Creates or updates the finetuning job on OpenAI.check_jsonl_gpt35(file): Validates the format for GPT-3.5 training.check_jsonl_babbage(file): Validates the format for Babbage-002 training.cost_gpt(file, epochs): Estimates the cost of the finetuning process.
- Ensure your data adheres to OpenAI's data format guidelines for finetuning.
- Monitor your OpenAI dashboard to keep track of your usage and costs.
- OpenAI Documentation
- OpenAI Cookbook - FineTuning
- Python Argparse Library
- pyfiglet Documentation
- tiktoken Library
- Cancel training pressing key
- Adding Token and cost for Babbage model
- Automate for creating train and validation files 80-20%
$ python ftup.py --key your_api_key_here --file train_gpt3_5.jsonl --model gpt --epoch 1 --suffix custom-model-name
or
$ python ftup.py -f train_gpt3_5.jsonl -m gpt -e 1 -s custom-model-name
____________ __ ______
/ ____/_ __/ / / / / __ \
/ /_ / / ______ / / / / /_/ /
/ __/ / / /_____/ / /_/ / ____/
/_/ /_/ \____/_/
Checking API key ...
- API Key
Checking model ...
- Model gpt
Checking if jsonl is valid ...
- JSON File train_gpt3_5.jsonl
Checking if jsonl format is valid for GPT-3.5 training ...
- Num examples: 225
- JSONL train_gpt3_5.jsonl correct format
Uploading jsonl train file ...
- File ID: file-abcd123
Dataset has ~15153 tokens that will be charged for during training
You'll train for 1 epochs on this dataset
By default, you'll be charged for ~15153 tokens
Total cost: $0.1212 💰
Creating a finetuning job ...
- Fintetuning job id: ftjob-abc123
Status: succeeded
Finetuning succeeded! ☑️
Finetune model: ft:gpt-3.5-turbo:openai:custom-model-name:7p4lURe