A CLI-based audio description generation tool, built in Python and leveraging Ollama, Whisper, CLIP, Coqui TTS and FFMPEG.
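
For orientation, below is a rough sketch of how components like these are commonly combined for audio description: Whisper locates the dialogue, a vision model served by Ollama describes selected frames, Coqui TTS voices the descriptions, and FFMPEG mixes them back into the video (CLIP-based frame selection is omitted). This is a conceptual example only, not the code in `describe_video.py` or `process_video.py`; the model names, timestamps and file paths are placeholders.

```python
# Conceptual sketch only: NOT the repository's implementation. Model names,
# timestamps and file paths are illustrative placeholders.
import subprocess

import ollama            # pip install ollama
import whisper           # pip install openai-whisper
from TTS.api import TTS  # Coqui TTS

VIDEO = "./my_video.mp4"

# 1. Transcribe the dialogue with Whisper so descriptions can be placed in
#    the gaps between spoken lines.
speech = whisper.load_model("base").transcribe(VIDEO)
dialogue = [(seg["start"], seg["end"]) for seg in speech["segments"]]
print(f"Found {len(dialogue)} dialogue segments")

# 2. Grab a frame from a quiet moment with FFMPEG (10 s is a placeholder; a
#    real tool would pick timestamps from the gaps in `dialogue`).
subprocess.run(
    ["ffmpeg", "-y", "-ss", "10", "-i", VIDEO, "-frames:v", "1", "frame.jpg"],
    check=True,
)

# 3. Ask a local vision model served by Ollama to describe the frame.
reply = ollama.chat(
    model="gemma3:12b",
    messages=[{
        "role": "user",
        "content": "Describe this frame for a blind viewer in one sentence.",
        "images": ["frame.jpg"],
    }],
)
description = reply["message"]["content"]

# 4. Voice the description with Coqui TTS.
TTS("tts_models/en/ljspeech/tacotron2-DDC").tts_to_file(
    text=description, file_path="description.wav"
)

# 5. Delay the narration to the chosen timestamp and mix it into the original
#    audio track with FFMPEG, copying the video stream unchanged.
subprocess.run(
    ["ffmpeg", "-y", "-i", VIDEO, "-i", "description.wav",
     "-filter_complex",
     "[1:a]adelay=10000|10000[d];[0:a][d]amix=inputs=2:duration=first[a]",
     "-map", "0:v", "-map", "[a]", "-c:v", "copy", "sketch_output.mp4"],
    check=True,
)
```
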
These installation instructions are written for macOS.
- Install pyenv for Python 3.11
  - Run `brew install pyenv`
  - Add these commands to your `~/.zshrc` or `~/.bashrc`:
    - `export PYENV_ROOT="$HOME/.pyenv"`
    - `[[ -d $PYENV_ROOT/bin ]] && export PATH="$PYENV_ROOT/bin:$PATH"`
    - `eval "$(pyenv init - zsh)"`
  - Restart your terminal and run `pyenv install 3.11`
- Install FFMPEG, Ollama and espeak
  - Run `brew install ffmpeg` for video editing
  - Run `brew install espeak` for some AI Text-To-Speech
  - Install Ollama from their site: https://ollama.com/download
  - Run `ollama pull gemma3:12b` and `ollama pull nomic-embed-text`
- Run `python3 -m venv ./venv && source ./venv/bin/activate`
  - Every time you start a new terminal for the project, run `source ./venv/bin/activate` (Python Environments will run that for you)
- Run `pip3 install -r requirements.txt`
- Run `python3 ./describe_video.py --input ./my_video.mp4 --output ./my_video_script.txt` to generate a video script file
- Then run `python3 ./process_video.py --input_video ./my_video.mp4 --input_text ./my_video_script.txt --output ./my_video_audio_description.mp4` (a batch-processing sketch follows this list)
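
If you have several videos to describe, a small wrapper can run the two commands above for each file. The sketch below uses exactly the CLI flags documented in the steps above, but the `./videos` folder and the output naming are assumptions made for this example.

```python
# Batch wrapper around the two documented commands. The ./videos folder and
# the output file naming are assumptions made for this example.
import subprocess
from pathlib import Path

for video in sorted(Path("./videos").glob("*.mp4")):
    script = video.with_name(video.stem + "_script.txt")
    described = video.with_name(video.stem + "_audio_description.mp4")

    # Generate the description script for this video.
    subprocess.run(
        ["python3", "./describe_video.py",
         "--input", str(video), "--output", str(script)],
        check=True,
    )

    # Render the audio-described video from the script.
    subprocess.run(
        ["python3", "./process_video.py",
         "--input_video", str(video), "--input_text", str(script),
         "--output", str(described)],
        check=True,
    )
```
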
To run pylint, run `pylint ./describe_video.py`. Pylint configuration is located in `./pyproject.toml`.