
unl/Audio-Description-Generator


Audio Description Project

A CLI-based audio description generation tool written in Python, built on Ollama, Whisper, CLIP, Coqui TTS, and FFmpeg.

Installation

These installation instructions are written for macOS.

  1. Install pyenv and Python 3.11
    1. Run brew install pyenv

    2. Add these lines to your ~/.zshrc (if you use bash, add them to ~/.bashrc instead and replace zsh with bash in the last line)

          export PYENV_ROOT="$HOME/.pyenv"
          [[ -d $PYENV_ROOT/bin ]] && export PATH="$PYENV_ROOT/bin:$PATH"
          eval "$(pyenv init - zsh)"
    3. Restart your terminal, run pyenv install 3.11, then run pyenv local 3.11 in the project directory so that python3 resolves to Python 3.11

  2. Install FFmpeg, espeak, and Ollama
    • Run brew install ffmpeg (used for video processing)
    • Run brew install espeak (required by some text-to-speech models)
    • Install Ollama from their site https://ollama.com/download
  3. Run ollama pull gemma3:12b and ollama pull nomic-embed-text to download the required models
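  4. Create and activate a virtual environment: run python3 -m venv ./venv && source ./venv/bin/activate
    • Run source ./venv/bin/activate every time you open a new terminal for the project
    • Editors with Python environment support, such as VS Code's Python Environments extension, can activate it for you automatically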
  5. Run pip3 install -r requirements.txt
  6. Run python3 ./describe_video.py --input ./my_video.mp4 --output ./my_video_script.txt to generate the audio description script
  7. Then run python3 ./process_video.py --input_video ./my_video.mp4 --input_text ./my_video_script.txt --output ./my_video_audio_description.mp4 to produce the video with audio description
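
You can verify the setup steps above before running the pipeline. The sketch below is a hypothetical helper and is not part of this repository. Its function names are illustrative, and it makes two assumptions: Ollama listens on its default address, http://localhost:11434, and Ollama's /api/tags endpoint lists the pulled models. The helper also builds the two commands from steps 6 and 7 so they run with the virtual environment's interpreter.

```python
# Hypothetical preflight helper (not part of this repository).
# Assumes Ollama's default local API address and the script flags from steps 6-7.
import json
import shutil
import subprocess
import sys
import urllib.request

REQUIRED_TOOLS = ("ffmpeg", "espeak", "ollama")        # step 2
REQUIRED_MODELS = ("gemma3:12b", "nomic-embed-text")   # step 3
OLLAMA_TAGS_URL = "http://localhost:11434/api/tags"    # assumption: default port


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the command-line tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def in_virtualenv():
    """Return True when the interpreter runs inside a venv (step 4)."""
    return sys.prefix != sys.base_prefix


def missing_models(tags_json, models=REQUIRED_MODELS):
    """Return required models absent from an Ollama /api/tags JSON body."""
    names = {entry["name"] for entry in json.loads(tags_json).get("models", [])}
    # Models pulled without a tag are listed with an explicit ":latest" suffix.
    return [m for m in models if m not in names and f"{m}:latest" not in names]


def fetch_tags(url=OLLAMA_TAGS_URL):
    """Download the list of locally pulled models from the Ollama server."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return response.read().decode("utf-8")


def pipeline_commands(video, script, output):
    """Build the argument lists for steps 6 and 7 using the current interpreter."""
    return [
        [sys.executable, "./describe_video.py", "--input", video, "--output", script],
        [sys.executable, "./process_video.py", "--input_video", video,
         "--input_text", script, "--output", output],
    ]


def main(video="./my_video.mp4", script="./my_video_script.txt",
         output="./my_video_audio_description.mp4"):
    """Report missing prerequisites, or run both pipeline steps in order."""
    problems = []
    if sys.version_info[:2] != (3, 11):
        problems.append(f"expected Python 3.11, found {sys.version.split()[0]}")
    if not in_virtualenv():
        problems.append("virtual environment is not active")
    problems += [f"{tool} not found on PATH" for tool in missing_tools()]
    try:
        problems += [f"model {m} has not been pulled"
                     for m in missing_models(fetch_tags())]
    except OSError as exc:  # URLError and timeouts are OSError subclasses
        problems.append(f"cannot reach Ollama at {OLLAMA_TAGS_URL}: {exc}")
    if problems:
        sys.exit("\n".join(problems))
    for command in pipeline_commands(video, script, output):
        subprocess.run(command, check=True)  # stop if a step fails
```

Run main() from the project root with the virtual environment active. If any prerequisite is missing, it prints the missing items and exits. Otherwise it runs describe_video.py and then process_video.py on the example paths. Using sys.executable instead of a bare python3 keeps both steps on the same interpreter.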

Pylint

To run Pylint, run pylint ./describe_video.py. The Pylint configuration is located in ./pyproject.toml.
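
You can also script the Pylint command above, for example in CI. The helper below is a hypothetical sketch. It runs Pylint through the current interpreter (python -m pylint) and decodes the exit status, which Pylint documents as a bit mask of message categories. When run from the project root, Pylint should pick up the configuration in ./pyproject.toml.

```python
# Hypothetical helper (not part of this repository): run Pylint and decode its exit status.
import subprocess
import sys

# Pylint's exit status is a bit mask; several bits can be set at once.
PYLINT_EXIT_BITS = {
    1: "fatal",
    2: "error",
    4: "warning",
    8: "refactor",
    16: "convention",
    32: "usage error",
}


def decode_pylint_status(status):
    """Return the message categories encoded in a Pylint exit status."""
    return [name for bit, name in PYLINT_EXIT_BITS.items() if status & bit]


def run_pylint(*paths):
    """Run Pylint with the current interpreter from the working directory."""
    result = subprocess.run([sys.executable, "-m", "pylint", *paths], check=False)
    return decode_pylint_status(result.returncode)
```

For example, run_pylint("./describe_video.py") returns an empty list on a clean run. It returns ["convention"] when only coding-convention messages were reported.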

About

WIP Audio Description Generation CLI tool written in Python.
