2024-05-01 12:28:44 -06:00

Extracting Audio from Video with FFmpeg

First, extract the audio track from your video into a 16 kHz, mono, 16-bit PCM .wav file, the format the DeepSpeech pre-trained models expect:

  1. Open your terminal.

  2. Run the FFmpeg command to extract audio:

    ffmpeg -i input_video.mp4 -vn -acodec pcm_s16le -ar 16000 -ac 1 output_audio.wav
    
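    • Replace input_video.mp4 with the path to your video file.
    • -vn drops the video stream; -acodec pcm_s16le, -ar 16000, and -ac 1 produce 16-bit PCM audio at 16 kHz in mono.
    • The output is written to output_audio.wav in the current directory.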
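
If you want to script this step, the command above can be wrapped in Python. The following is a minimal sketch, assuming ffmpeg is on your PATH; the function names build_extract_command, extract_audio, and wav_mismatches are illustrative and not part of FFmpeg or DeepSpeech. It builds the same argument list and uses the standard-library wave module to confirm the result is 16 kHz, mono, 16-bit.

```python
import subprocess
import wave


def build_extract_command(video_path, wav_path):
    """Return the ffmpeg argument list used in the extraction step."""
    return [
        "ffmpeg",
        "-i", video_path,        # input video
        "-vn",                   # drop the video stream
        "-acodec", "pcm_s16le",  # 16-bit little-endian PCM
        "-ar", "16000",          # 16 kHz sample rate
        "-ac", "1",              # mono
        wav_path,
    ]


def extract_audio(video_path, wav_path):
    """Run ffmpeg; raises CalledProcessError if ffmpeg exits non-zero."""
    subprocess.run(build_extract_command(video_path, wav_path), check=True)


def wav_mismatches(wav_path, rate=16000, channels=1, sample_width=2):
    """Return a list of format mismatches (empty if the file matches)."""
    with wave.open(wav_path, "rb") as wav:
        found = {
            "sample rate": (wav.getframerate(), rate),
            "channels": (wav.getnchannels(), channels),
            "bytes per sample": (wav.getsampwidth(), sample_width),
        }
    return [
        f"{name}: got {got}, expected {want}"
        for name, (got, want) in found.items()
        if got != want
    ]
```

Calling extract_audio("input_video.mp4", "output_audio.wav") followed by wav_mismatches("output_audio.wav") should return an empty list when the extraction worked as intended.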

Setting Up the Python Virtual Environment and DeepSpeech

Next, prepare your environment for running DeepSpeech:

  1. Update your package list (optional but recommended):

    sudo apt update
    
  2. Install the python3-venv package if you haven't already:

    sudo apt install python3-venv
    
  3. Create a Python virtual environment:

    python3 -m venv deepspeech-venv
    
  4. Activate the virtual environment:

    source deepspeech-venv/bin/activate
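
Before installing anything, it can help to confirm that the environment is really active. A small sketch of one way to check, using only the standard library (the function name in_virtualenv is illustrative):

```python
import sys


def in_virtualenv():
    """True when the running interpreter belongs to a virtual environment."""
    # Inside an environment created by `python3 -m venv`, sys.prefix points
    # at the environment while sys.base_prefix points at the system install.
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtualenv())
```

Run it with the python command from the same shell in which you activated the environment; outside an environment the two prefixes are identical.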
    

Installing DeepSpeech

With your virtual environment active, install DeepSpeech:

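  1. Install the deepspeech package from PyPI:

    pip install deepspeech

    • If pip reports that no matching distribution was found, the active Python version is probably newer than those the published deepspeech wheels support.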
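
To confirm the install succeeded, you can ask Python which version of the package it sees. This is a minimal sketch assuming Python 3.8 or later (where importlib.metadata is in the standard library); the helper name installed_version is illustrative.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


print("deepspeech:", installed_version("deepspeech") or "not installed")
```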
    

Downloading DeepSpeech Pre-trained Models

Before transcribing, you need the pre-trained model files:

  1. Download the pre-trained model and scorer files from the DeepSpeech GitHub releases page. For the 0.9.3 release these are deepspeech-0.9.3-models.pbmm (the acoustic model) and deepspeech-0.9.3-models.scorer (the external scorer); other releases use similar names.

  2. Place the downloaded files in a directory where you plan to run the transcription, or note their paths for use in the transcription command.
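
A quick check that both files are where you expect can save a confusing error later. The sketch below assumes the 0.9.3 file names mentioned above; missing_files and MODEL_FILES are illustrative names, so adjust them to the release you downloaded.

```python
from pathlib import Path

# File names from the 0.9.3 release; change these for other releases.
MODEL_FILES = (
    "deepspeech-0.9.3-models.pbmm",
    "deepspeech-0.9.3-models.scorer",
)


def missing_files(directory, names=MODEL_FILES):
    """Return the names that are not regular files inside `directory`."""
    base = Path(directory)
    return [name for name in names if not (base / name).is_file()]
```

For example, missing_files(".") returns an empty list when both files are in the current directory.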

Transcribing Audio to Text

Finally, you're ready to transcribe the audio file to text:

  1. Ensure you're in the directory containing both the audio file (output_audio.wav) and the DeepSpeech model files, or have their paths noted.

  2. Run DeepSpeech with the following command:

    deepspeech --model deepspeech-0.9.3-models.pbmm --scorer deepspeech-0.9.3-models.scorer --audio output_audio.wav
    
    • Replace deepspeech-0.9.3-models.pbmm and deepspeech-0.9.3-models.scorer with the paths to your downloaded model and scorer files, if they're not in the current directory.
    • Replace output_audio.wav with the path to your .wav audio file if necessary.

This command prints the transcription to the terminal. To keep the result, redirect standard output to a file, for example by appending > transcript.txt to the command. Transcription can take a while, depending on the length of the audio and the speed of your machine.
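
The same transcription can be driven from Python through the deepspeech package's Model class instead of the command-line client. The following is a sketch under the assumption that the 0.9.x Python API is installed (Model, enableExternalScorer, sampleRate, and stt); read_pcm16 and transcribe are illustrative helper names, not part of DeepSpeech.

```python
import wave


def read_pcm16(wav_path):
    """Return (sample_rate, raw_frames) from a mono 16-bit PCM WAV file."""
    with wave.open(wav_path, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected mono 16-bit PCM audio")
        return wav.getframerate(), wav.readframes(wav.getnframes())


def transcribe(model_path, scorer_path, wav_path):
    """Transcribe a WAV file and return the recognized text."""
    # Imported here so read_pcm16 stays usable without deepspeech installed.
    # numpy is pulled in as a dependency of the deepspeech package.
    import numpy as np
    from deepspeech import Model

    model = Model(model_path)
    model.enableExternalScorer(scorer_path)

    rate, frames = read_pcm16(wav_path)
    if rate != model.sampleRate():
        raise ValueError(
            f"audio is {rate} Hz, model expects {model.sampleRate()} Hz"
        )
    # The model takes the samples as a 16-bit integer array.
    return model.stt(np.frombuffer(frames, dtype=np.int16))
```

A call such as transcribe("deepspeech-0.9.3-models.pbmm", "deepspeech-0.9.3-models.scorer", "output_audio.wav") returns the text as a string, which is convenient when the transcript needs further processing rather than just display.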

Deactivating the Virtual Environment

After you're done, you can deactivate the virtual environment:

    deactivate

This guide covers extracting audio from a video file with FFmpeg and transcribing it to text with DeepSpeech on Debian-based Linux systems.