Add docs/tech_docs/FFmpeg.md
### Extracting Audio from Video with FFmpeg

First, extract the audio from your video file into a `.wav` file in a format suitable for speech recognition:

1. **Open your terminal.**

2. **Run the FFmpeg command to extract audio:**
   ```bash
   ffmpeg -i input_video.mp4 -vn -acodec pcm_s16le -ar 16000 -ac 1 output_audio.wav
   ```
   - Replace `input_video.mp4` with the path to your video file.
   - The flags drop the video stream (`-vn`) and write 16-bit PCM audio (`-acodec pcm_s16le`) at 16 kHz (`-ar 16000`) in mono (`-ac 1`), the format the DeepSpeech 0.9.3 models expect.
   - The output is a `.wav` file named `output_audio.wav`.

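
To convert several videos with the same settings, the FFmpeg command can be wrapped in a small shell function. This is a minimal sketch, not part of the original guide: the names `wav_name_for` and `extract_audio` are hypothetical, and it assumes `ffmpeg` is on your `PATH` and that every input file name has an extension.

```bash
# Hypothetical helpers; the names are illustrative and not part of FFmpeg.

# Derive the output name by replacing the input's extension with .wav.
wav_name_for() {
  printf '%s.wav\n' "${1%.*}"
}

# Extract 16 kHz, mono, 16-bit PCM audio from one video file.
extract_audio() {
  local input="$1"
  local output
  output="$(wav_name_for "$input")"
  ffmpeg -i "$input" -vn -acodec pcm_s16le -ar 16000 -ac 1 "$output"
}
```

With these defined, `for f in *.mp4; do extract_audio "$f"; done` would write one `.wav` file next to each video in the current directory.
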
### Setting Up the Python Virtual Environment and DeepSpeech

Next, prepare your environment for running DeepSpeech:

1. **Update your package list (optional but recommended):**
   ```bash
   sudo apt update
   ```

2. **Install `python3-venv` if you haven't already:**
   ```bash
   sudo apt install python3-venv
   ```

3. **Create a Python virtual environment:**
   ```bash
   python3 -m venv deepspeech-venv
   ```

4. **Activate the virtual environment:**
   ```bash
   source deepspeech-venv/bin/activate
   ```

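
Before installing anything, it can help to confirm that the virtual environment is really active. The following is a minimal sketch: `in_venv` is a hypothetical helper name, and the check relies on the `VIRTUAL_ENV` variable that the `activate` script exports.

```bash
# Hypothetical helper: succeeds only when a virtual environment is active.
# The venv activate script exports VIRTUAL_ENV; deactivate unsets it.
in_venv() {
  [ -n "${VIRTUAL_ENV:-}" ]
}

if in_venv; then
  echo "Active virtual environment: $VIRTUAL_ENV"
else
  echo "No virtual environment active; run: source deepspeech-venv/bin/activate"
fi
```
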
### Installing DeepSpeech

With your virtual environment active, install DeepSpeech:

1. **Install DeepSpeech within the virtual environment:**
   ```bash
   pip install deepspeech
   ```

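
A quick way to check the install is to see whether the `deepspeech` command is now on your `PATH`. This is a sketch under assumptions: `have_cmd` is a hypothetical helper name, and the check expects the virtual environment from the previous section to still be active.

```bash
# Hypothetical helper: report whether a command is available on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

if have_cmd deepspeech; then
  # --version prints the client version and exits.
  deepspeech --version
else
  echo "deepspeech not found; is the virtual environment active?" >&2
fi
```
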
### Downloading DeepSpeech Pre-trained Models

Before transcribing, you need the pre-trained model files:

1. **Download the pre-trained DeepSpeech model and scorer files from the [DeepSpeech GitHub releases page](https://github.com/mozilla/DeepSpeech/releases).** Look for files with names like `deepspeech-0.9.3-models.pbmm` and `deepspeech-0.9.3-models.scorer`.

2. **Place the downloaded files in the directory where you plan to run the transcription, or note their paths for use in the transcription command.**

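
If you prefer the command line to the browser, the two files can also be fetched with `curl`. This is a sketch that assumes release 0.9.3 (matching the file names above) and that `curl` is installed; `DS_VERSION`, `model_url`, and `download_models` are hypothetical names chosen for illustration.

```bash
# Assumption: release 0.9.3, matching the file names mentioned above.
DS_VERSION="0.9.3"

# Hypothetical helper: build the release download URL for a model file.
model_url() {
  printf 'https://github.com/mozilla/DeepSpeech/releases/download/v%s/deepspeech-%s-models.%s\n' \
    "$DS_VERSION" "$DS_VERSION" "$1"
}

# Hypothetical helper: download the model (.pbmm) and the scorer (.scorer).
# -L follows GitHub's redirect; -O keeps the remote file name.
download_models() {
  curl -LO "$(model_url pbmm)"
  curl -LO "$(model_url scorer)"
}
```

Running `download_models` in the directory where you plan to transcribe places both files there. The files are large, so the download may take a while.
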
### Transcribing Audio to Text

Finally, you're ready to transcribe the audio file to text:

1. **Make sure you're in the directory containing both the audio file (`output_audio.wav`) and the DeepSpeech model files, or have their paths at hand.**

2. **Run DeepSpeech with the following command:**
   ```bash
   deepspeech --model deepspeech-0.9.3-models.pbmm --scorer deepspeech-0.9.3-models.scorer --audio output_audio.wav
   ```
   - Replace `deepspeech-0.9.3-models.pbmm` and `deepspeech-0.9.3-models.scorer` with the paths to your downloaded model and scorer files if they're not in the current directory.
   - Replace `output_audio.wav` with the path to your `.wav` audio file if necessary.

This command prints the transcription of your audio file to the terminal. Transcription can take a while, depending on the length of the audio and the speed of your machine.

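
To keep the transcript rather than only view it, you can redirect the command's standard output to a file. The sketch below also checks that the input files exist before starting. `require_files` and `transcribe` are hypothetical helper names, and the model and scorer are assumed to sit in the current directory under their 0.9.3 file names.

```bash
# Hypothetical helper: fail early if any required file is missing.
require_files() {
  local f
  for f in "$@"; do
    if [ ! -f "$f" ]; then
      echo "Missing file: $f" >&2
      return 1
    fi
  done
}

# Hypothetical wrapper: transcribe one WAV file and save the text next to it.
transcribe() {
  local audio="$1"
  local model="deepspeech-0.9.3-models.pbmm"
  local scorer="deepspeech-0.9.3-models.scorer"
  require_files "$audio" "$model" "$scorer" || return 1
  # The transcript goes to stdout; redirect it into a .txt file.
  deepspeech --model "$model" --scorer "$scorer" --audio "$audio" > "${audio%.wav}.txt"
}
```

Calling `transcribe output_audio.wav` would then write the transcript to `output_audio.txt`.
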
### Deactivating the Virtual Environment

After you're done, you can deactivate the virtual environment:

```bash
deactivate
```

This guide covers extracting audio from video files with FFmpeg and transcribing it to text with DeepSpeech on Debian-based Linux systems. Use it as a quick reference for speech recognition and transcription tasks.