Extracting, Transcribing, and Summarising Audio from Vimeo

In a previous Nix post I mentioned how you can run once-off commands with Nix without polluting your system environment.

First, the why. My community group hosts regular townhall meetings that I often can’t attend in person. Fortunately, recordings get posted to Vimeo afterwards. Rather than watching hour-long videos, I wanted a faster way to stay informed - hence this workflow.

In this post, I want to demonstrate how you can extract audio from a Vimeo video, transcribe it to text, and generate a summary - all using nix run.

Prerequisites

Nix package manager with flakes enabled

Step 1: Extract Audio from Vimeo

Use yt-dlp to download and extract audio from a Vimeo video.

nix run nixpkgs#yt-dlp -- -x --audio-format mp3 "https://vimeo.com/VIDEO_ID/HASH"

Flags:

-x or --extract-audio: Extract audio only
--audio-format mp3: Convert to MP3 format

Optional flags:

--extractor-retries 5: Retry on transient errors (e.g., 502 Bad Gateway)

This will create an MP3 file named after the video title.

Step 2: Check Audio Duration (Optional)

To verify the audio file and check its runtime, you can use ffmpeg.

nix run nixpkgs#ffmpeg -- -i "your-audio-file.mp3" 2>&1 | grep Duration

Step 3: Transcribe Audio

Use OpenAI’s Whisper model. This runs locally, so no API key is required.

nix run nixpkgs#openai-whisper -- --model tiny --output_format txt "your-audio-file.mp3"

Model options (speed vs accuracy trade-off):

tiny - Fastest, least accurate (~1GB)
base - Good balance (~1GB)
small - Better accuracy (~2GB)
medium - High accuracy (~5GB)
large - Best accuracy, slowest (~10GB)

Other output formats:

txt - Plain text
srt - Subtitles with timestamps
vtt - WebVTT subtitles
json - Detailed JSON with word-level timestamps

The transcription will be saved alongside the audio file with the corresponding extension.

Step 4: Summarise

Once you have the text transcription, you can:

Read and summarise manually
Use an LLM (like Claude) to generate a summary
Use any text summarisation tool of your choice

Example Workflow

# 1. Extract audio
nix run nixpkgs#yt-dlp -- -x --audio-format mp3 "https://vimeo.com/1162513409/cd293376cf"

# 2. Check duration
nix run nixpkgs#ffmpeg -- -i "VIDEO_TITLE.mp3" 2>&1 | grep Duration

# 3. Transcribe (using tiny model for speed)
nix run nixpkgs#openai-whisper -- --model tiny --output_format txt "VIDEO_TITLE.mp3"

# 4. View transcription
cat "VIDEO_TITLE.txt"

Notes

First run will download the required Nix packages (cached for subsequent runs)
Whisper will download the model on first use (one-time download)
For longer audio files (30+ minutes), consider using the tiny or base model to reduce processing time
Whisper runs entirely locally - no internet connection or API keys required for transcription

Conclusion

The Nix ecosystem makes it trivial to chain together tools like yt-dlp, ffmpeg, and whisper without polluting your system environment.

Till next time.

Prerequisites#

Step 1: Extract Audio from Vimeo#

Step 2: Check Audio Duration (Optional)#

Step 3: Transcribe Audio#

Step 4: Summarise#

Example Workflow#

Notes#

Conclusion#