Parallelize YouTube downloads
Problem statement
I wanted to download a certain YouTube playlist for offline use, with close to 150 videos in total.
Unfortunately, yt-dlp and YouTube are in a sort of game of catch-up: YT tries to enforce trickle downloads, and yt-dlp (and youtube-dl and others) try to bypass that.
For a single video, I don’t mind waiting. But for 150 videos, this takes forever and a day:
yt-dlp \
-f 'bestvideo[ext=mp4][vcodec!*=av01]+bestaudio[ext=m4a]/mp4' \
https://www.youtube.com/@ExampleChannelHere/videos
Fortunately, there is a better way.
Solution
The solution is simple:
- Get list of video IDs to download
- Parallel-download them
- Profit
Getting a list of video IDs to download
This is surprisingly easy:
yt-dlp --dump-json \
https://www.youtube.com/@ExampleChannelHere/videos \
| tee vids.json
(why tee? so you see progress, and don't ^C it thinking it's stuck)
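By the way, --dump-json resolves full metadata for every single video, so even this listing step can crawl on a big channel. If that bothers you, --flat-playlist -j should give a much lighter listing, one JSON line per entry and still with an .id field — a sketch I haven't timed against this exact channel:
# lighter enumeration: list entries without resolving full metadata per video
yt-dlp --flat-playlist -j \
https://www.youtube.com/@ExampleChannelHere/videos \
| tee vids.json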
Just the ID list, then, is1:
jq -r '[.id]|@csv' < vids.json | sed 's/"//g'
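The [.id]|@csv detour plus sed works because every line of vids.json is a standalone JSON object; if the quote-stripping bugs you, plain jq gives the same unquoted list:
jq -r '.id' < vids.json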
Parallel-download list of videos
With vids.json, downloading the videos in parallel is quite easy. All you need is jq, sed, and xargs:
jq -r '[.id]|@csv' < vids.json | sed 's/"//g' | \
xargs -n 1 -P 20 -I{} \
/usr/local/bin/yt-dlp \
-f 'bestvideo[ext=mp4][vcodec!*=av01]+bestaudio[ext=m4a]/mp4' -- {}
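One hedged refinement I didn't bother with above: yt-dlp's --download-archive skips already-fetched IDs on a re-run, and an explicit output template (-o) keeps filenames predictable. archive.txt and the template below are just example choices, and with 20 workers the archive check isn't atomic, so treat it as best-effort:
jq -r '[.id]|@csv' < vids.json | sed 's/"//g' | \
xargs -n 1 -P 20 -I{} \
/usr/local/bin/yt-dlp \
--download-archive archive.txt \
-o '%(title)s [%(id)s].%(ext)s' \
-f 'bestvideo[ext=mp4][vcodec!*=av01]+bestaudio[ext=m4a]/mp4' -- {}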
You could obviously run more than 20 workers in parallel… but I'm trying not to be a douche about it2.
Also note the flag terminator (--): omit that and you'll be sorry, as many YT video IDs start with a dash.
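To make the failure mode concrete (the ID here is made up): without --, an ID that begins with a dash gets parsed as an option and the download never starts; with it, yt-dlp treats it as a plain argument:
# '-AbCdEfGhIj' is a hypothetical ID starting with a dash; '--' stops option parsing
/usr/local/bin/yt-dlp -- -AbCdEfGhIj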
Profit
Enough material to watch during those loong winter nights. :-)
Closing words
I re-ran the original yt-dlp command just to make sure my crude parallel downloader Pokemon'd them reliably. And yes, it surely did. \o/
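A cruder check, if you just want the numbers to line up — a sketch assuming the merged files end up as .mp4 next to vids.json:
# how many IDs we asked for
jq -r '.id' < vids.json | wc -l
# how many merged .mp4 files actually landed
ls -1 *.mp4 | wc -l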
Where there's a will, there's a way. A crude way, sure. But not everything has to be production ready, yes?
Next up: load-balancing across the IPv6 space I got assigned, to thwart possible per-IP throttling. Nah, too much work.