Downloading an entire podcast archive


These days I spend way too much time in my car. And I’d like to make that time at least a bit more useful. So I listen to podcasts. However, downloading all these podcasts can also take a lot of time, especially when you want to download an entire archive. Using Linux, it only takes one simple little script to do it.

for i in $( curl -s http://www.nature.com/nature/podcast/archive.html |
grep -oie "http.*\.mp3" |
sort |
uniq );
do
echo "${i}:" ;
curl -o "${i##*/}" -# "${i}";
done

Ok, so it isn’t that simple. Here’s a breakdown of what it does:

for i in $( <list> ); do <commands> ; done

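This is the basic for-each loop: for each item in <list>, run <commands>. The $( ) part is command substitution: the shell runs the pipeline inside it and uses the output, split on whitespace, as the list.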
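To see the loop on its own, here is a small sketch that needs no network. The three words stand in for the URLs and the counter is only there for illustration; neither is part of the original script.

```shell
# Stand-in list: printf prints three words, one per line.
count=0
for i in $( printf '%s\n' one two three ); do
    # Each whitespace-separated word becomes one value of i.
    echo "item: ${i}"
    count=$((count + 1))
done
echo "looped ${count} times"
```

Because the list is split on whitespace, this approach works for URLs, which contain no spaces, but it would break up items that do.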

curl -s http://www.nature.com/nature/podcast/archive.html

Retrieve the page (in this case http://www.nature.com/nature/podcast/archive.html) silently: the -s switch hides curl’s progress meter and error messages. The page itself is still written to the standard output stream, which is what we want, because grep reads it from there.

grep -oie "http.*\.mp3"

Match the pattern starting with “http” and ending with “.mp3”. The -o switch shows only the part of each line that matches the pattern, -i ignores case, and -e indicates that the pattern follows.
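One caveat, sketched below: “.*” is greedy, so if a single line of the page holds several links, the match runs from the first “http” to the last “.mp3”. I have not checked whether the Nature archive page is laid out that way, so the sample line here is made up, and the stricter pattern is just one possible alternative.

```shell
# Hypothetical sample line with two links, standing in for the real page.
sample='<a href="http://example.com/a.mp3">A</a> <a href="http://example.com/b.mp3">B</a>'

# The original pattern: one long match spanning both links.
greedy=$(printf '%s\n' "$sample" | grep -oie "http.*\.mp3")

# Excluding double quotes and whitespace keeps each match inside a single URL.
strict=$(printf '%s\n' "$sample" | grep -oiE 'http[^"[:space:]]*\.mp3')

printf 'greedy:\n%s\n\nstrict:\n%s\n' "$greedy" "$strict"
```

When every link sits on its own line, both patterns give the same result.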

sort | uniq

These two are combined because they are basically self-explanatory: “sort” sorts the list, and “uniq” removes duplicates from it. The order matters, since “uniq” only removes duplicates that are next to each other.
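A quick sketch of why the sort comes first, using a made-up list in which the duplicate lines are not adjacent. It also shows that “sort -u” does both steps in one command.

```shell
# Hypothetical list: b.mp3 appears twice, but not on adjacent lines.
urls='http://example.com/b.mp3
http://example.com/a.mp3
http://example.com/b.mp3'

# uniq alone only collapses adjacent duplicates, so all three lines survive.
only_uniq=$(printf '%s\n' "$urls" | uniq)

# Sorting first puts the duplicates next to each other.
sorted_uniq=$(printf '%s\n' "$urls" | sort | uniq)

# sort -u sorts and removes duplicates in one go.
sort_u=$(printf '%s\n' "$urls" | sort -u)

printf 'uniq only:\n%s\n\nsort | uniq:\n%s\n\nsort -u:\n%s\n' "$only_uniq" "$sorted_uniq" "$sort_u"
```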

echo "${i}:"

This just prints the URL of the file to retrieve.

curl -o "${i##*/}" -# "${i}"

This will retrieve the URL specified by ‘${i}’. The ‘-#’ switch displays a progress bar. Lastly, the retrieved file is written to disk under the filename given by ‘-o’, which is the last part of the URL: ‘${i##*/}’ strips everything up to and including the last slash. This means ‘http://www.nature.com/multimedia/podcast/nature/v503/n7477/nature-2013-11-28.mp3’ will be translated to ‘nature-2013-11-28.mp3’.
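The filename part can be tried without downloading anything. This sketch uses the example URL from above; the variable names “filename” and “alt” are mine, and “basename” is shown only as an alternative way to get the same result.

```shell
i='http://www.nature.com/multimedia/podcast/nature/v503/n7477/nature-2013-11-28.mp3'

# ${i##*/} removes the longest prefix matching "*/",
# which is everything up to and including the last slash.
filename="${i##*/}"
echo "$filename"    # prints nature-2013-11-28.mp3

# basename gives the same result, at the cost of starting an extra process.
alt=$(basename "$i")
echo "$alt"
```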

And the end result will be about 400 podcast episodes to listen to. That will keep you busy for a while…

