These days I spend way too much time in my car. And I’d like to make that time at least a bit more useful. So I listen to podcasts. However, downloading all these podcasts can also take a lot of time, especially when you want to download an entire archive. Using Linux, it only takes one simple little script to do it.
for i in $( curl -s http://www.nature.com/nature/podcast/archive.html | grep -oie "http.*\.mp3" | sort | uniq ); do echo "${i}:" ; curl -o ${i##*/} -# ${i}; done
Ok, so it isn’t that simple. Here’s a breakdown of what it does:
for i in $( <list> ); do <commands> ; done
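This is the basic for-each loop. For each item in <list>, do <commands>. The $( ... ) part runs a command and hands its output to the loop, split on whitespace into separate items.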
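To see the loop on its own, here is a small sketch that needs no network. The printf command and the three file names are made up for illustration; they stand in for the real list of URLs.

```shell
# Loop over each whitespace-separated word produced by a command.
# printf is only a stand-in for the real list-producing pipeline.
count=0
for i in $( printf '%s\n' one.mp3 two.mp3 three.mp3 ); do
    echo "item: ${i}"
    count=$((count + 1))
done
echo "processed ${count} items"
```

Because the output is split on whitespace, an item containing a space would be cut into two. URLs normally contain no spaces, so that is fine here.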
curl -s http://www.nature.com/nature/podcast/archive.html
Retrieve the page (in this case http://www.nature.com/nature/podcast/archive.html). The -s switch makes curl silent: it hides the progress meter and error messages. The page itself is still written to the standard output stream, which is what we want, because the next command reads it from there.
grep -oie "http.*\.mp3"
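Match the pattern starting with “http” and ending with “.mp3”. The -o switch shows only the part of each line that matches the pattern, the -i ignores case, and the -e indicates that the pattern follows. Note that “.*” is greedy: if a single line contains several links, the match runs from the first “http” to the last “.mp3” on that line.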
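The grep step can be tried on a few lines of HTML without downloading anything. The sample below is invented (example.com and the episode names are placeholders, not the real archive page), and it assumes one link per line.

```shell
# Hypothetical sample of archive-page HTML, one link per line.
sample='<a href="http://example.com/audio/episode-1.mp3">Episode 1</a>
<a href="HTTP://EXAMPLE.COM/audio/episode-2.MP3">Episode 2</a>
<p>No audio here</p>'

# -o prints only the matched text, -i ignores case, -e introduces the pattern.
urls=$( printf '%s\n' "$sample" | grep -oie "http.*\.mp3" )
printf '%s\n' "$urls"
```

Both links are printed, including the upper-case one, and the line without a link is dropped.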
sort | uniq
These two are combined because they are basically self-explanatory: “sort” sorts the list, and “uniq” removes duplicates from it. The sort has to come first, because “uniq” only removes duplicate lines that are next to each other.
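A quick way to see why the order matters is to run uniq with and without sort on the same input. The three-line list is made up for the example.

```shell
# A made-up list where the duplicate lines are not adjacent.
list='b.mp3
a.mp3
b.mp3'

# uniq alone only collapses neighbouring duplicates, so both b.mp3 lines stay.
without_sort=$( printf '%s\n' "$list" | uniq )

# Sorting first puts the duplicates next to each other, so uniq can drop one.
with_sort=$( printf '%s\n' "$list" | sort | uniq )

echo "uniq only:"
printf '%s\n' "$without_sort"
echo "sort | uniq:"
printf '%s\n' "$with_sort"
```

If you prefer something shorter, “sort -u” should give the same result as “sort | uniq” here.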
echo "${i}:"
This just prints the URL of the file to retrieve.
curl -o ${i##*/} -# ${i}
This will retrieve the URL specified by ‘${i}’. The ‘-#’ means it will display a progress bar. And lastly, it will write the retrieved file to disk. The filename, indicated by ‘-o’, will be the same as the last part of the URL: ‘${i##*/}’ strips everything up to and including the last slash. This means ‘http://www.nature.com/multimedia/podcast/nature/v503/n7477/nature-2013-11-28.mp3’ will be translated to ‘nature-2013-11-28.mp3’.
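The filename trick is plain shell parameter expansion, so it can be checked without downloading anything. The sketch below uses the URL from the example above. The variable names filename and shortest are just for illustration.

```shell
i='http://www.nature.com/multimedia/podcast/nature/v503/n7477/nature-2013-11-28.mp3'

# "##" removes the longest leading match of the pattern "*/",
# which is everything up to and including the last slash.
filename="${i##*/}"
echo "$filename"

# For comparison, a single "#" removes only the shortest leading match,
# which ends at the first slash.
shortest="${i#*/}"
echo "$shortest"
```

The first echo prints the bare file name. The second still contains most of the URL, which is why the double ‘##’ is the one to use here.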
And the end result will be about 400 podcast episodes to listen to. That will keep you busy for a while…