Not a biggie, but in case it helps anyone.
I wanted to download all episodes of the excellent “My Dad Wrote a Porno” podcast for posterity. I couldn’t find any existing way of doing this, so here’s what I ended up doing.
First I found the RSS feed. I noticed that it contains the URL of the actual audio file in enclosure tags.
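For reference, an enclosure entry in the feed looks roughly like this (the length and type attributes below are illustrative, not copied from the feed):

```xml
<enclosure url="https://media.acast.com/mydadwroteaporno/bestofbookfour/media.mp3" length="12345678" type="audio/mpeg"/>
```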
Cool, so I just need to read these for a start. I can do that via curl:

```shell
curl -s http://rss.acast.com/mydadwroteaporno | grep -o '<enclosure url="[^"]*'
```
This gives me all the links thus:
```
<enclosure url="https://media.acast.com/mydadwroteaporno/bestofbookfour/media.mp3
<enclosure url="https://media.acast.com/mydadwroteaporno/mydadwroteachristmasporno3/media.mp3
<enclosure url="https://media.acast.com/mydadwroteaporno/mydadwroteachristmasporno1/media.mp3
```
I was able to extract just the URL by extending the snippet with a second grep that keeps only what follows the opening double quote:
```shell
curl -s http://rss.acast.com/mydadwroteaporno | grep -o '<enclosure url="[^"]*' | grep -o '[^"]*$'
```
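To see what the two greps are doing without hitting the network, here is the same pipeline run against a single hand-written enclosure line (the length and type attributes are made up for the example):

```shell
# A hypothetical enclosure line, in the shape the feed produces
line='<enclosure url="https://media.acast.com/mydadwroteaporno/bestofbookfour/media.mp3" length="12345678" type="audio/mpeg"/>'

# The first grep keeps everything from <enclosure url=" up to (not including)
# the closing quote; the second keeps the trailing run of non-quote characters,
# i.e. just the URL itself.
echo "$line" | grep -o '<enclosure url="[^"]*' | grep -o '[^"]*$'
# prints https://media.acast.com/mydadwroteaporno/bestofbookfour/media.mp3
```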
Now all I needed to do was download each file, renaming “media.mp3” to the episode name taken from the URL path. The following did that:
```shell
for i in $(curl -s http://rss.acast.com/mydadwroteaporno | grep -o '<enclosure url="[^"]*' | grep -o '[^"]*$'); do
  url=$i
  outfile=$(echo "$i" | sed 's|https://media\.acast\.com/mydadwroteaporno/||' | sed 's|/media||')
  wget -q "$url" -O "$outfile"
done
```
I use sed to strip out the domain name and also drop the word “media”. What remains is the part of the path I am interested in.
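The filename transformation can be checked in isolation on one of the URLs from the feed:

```shell
url="https://media.acast.com/mydadwroteaporno/bestofbookfour/media.mp3"

# Strip the scheme/host/show prefix, then drop the "/media" path segment,
# leaving the episode name plus the .mp3 extension.
outfile=$(echo "$url" | sed 's|https://media\.acast\.com/mydadwroteaporno/||' | sed 's|/media||')
echo "$outfile"   # bestofbookfour.mp3
```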