wget mp3 files?

Kevin Brown kevin_brown at qwest.net
Sat Sep 24 22:57:36 MST 2005


> I cannot find a way to get some mp3 files from a website.  That sounds
> strange so let me set the stage:
> 
> Aside: These files are perfectly legitimate.  In fact, the provider
> wants people to download them.  They are audio, not music.
> 
> 1. There is a web page that displays links to the files.  The page URL
> is something like:
> http://www.somesite/mp3/display/0,18692,5297-41,00.html.  It is
> obviously from a CMS system of some kind.
> 
> 2. The above index page has links to .zip files, which I don't want.
> 
> 3. The above index page has links to .mp3 files which I do want.
> 
> 4. The mp3 files are stored at another URL like:
> http://audio.somesite/Handheld/Books/ThisBook/Part1Chapter1.mp3.
> (Unlike what is implied in my example, the mp3 filenames don't lend
> themselves to easy scripting since they don't follow a consistent pattern.)
> 
> 5. Attempts to view any subset of the URL in item 4 (i.e.
> http://audio.somesite/Handheld/Books) result in 403 Forbidden.
> 
> Attempts:
> 
> 'wget -r -l 1 -A.mp3 -np
> http://www.somesite/mp3/display/0,18692,5297-41,00.htm' results in
> saving the referenced page but not any linked files.  '-l 2' gives the
> same result.
> 
> 'wget -r -l 1 -A.mp3 -np http://audio.somesite/Handheld/Books/ThisBook'
> results in the 403 response.
> 
> I want to get all the files but I don't feel like clicking and saving
> each of the 230+ mp3 file links.  Any thoughts?

Save the index page.
grep it for the mp3 links.
Search and replace all of the text leading up to the URL and everything after it:
sed -e 's/^.*<a href="//g;s/".*$//g'
The resulting lines should be just the URLs, so then:
for p in `cat file`
do
    wget "$p"
done
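
Putting those steps together, a rough sketch (the names index.html and
urls.txt are just placeholders, and this assumes each <a href= link sits
on its own line in the page source):

#!/bin/sh
# save the index page locally
wget -O index.html 'http://www.somesite/mp3/display/0,18692,5297-41,00.html'

# keep only the lines that link to .mp3 files, then strip the HTML
# before and after each URL
grep '\.mp3"' index.html | sed -e 's/^.*<a href="//g;s/".*$//g' > urls.txt

# fetch each mp3 in turn
for p in `cat urls.txt`
do
    wget "$p"
done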

grep and sed can be used in conjunction by piping one into the other:
grep "<a href" index | sed -e 's/^.*<a href="//g;s/".*$//g' > file
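
Once the URLs are in a file, wget's -i option can also read them directly,
which saves writing the for loop:

wget -i file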

