wget mp3 files?
Kevin Brown
kevin_brown at qwest.net
Sat Sep 24 22:57:36 MST 2005
> I cannot find a way to get some mp3 files from a website. That sounds
> strange, so let me set the stage:
>
> Aside: These files are perfectly legitimate. In fact, the provider
> wants people to download them. They are audio, not music.
>
> 1. There is a web page that displays links to the files. The page URL
> is something like:
> http://www.somesite/mp3/display/0,18692,5297-41,00.html. It is
> obviously from a CMS of some kind.
>
> 2. The above index page has links to .zip files, which I don't want.
>
> 3. The above index page has links to .mp3 files which I do want.
>
> 4. The mp3 files are stored at another URL like:
> http://audio.somesite/Handheld/Books/ThisBook/Part1Chapter1.mp3.
> (Unlike what is implied in my example, the mp3 filenames don't lend
> themselves to easy scripting since they don't follow a consistent pattern.)
>
> 5. Attempts to view any subset of the URL in item 4 (i.e.
> http://audio.somesite/Handheld/Books) result in 403 Forbidden.
>
> Attempts:
>
> 'wget -r -l 1 -A.mp3 -np
> http://www.somesite/mp3/display/0,18692,5297-41,00.htm' results in
> saving the referenced page but not any of the linked files. '-l 2'
> gives the same result.
>
> 'wget -r -l 1 -A.mp3 -np http://audio.somesite/Handheld/Books/ThisBook'
> results in the 403 response.
>
> I want to get all the files, but I don't feel like clicking and saving
> each of the 230+ mp3 file links. Any thoughts?
Save the index page, grep for the mp3 links, then strip the text leading
up to and trailing each URL:

sed -e 's/^.*<a href="//g;s/".*$//g'

The resulting lines should be just the URLs, which can be fed back to
wget:

for p in `cat file`
do
    wget "$p"
done

grep and sed can be used in conjunction by piping one into the other:

grep '<a href' index | grep '\.mp3' | sed -e 's/^.*<a href="//g;s/".*$//g' > file