you guys are GOOD! *wish I could do that.* :-)~MIKE~(-:

On Tue, Jan 27, 2015 at 8:39 PM, James Dugger wrote:

> After I got home I thought I could improve on the script. The following
> script crawls the site for URLs and passes them through a while loop that
> reduces each URL down to the name of the .jpg sitting in front of the
> query string. There are a lot of things that could be refactored to clean
> it up, but it works:
>
> #!/usr/bin/env bash
>
> # Crawl the site, build the URL list, and store it in the variable url.
> url=$(wget --spider --force-html -r -l2 "http://sites.google.com/site/thebookofgimp/home/chapter-2-photograph-retouching/" 2>&1 | grep '^--' | awk '{ print $3 }')
>
> # How many characters in front of ".jpg" to use when building the name string.
> front_int=6
>
> printf %s "$url" | while IFS= read -r raw_url
> do
>     # Cut the string just before ".jpg" (drop ".jpg" and the query string).
>     pos=${raw_url%%.jpg?*}
>
>     # The length of what's left is the character position of the cut.
>     pos_int=${#pos}
>
>     # Step back by $front_int characters.
>     (( pos_int -= front_int ))
>
>     # Build a new string starting at pos_int, front_int characters long.
>     temp_name=${raw_url:pos_int:front_int}
>
>     # Clean up the image name: keep only what follows the last "/".
>     image_name=$(echo "$temp_name" | sed 's#.*/\(.*\)$#\1#g')
>
>     # Get the images.
>     wget -O "${image_name}.jpg" "$raw_url"
> done
>
> On Tue, Jan 27, 2015 at 2:48 PM, Todd Millecam wrote:
>
>> alright, you got the 20 second, the 2 minute, and now the 20 minute help
>> solution:
>>
>> Open a terminal and do the following:
>>
>> cd /tmp
>> mkdir images
>> cd images
>> wget --spider -r -l2 -A jpg http://sites.google.com/site/thebookofgimp/home/chapter-2-photograph-retouching/ 2>&1 | grep '^--' | awk '{print $3}' > imagesList.txt
>> for url in `cat imagesList.txt` ; do wget -O `date +%s%N`.jpg $url ; done
>>
>> That'll download all the images... and some crap. Look through it, delete
>> the crap out of /tmp/images, and you're done.
>>
>> Explanation for those who want to improve their Linux terminal fu:
>>
>> Make a temporary directory inside of /tmp for cleanliness.
>>
>> Now, the wget command is confusing. --spider and -r mean "just grab URLs,
>> don't download anything, but look through and get every URL you can find
>> from the following location." -l2 means to just look two directories deep,
>> and -A jpg means to only print out stuff that includes "jpg" in the URL.
>>
>> From there, it's the big long URL, and then some fun little Unix-specific
>> stuff.
>>
>> 2>&1 is an output redirect. 0 is standard in, 1 is standard out, and 2 is
>> standard error. wget is a weird program and prints everything to standard
>> error by default, so this moves all the data from standard error to
>> standard out. We want it on standard out so we can send the output of wget
>> (all our URLs) to a different program to filter through them.
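A quick sketch of that redirect, with $SITE standing in for the long URL
above: wget writes its progress log to standard error, so a plain pipe
hands grep nothing until 2>&1 merges standard error into standard out.

    # wget logs to standard error, and | only carries standard out,
    # so grep receives nothing here:
    wget --spider -r -l2 "$SITE" | grep '^--'

    # Merge standard error into standard out first, and grep can see the log:
    wget --spider -r -l2 "$SITE" 2>&1 | grep '^--'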
>> You see, wget isn't giving us clean URLs; it's giving us some crap output
>> lines, and the useful output lines come in a string like:
>>
>> --2015-01-27 14:38:33--  https://e572cad7-a-62cb3a1a-s-sites.googlegroups.com/site/thebookofgimp/home/chapter-2-photograph-retouching/2.100.jpg?attachauth=ANoY7cqIGpuUDdagEljYRFF7WMX2G3rAxez0XLIAOW9cXpAnjqilN4X2HyaRWIblk29ORjgMg28jrQuQmBisXSw0d3gYh912nr4DtRyT5Jqk0KVEfJRqC2u92vG7TlxK75odZ1uWVaUrpEvUw1A52TZbuU7Dju7DIPQzou3dskyDSRrh0VAPHrI-znqeKeJ7NuzJqEc8WcLl4MnUpO-dgUZB7i8Eq_z3FFstaXyhjQGcbht8xZ0cBPFvBgw2gWYhuDQ4lqDHJSru&attredirects=0
>>
>> We don't care about the leading ---- and want to get rid of it, so we
>> have to filter down. That's why we do the output redirect first, so we can
>> use some Linux filter programs, specifically grep and awk.
>>
>> grep is a regular expression tool, which means it's a very powerful way to
>> find text. The regular expression I wanted to pass in was '^--', which
>> means: find all lines that start with the characters --. awk will take
>> regular expressions too, so I could change the command to look like:
>>
>> ... 2>&1 | awk '$0 ~ /^--/ {print $3}' > imagesList.txt
>>
>> and that would work too.
>>
>> The coolest description of awk I ever got was "basically Excel with no
>> GUI." awk splits all your text up into fields; the default dividing
>> character is a space, so if I want the first thing in the line, I use $1
>> to say "give me everything up to the first space." There's a space in the
>> date here, so the actual URL is in field 3, which is why I tell awk to
>> execute the command {print $3}. I could also say "get the last field in
>> the line," since that's the URL too, by using $NF.
>>
>> The last bit, the > imagesList.txt, says to make or overwrite the file
>> named imagesList.txt with whatever awk outputs (which is our filtered
>> URLs).
>>
>> The last line is:
>>
>> for url in `cat imagesList.txt` ; do wget -O `date +%s%N`.jpg $url ; done
>>
>> This says: take the text on each line in imagesList.txt, store it in the
>> variable $url, and execute the command group between do and done until
>> we've gone through every line in the file.
>>
>> The command between do and done is our regular old "download a file with
>> wget," with one small modification:
>>
>> wget -O `date +%s%N`.jpg
>>
>> Anything in back-ticks (the ` character right next to 1 on most keyboards)
>> is an encapsulated command, and everything inside the back-ticks will be
>> executed as a command. The command date +%s%N means "give me the current
>> time in nanoseconds." So each time wget is run, it names the downloaded
>> file <current time in nanoseconds>.jpg, and then the for loop takes over
>> and grabs the next one.
>>
>> On Tue, Jan 27, 2015 at 2:17 PM, Stephen Partington wrote:
>>
>>> You can write a script to yank out the jpg links, or just use something
>>> like https://addons.mozilla.org/en-US/firefox/addon/downthemall/
>>>
>>> On Tue, Jan 27, 2015 at 12:12 PM, Michael Havens wrote:
>>>
>>>> How can I use wget to retrieve the photos here? I tried:
>>>>
>>>> wget -r http://the-book-of-gimp.blogspot.com/p/chapter-2-photograph-retouching.html
>>>>
>>>> but it didn't download the pictures. It downloaded a bunch of web pages.
>>>> :-)~MIKE~(-:
>>>
>>> --
>>> A mouse trap, placed on top of your alarm clock, will prevent you from
>>> rolling over and going back to sleep after you hit the snooze button.
>>>
>>> Stephen
>>
>> --
>> Todd Millecam
>
> --
> James
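Pulling the pieces of this thread together, here is one possible
consolidated sketch (untested): crawl with --spider as Todd does, keep only
the .jpg hits, and name each download after the file part of the URL
instead of a timestamp, in the spirit of James's script. It assumes the
crawl still reports links of the ".jpg?attachauth=..." form shown above.

    #!/usr/bin/env bash
    # Crawl the chapter page, keep only the .jpg links wget reports, and
    # download each one under its original file name (query string stripped).
    SITE="http://sites.google.com/site/thebookofgimp/home/chapter-2-photograph-retouching/"

    wget --spider --force-html -r -l2 "$SITE" 2>&1 \
      | grep '^--' \
      | awk '{ print $3 }' \
      | grep '\.jpg' \
      | while IFS= read -r url; do
            name=${url##*/}      # drop everything up to the last "/"
            name=${name%%\?*}    # drop the query string after the first "?"
            wget -O "$name" "$url"
        done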