<div dir="ltr">you guys are GOOD! <i>wish I could do that.</i></div><div class="gmail_extra"><br clear="all"><div><div class="gmail_signature">:-)~MIKE~(-:</div></div>
<br><div class="gmail_quote">On Tue, Jan 27, 2015 at 8:39 PM, James Dugger <span dir="ltr"><<a href="mailto:james.dugger@gmail.com" target="_blank">james.dugger@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small;color:rgb(11,83,148)">After I got home I thought I could improve on the script. The following script pulls down the URLs and passes them through a while loop that reduces each URL to the name of the .jpg that sits in front of the query string. There are a lot of things that could be refactored to clean it up, but it works: </div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small;color:rgb(11,83,148)"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small;color:rgb(11,83,148)"><br></div><div class="gmail_default"><div class="gmail_default" style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:small"><div class="gmail_default"><font color="#0b5394" face="arial, helvetica, sans-serif">#!/usr/bin/env bash</font></div></div><div class="gmail_default" style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:small"><div class="gmail_default"><font color="#0b5394" face="arial, helvetica, sans-serif"><br></font></div></div><div class="gmail_default" style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:small"><div class="gmail_default"><font color="#0b5394" face="arial, helvetica, sans-serif"># Crawl the site, build the URL list, and store it in the variable url.</font></div></div><div class="gmail_default" style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:small"><div class="gmail_default"><font color="#0b5394" face="arial, helvetica, sans-serif">url=$(wget --spider --force-html -r -l2 "<a 
href="http://sites.google.com/site/thebookofgimp/home/chapter-2-photograph-retouching/" target="_blank">http://sites.google.com/site/thebookofgimp/home/chapter-2-photograph-retouching/</a>" 2>&1 | grep '^--' | awk '{ print $3 }')</font></div></div><div class="gmail_default" style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:small"><div class="gmail_default"><font color="#0b5394" face="arial, helvetica, sans-serif"><br></font></div></div><div class="gmail_default"><div class="gmail_default"><font color="#0b5394" face="arial, helvetica, sans-serif"><font color="#222222" face="arial, sans-serif">#Set how many characters in front of ".jpg" to start </font>building<font color="#222222" face="arial, sans-serif"> the name string.</font></font></div></div><div class="gmail_default" style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:small"><div class="gmail_default"><font color="#0b5394" face="arial, helvetica, sans-serif">front_int=6</font></div></div><div class="gmail_default" style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:small"><div class="gmail_default"><font color="#0b5394" face="arial, helvetica, sans-serif"><br></font></div></div><div class="gmail_default" style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:small"><div class="gmail_default"><font color="#0b5394" face="arial, helvetica, sans-serif">printf '%s\n' "$url" | while IFS= read -r raw_url  # trailing newline so read also sees the last URL</font></div></div><div class="gmail_default" style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:small"><div class="gmail_default"><font color="#0b5394" face="arial, helvetica, sans-serif"> do</font></div></div><div class="gmail_default" style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:small"><div class="gmail_default"><font color="#0b5394" face="arial, helvetica, sans-serif"> #Cut the string at the characters ".jpg"</font></div></div><div class="gmail_default" 
style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:small"><div class="gmail_default"><font color="#0b5394" face="arial, helvetica, sans-serif"> pos=${raw_url%%.jpg?*}</font></div></div><div class="gmail_default" style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:small"><div class="gmail_default"><font color="#0b5394" face="arial, helvetica, sans-serif"><br></font></div></div><div class="gmail_default" style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:small"><div class="gmail_default"><font color="#0b5394" face="arial, helvetica, sans-serif"> #determine the character position of the cut string.</font></div></div><div class="gmail_default" style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:small"><div class="gmail_default"><font color="#0b5394" face="arial, helvetica, sans-serif"> pos_int=$((${#pos}))</font></div></div><div class="gmail_default" style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:small"><div class="gmail_default"><font color="#0b5394" face="arial, helvetica, sans-serif"><br></font></div></div><div class="gmail_default" style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:small"><div class="gmail_default"><font color="#0b5394" face="arial, helvetica, sans-serif"> #reduce the number by the value of $front_int.</font></div></div><div class="gmail_default" style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:small"><div class="gmail_default"><font color="#0b5394" face="arial, helvetica, sans-serif"> (( pos_int -= $front_int))</font></div></div><div class="gmail_default" style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:small"><div class="gmail_default"><font color="#0b5394" face="arial, helvetica, sans-serif"><br></font></div></div><div class="gmail_default" style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:small"><div class="gmail_default"><font color="#0b5394" face="arial, helvetica, sans-serif"> #build a new string based 
on pos and the range provided by front_int</font></div></div><div class="gmail_default" style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:small"><div class="gmail_default"><font color="#0b5394" face="arial, helvetica, sans-serif"> temp_name=${raw_url:pos_int:front_int}</font></div></div><div class="gmail_default" style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:small"><div class="gmail_default"><font color="#0b5394" face="arial, helvetica, sans-serif"><br></font></div></div><div class="gmail_default" style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:small"><div class="gmail_default"><font color="#0b5394" face="arial, helvetica, sans-serif"> #Clean up the image name (drop everything up to the last slash).</font></div></div><div class="gmail_default" style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:small"><div class="gmail_default"><font color="#0b5394" face="arial, helvetica, sans-serif"> image_name=$(echo "$temp_name" | sed 's#.*/\(.*\)$#\1#g')</font></div></div><div class="gmail_default" style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:small"><div class="gmail_default"><font color="#0b5394" face="arial, helvetica, sans-serif"><br></font></div></div><div class="gmail_default" style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:small"><div class="gmail_default"><font color="#0b5394" face="arial, helvetica, sans-serif"> #Get the image.</font></div></div><div class="gmail_default" style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:small"><div class="gmail_default"><font color="#0b5394" face="arial, helvetica, sans-serif"> wget -O "${image_name}.jpg" "$raw_url"</font></div></div><div class="gmail_default" style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:small"><div class="gmail_default"><font color="#0b5394" face="arial, helvetica, sans-serif">done</font></div><div class="gmail_default"><br></div></div></div></div><div class="gmail_extra"><div><div class="h5"><br><div 
class="gmail_quote">On Tue, Jan 27, 2015 at 2:48 PM, Todd Millecam <span dir="ltr"><<a href="mailto:tyggna@gmail.com" target="_blank">tyggna@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Alright, you got the 20-second, the 2-minute, and now the 20-minute help solution:<br><br>Open a terminal and do the following:<br><br>cd /tmp<br>mkdir images<br>cd images<br>wget --spider -r -l2 -A jpg <a href="http://sites.google.com/site/thebookofgimp/home/chapter-2-photograph-retouching/" target="_blank">http://sites.google.com/site/thebookofgimp/home/chapter-2-photograph-retouching/</a> 2>&1 | grep '^--' | awk '{print $3}' > imagesList.txt<br> for url in `cat imagesList.txt` ; do wget -O `date +%s%N`.jpg $url ; done<br><br>That'll download all the images. . .and some crap. Look through it and delete the crap out of /tmp/images and you're done.<br><br><br>Explanation for those who want to improve their Linux terminal fu:<br><br>Make a temporary directory inside of /tmp for cleanliness.<br><br>Now, the wget command is confusing. --spider and -r mean "just grab urls, don't download anything, but look through and get every url you can find from the following location." -l2 means to follow links at most two levels deep, and -A jpg means to only accept files whose names end in "jpg". <br><br>From there, it's the big long URL, and then some fun little Unix-specific stuff.<br><br>2>&1 is an output redirect. 0 is standard in, 1 is standard out, and 2 is standard error. wget is a weird program and prints everything to standard error by default, so this makes it move all the data from standard error to standard out. We want it on standard out so we can send the output of wget (all our URLs) to a different program to filter through them. 
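<br><br>A minimal sketch of that redirect, assuming a made-up function named noisy that stands in for wget (it is not part of the commands above) so the example runs without touching the network:<br>

```shell
# Hypothetical stand-in for wget: the URL-like line goes to stderr,
# an unrelated line goes to stdout.
noisy() {
  echo "--2015-01-27 14:38:33--  http://example.com/a.jpg" >&2
  echo "unrelated stdout line"
}

# Stderr is thrown away here, so the pipe only carries stdout and
# grep counts no lines that start with "--".
without_redirect=$(noisy 2>/dev/null | grep -c '^--' || true)

# 2>&1 merges stderr into stdout before the pipe, so grep sees the line.
with_redirect=$(noisy 2>&1 | grep -c '^--' || true)

echo "without: $without_redirect, with: $with_redirect"
```

The "|| true" only keeps the sketch from stopping when grep finds nothing and returns a non-zero status.<br><br>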
You see, wget isn't giving us clean urls, it's giving us some crap output lines, and the useful output lines come in a string like:<br><br><br>--2015-01-27 14:38:33-- <a href="https://e572cad7-a-62cb3a1a-s-sites.googlegroups.com/site/thebookofgimp/home/chapter-2-photograph-retouching/2.100.jpg?attachauth=ANoY7cqIGpuUDdagEljYRFF7WMX2G3rAxez0XLIAOW9cXpAnjqilN4X2HyaRWIblk29ORjgMg28jrQuQmBisXSw0d3gYh912nr4DtRyT5Jqk0KVEfJRqC2u92vG7TlxK75odZ1uWVaUrpEvUw1A52TZbuU7Dju7DIPQzou3dskyDSRrh0VAPHrI-znqeKeJ7NuzJqEc8WcLl4MnUpO-dgUZB7i8Eq_z3FFstaXyhjQGcbht8xZ0cBPFvBgw2gWYhuDQ4lqDHJSru&attredirects=0" target="_blank">https://e572cad7-a-62cb3a1a-s-sites.googlegroups.com/site/thebookofgimp/home/chapter-2-photograph-retouching/2.100.jpg?attachauth=ANoY7cqIGpuUDdagEljYRFF7WMX2G3rAxez0XLIAOW9cXpAnjqilN4X2HyaRWIblk29ORjgMg28jrQuQmBisXSw0d3gYh912nr4DtRyT5Jqk0KVEfJRqC2u92vG7TlxK75odZ1uWVaUrpEvUw1A52TZbuU7Dju7DIPQzou3dskyDSRrh0VAPHrI-znqeKeJ7NuzJqEc8WcLl4MnUpO-dgUZB7i8Eq_z3FFstaXyhjQGcbht8xZ0cBPFvBgw2gWYhuDQ4lqDHJSru&attredirects=0</a><br><br><br>we don't care about the --<date >-- and want to get rid of it, so we have to filter down. That's why we do the output redirect first, so we can use some Linux filter programs, specifically grep and awk.<br><br>grep is a regular expression tool, which means it's a very powerful way to find text. The regular expression I wanted to pass in was '^--' which means: find all lines that start with the characters --. awk will take regular expressions too, so I could change the command to look like:<br>. . .2>&1 | awk '$0 ~/^--/ {print $3}' > imagesList.txt<br>and that would work too.<br><br>The coolest description of awk I ever got was "basically excel with no gui"<br>awk splits all your text up into fields--the default dividing character is a space, so if I want the first thing in the line, I use a $1 to say get up to the first space. 
There's a space in the date here, so the actual URL is in field 3 which is why I tell awk to execute the command {print $3}<br>I could also say, "get the last field in the line" since that's the URL too by using a $NF<br><br>The last bit, the > imagesList.txt says to make or overwrite the file named imagesList.txt with whatever awk outputs (which is our filtered urls).<br><br>The last line is:<br>for url in `cat imagesList.txt` ; do wget -O `date +%s%N`.jpg $url ; done<br><br>this is saying, give me the text on each line in imagesList.txt, and store them in the variable $url, and then execute the command group between do and done until we've gone through every line in the file.<br><br>The command between do and done is our regular old download a file with wget, with one small modification:<br><br>wget -O `date +%s%N`.jpg<br><br>Anything in back-ticks (the ` character right next to 1 on most keyboards) is an encapsulated command, and everything inside the back-ticks will be executed as a command. Well, the command date +%s%N means give me the current time in nanoseconds. So, each time wget is run, it'll rename the download file to the current time in nanoseconds.jpg and then the for loop takes over and grabs the next one.<div><br></div><div><br></div><div><br><br></div></div><div class="gmail_extra"><div><div><br><div class="gmail_quote">On Tue, Jan 27, 2015 at 2:17 PM, Stephen Partington <span dir="ltr"><<a href="mailto:cryptworks@gmail.com" target="_blank">cryptworks@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_default" style="font-family:'trebuchet ms',sans-serif">you can write a script to yank out the jpg links. 
or just use something like <a href="https://addons.mozilla.org/en-US/firefox/addon/downthemall/" target="_blank">https://addons.mozilla.org/en-US/firefox/addon/downthemall/</a></div></div><div class="gmail_extra"><br><div class="gmail_quote"><span>On Tue, Jan 27, 2015 at 12:12 PM, Michael Havens <span dir="ltr"><<a href="mailto:bmike1@gmail.com" target="_blank">bmike1@gmail.com</a>></span> wrote:<br></span><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div><div dir="ltr">How can I use wget to retrieve the photos <a href="http://the-book-of-gimp.blogspot.com/p/chapter-2-photograph-retouching.html" target="_blank">here</a>? I tried:<div><br></div><div><div>wget -r <a href="http://the-book-of-gimp.blogspot.com/p/chapter-2-photograph-retouching.html" target="_blank">http://the-book-of-gimp.blogspot.com/p/chapter-2-photograph-retouching.html</a></div><div><br></div><div>but it didn't download the pictures. It downloaded a bunch of web pages.</div><div><div><div>:-)~MIKE~(-:</div></div>
</div></div></div>
<br></div></div><span>---------------------------------------------------<br>
PLUG-discuss mailing list - <a href="mailto:PLUG-discuss@lists.phxlinux.org" target="_blank">PLUG-discuss@lists.phxlinux.org</a><br>
To subscribe, unsubscribe, or to change your mail settings:<br>
<a href="http://lists.phxlinux.org/mailman/listinfo/plug-discuss" target="_blank">http://lists.phxlinux.org/mailman/listinfo/plug-discuss</a><br></span></blockquote></div><span><font color="#888888"><br><br clear="all"><div><br></div>-- <br><div>A mouse trap, placed on top of your alarm clock, will prevent you from rolling over and going back to sleep after you hit the snooze button.<br><br>Stephen<br><br></div>
</font></span></div>
</blockquote></div><br><br clear="all"><div><br></div>-- <br></div></div><span><font color="#888888"><div>Todd Millecam</div>
</font></span></div>
</blockquote></div><br><br clear="all"><div><br></div>-- <br></div></div><div><div dir="ltr"><font color="#0b5394">James</font><div><br><span style="color:rgb(255,255,255)"><span style="background-color:rgb(11,83,148)"><b><a href="http://www.linkedin.com/pub/james-h-dugger/15/64b/74a/" target="_blank"><span style="background-color:rgb(255,255,255)"><span></span><span style="color:rgb(11,83,148)">Linkedin<span></span></span></span></a></b></span></span><br></div></div></div>
</div>
</blockquote></div><br></div>