Re: How to execute 'lynx {url.address}' in a shell script

Author: der.hans
Date:  
To: PLUG
Subject: Re: How to execute 'lynx {url.address}' in a shell script
On 23 May 2005, Josef Lowder chattered thus:

> Anyone know how I could execute 'lynx {url.address}' in a shell script
> and capture the output to a file as a background task?
>
> For example, let's say I wanted to collect the latest listings of some
> specific item from the Arizona Republic classified section and send
> the results to a text file. I have tried various scripts to do this, but
> haven't been able to get one to work yet.


As already pointed out, use -dump to get only the rendered text of the web page.

Anyone know if it handles JavaScript-generated text?

Use -source to get the HTML.

Redirect the output to a file using '>', and append '&' to run it as a background task.
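A minimal sketch of the above as a shell script, assuming bash and lynx are installed. The URL and file-name prefix are hypothetical placeholders, and dated_name, fetch_text and fetch_source are illustrative helper names, not anything lynx provides.

```shell
#!/usr/bin/env bash
# Sketch only: the URL and file names below are hypothetical placeholders.

# Build a dated output name such as "classifieds-2005-05-23.txt".
# The optional second argument overrides today's date.
dated_name() {
    local prefix=$1
    local day=${2:-$(date +%F)}
    printf '%s-%s.txt\n' "$prefix" "$day"
}

# Save the rendered text of a page (what lynx shows on screen).
fetch_text() {
    lynx -dump "$1" > "$2"
}

# Save the raw HTML source of a page instead.
fetch_source() {
    lynx -source "$1" > "$2"
}

# Usage (commented out so the sketch doesn't touch the network):
# fetch_text 'https://classifieds.example.com/listings' "$(dated_name classifieds)" &
# wait    # optional: block until the background job has finished
```

The trailing '&' is what turns the fetch into a background job; leave it off if the script should wait for the page before continuing.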

wget or curl might be the tools you want to use, though.

I like --force-directories for wget to keep things in a nice tree.
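With --force-directories (-x), wget saves http://fly.srk.fer.hr/robots.txt as fly.srk.fer.hr/robots.txt, i.e. under a directory named after the host. The sketch below wraps that in a hypothetical mirror_page helper and adds a rough, hypothetical guess_x_path function that predicts the local path. The guess is an assumption that only covers simple URLs (no query strings, non-default ports or redirects), so treat it as a convenience, not as what wget actually reports.

```shell
#!/usr/bin/env bash
# Sketch only: mirror_page and guess_x_path are hypothetical helper names.

# Download a page, keeping a host/path directory tree (wget -x).
mirror_page() {
    wget --quiet --force-directories "$1"
}

# Rough guess at where 'wget -x' will put a URL. Assumption: simple
# URLs only; query strings, ports and redirects are not handled.
guess_x_path() {
    local rest=${1#*://}                # strip the scheme, e.g. "http://"
    case $rest in
        */*) ;;                         # already has a path component
        *) rest=$rest/ ;;               # bare host: treat as "host/"
    esac
    case $rest in
        */) rest=${rest}index.html ;;   # directory URL -> index.html
    esac
    printf '%s\n' "$rest"
}

# Usage (commented out to avoid network access):
# mirror_page 'https://classifieds.example.com/listings/'
# guess_x_path 'https://classifieds.example.com/listings/'
```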

Anyone know of a wget option that makes it report the filename(s) it'll
use?

Use -O for wget to specify an output file. Normally wget names the local
file after the file you're downloading. That results in index.html files
piling up (index.html.1, index.html.2, ...) if you're not careful :).
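To pick the output name yourself, wget takes -O and curl takes -o. A minimal sketch, assuming both tools are installed; save_with_wget, save_with_curl and the URL are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and the URL below are hypothetical.

# wget: -O names the output file, -q suppresses progress output.
save_with_wget() {
    wget -q -O "$2" "$1"
}

# curl: -o names the output file, -sS is silent but still reports
# errors, -L follows redirects.
save_with_curl() {
    curl -sS -L -o "$2" "$1"
}

# Usage (commented out to avoid network access):
# save_with_wget 'https://classifieds.example.com/listings' listings.html &
# save_with_curl 'https://classifieds.example.com/listings' listings2.html &
# wait
```

Passing '-' as the name (wget -O -) sends the page to stdout instead, and curl writes to stdout by default, so both can also be redirected with '>' just like lynx.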

BTW, IMO elinks is a better text-based web browser.

If you need to parse a bunch of pages to find links to follow, use a
screen scraper. wget can do some of that itself with its recursive
options, and curl can fetch the pages for you to parse. There are also
libraries in various languages: PHP has curl bindings, and Perl has
WWW::Mechanize.
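If lynx is already in the picture, its -dump -listonly options (assuming a lynx version that has -listonly) print just the numbered list of links on a page, which a shell pipeline can filter before fetching each match. A rough sketch, assuming each link line looks like '   1. http://...'; extract_links, links_matching and dump_matching are hypothetical names, and this is plain link filtering rather than a full screen scraper.

```shell
#!/usr/bin/env bash
# Sketch only: assumes 'lynx -dump -listonly' prints lines like
# "   1. http://host/path"; the helper names are hypothetical.

# Read lynx's link list on stdin and print only the URLs.
extract_links() {
    awk '$1 ~ /^[0-9]+\.$/ { print $2 }'
}

# Keep only the URLs containing the given fixed string.
links_matching() {
    extract_links | grep -F -- "$1"
}

# Dump the text of every link on page $1 whose URL contains $2.
dump_matching() {
    lynx -dump -listonly "$1" | links_matching "$2" |
        while read -r link; do
            # </dev/null keeps lynx from reading the loop's input
            lynx -dump "$link" < /dev/null
        done
}

# Usage (commented out to avoid network access):
# dump_matching 'https://classifieds.example.com/' 'listing' > listings.txt &
```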

Anyone know of a screen scraper that can handle JavaScript?

ciao,

der.hans
-- 
#  https://www.LuftHans.com/    http://www.AZOTO.org/
#  HERE LIES LESTER MOORE
#  SHOT 4 TIMES WITH A .44
#  NO LES
#  NO MOORE
#        -- tombstone, in Tombstone, AZ
---------------------------------------------------
PLUG-discuss mailing list - 
To subscribe, unsubscribe, or change your mail settings:
http://lists.PLUG.phoenix.az.us/mailman/listinfo/plug-discuss