On 23 May 2005, Josef Lowder wrote:
> Anyone know how I could execute 'lynx {url.address}' in a shell script
> and capture the output to a file as a background task?
>
> For example, let's say I wanted to collect the latest listings of some
> specific item from the Arizona Republic classified section and send
> the results to a text file. I have tried various scripts to do this, but
> haven't been able to get one to work yet.
As pointed out, use -dump to get only the rendered text of the web page.
Anyone know if it handles JavaScript-created text?
Use -source to get the raw HTML.
Redirect to a file using '>'.
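A minimal sketch of what was asked for, assuming a placeholder URL and
output filenames of your choosing:

    #!/bin/sh
    # Dump the rendered page text to a file; '&' runs it in the background.
    lynx -dump 'http://www.example.com/classifieds' > listings.txt &

    # Or capture the raw HTML instead:
    lynx -source 'http://www.example.com/classifieds' > listings.html &

Use nohup or run it from cron if the job needs to outlive your shell
session.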
wget or curl might be the tools you want to use, though.
I like --force-directories for wget to keep things in a nice tree.
Anyone know of an option to wget to get it to report the filename(s) that
it'll use?
Use -O for wget to specify an output file. Normally wget uses the same
filename as the file you're downloading. That results in index.html
getting overwritten a lot if you're not careful :).
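For instance (the URL here is just a placeholder):

    # Save under an explicit filename instead of the remote name:
    wget -O listings.html 'http://www.example.com/classifieds'

    # Or mirror into a host/path directory tree:
    wget --force-directories 'http://www.example.com/classifieds/page.html'

With --force-directories the file lands under
./www.example.com/classifieds/, so successive runs of different pages
don't clobber each other.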
BTW, elinks is, IMO, a better text-based web browser.
If you need to parse a bunch of pages in order to find links to
follow, use a screen scraper. wget and curl have options for that (see
the sketch below). There are also libraries in various languages: curl
is plugged into PHP, and Perl has WWW::Mechanize.
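A shell-level sketch of the wget side (URL and accept list are
placeholder assumptions):

    # Follow links one level deep from the starting page, keeping only
    # HTML files; -np (--no-parent) stays below the start directory.
    wget -r -l 1 -np -A 'html,htm' 'http://www.example.com/classifieds/'

For curl-based scraping you'd typically pipe the fetched HTML through
grep or sed to extract the hrefs yourself.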
Anyone know of a screen scraper that can handle JavaScript?
ciao,
der.hans
--
# https://www.LuftHans.com/ http://www.AZOTO.org/
# HERE LIES LESTER MOORE
# SHOT 4 TIMES WITH A .44
# NO LES
# NO MOORE
# -- tombstone, in Tombstone, AZ