Hans and Darrin,
I did take a look at Mechanize and I think I will go with it. It seems to be the most commonly recommended tool for this task, and Perl is the best language for this kind of thing. Wget would only get me the raw HTML, while Mechanize lets me complete HTML forms, follow links by name, and so on.

Thanks to you both, jmz
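PS: here is a rough sketch of the kind of script I have in mind; the URL, link text, form name, and search field are all placeholders, not a real site:

  #!/usr/bin/perl
  use strict;
  use warnings;
  use WWW::Mechanize;

  # autocheck => 1 makes every request die on HTTP errors.
  my $mech = WWW::Mechanize->new( autocheck => 1 );

  # Fetch a starting page.
  $mech->get('http://www.example.com/');

  # Follow a link by its visible text rather than its URL.
  $mech->follow_link( text => 'Search' );

  # Fill in and submit a form by name.
  $mech->submit_form(
      form_name => 'search',
      fields    => { q => 'web scraping' },
  );

  # The resulting HTML is ready for parsing.
  print $mech->content;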
On 10 Oct 2006, Joshua Zeidner wrote:
> Does anyone here have any experience developing web scraping
> applications? Any suggestions for tools? I can manage Perl, PHP, Java,
> Python, and C/C++. Recommendations are welcome.
moin moin Josh,
I used WWW::Mechanize for a project and was quite happy with it. It
doesn't handle JavaScript, but none of the tools I found did. For that
project I had to log in first, then scrape multiple pages to get the
data I wanted.
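The basic flow looked something like the sketch below; it is stripped down, and the URLs, form number, field names, and page count are placeholders rather than the real ones:

  use strict;
  use warnings;
  use WWW::Mechanize;

  my $mech = WWW::Mechanize->new( autocheck => 1 );

  # Log in first; form number and field names are made up here.
  $mech->get('http://www.example.com/login');
  $mech->submit_form(
      form_number => 1,
      fields      => { username => 'myuser', password => 'mypass' },
  );

  # Then walk the pages that hold the data.
  for my $page ( 1 .. 5 ) {
      $mech->get("http://www.example.com/report?page=$page");
      my $html = $mech->content;
      # ... extract what you need from $html here, e.g. with
      # HTML::TokeParser or $mech->find_all_links() ...
  }

The nice part is that Mechanize keeps session cookies between requests, so the login carries over to the later pages.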
I believe there's a Firefox extension that allows you to automate replaying
mouse/keyboard events. That's likely region-based rather than parse-based,
but it might handle JavaScript.
The package to look for is:
libwww-mechanize-perl - Automate interaction with websites
ciao,
der.hans
--
# https://www.LuftHans.com/ http://www.CiscoLearning.org/
# Join the League of Professional System Administrators! https://LOPSA.org/
# "If I want my children to work hard, I better be the hardest working
# person they've ever met. If I want the children to be nice, I better
# be the kindest human being they've ever met." -- Rafe Esquith
---------------------------------------------------
PLUG-discuss mailing list - PLUG-discuss@lists.plug.phoenix.az.us
To subscribe, unsubscribe, or to change your mail settings:
http://lists.PLUG.phoenix.az.us/mailman/listinfo/plug-discuss