Re: Web Scraping

Author: der.hans
Date:  
To: Main PLUG discussion list
Subject: Re: Web Scraping
On 10 Oct 2006, Joshua Zeidner wrote:

> Does anyone here have any experience developing Web Scraping
> applications? any suggestions for tools? I can manage Perl, PHP, Java,
> Python, and C/C++. Recommendations are welcome.


Hi Josh,

I used WWW::Mechanize for a project and was quite happy with it. It
doesn't handle JavaScript, but neither did any of the other tools I
found. For that project I had to log in first, then scrape multiple
pages to get the data I wanted.
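
For anyone who would rather start from Python, here is a rough sketch of
the same log-in-then-scrape flow using only the standard library. It is
not WWW::Mechanize's API, just the equivalent idea: keep cookies across
requests, POST the login form, then follow links from a starting page.
The helper names (extract_links, make_session, login_and_scrape), the
URLs, and the form field names are all made up for illustration; a real
site will need its own values and probably more error handling.

```python
import http.cookiejar
import urllib.parse
import urllib.request
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urllib.parse.urljoin(self.base_url, href))


def extract_links(html, base_url):
    """Return absolute URLs for every <a href=...> found in html."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links


def make_session():
    """Build an opener that remembers cookies, so the login persists."""
    jar = http.cookiejar.CookieJar()
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


def login_and_scrape(opener, login_url, credentials, start_url):
    """Log in, fetch start_url, then fetch every page it links to."""
    # Passing data= makes this a POST; the field names are site-specific.
    form = urllib.parse.urlencode(credentials).encode("utf-8")
    opener.open(login_url, data=form).close()

    with opener.open(start_url) as resp:
        index_html = resp.read().decode("utf-8", errors="replace")

    pages = {}
    for url in extract_links(index_html, start_url):
        with opener.open(url) as resp:
            pages[url] = resp.read().decode("utf-8", errors="replace")
    return pages


if __name__ == "__main__":
    # Hypothetical site and credentials; replace with real ones.
    session = make_session()
    results = login_and_scrape(
        session,
        "https://example.com/login",
        {"username": "me", "password": "secret"},
        "https://example.com/reports/",
    )
    for url, body in results.items():
        print(url, len(body))
```

Like WWW::Mechanize, this only sees the HTML the server sends, so
anything a page builds with JavaScript will not show up.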

I believe there's a Firefox extension that lets you automate replaying
mouse/keyboard events. That's likely region-based (replaying events at
screen positions) rather than parse-based (working from the page's
HTML), but it might handle JavaScript.

On Debian, WWW::Mechanize is packaged as:

libwww-mechanize-perl - Automate interaction with websites

ciao,

der.hans
-- 
#  https://www.LuftHans.com/        http://www.CiscoLearning.org/
#  Join the League of Professional System Administrators! https://LOPSA.org/
#  "If I want my children to work hard, I better be the hardest working
#  person they've ever met. If I want the children to be nice, I better
#  be the kindest human being they've ever met." -- Rafe Esquith