Web Scraping
Joshua Zeidner
jjzeidner at gmail.com
Wed Oct 11 12:08:02 MST 2006
Hans and Darrin,
I did take a look at Mechanize and I think I will go with it. It seems to be
the most commonly recommended tool for this task, and Perl is the best
language for this kind of thing. Wget will get me the raw HTML, but
Mechanize lets me complete HTML forms, follow links by name, etc.
Thanks to you both, jmz
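
For anyone reading along who wonders what "follow links by name" buys you
over raw HTML from Wget, here is a minimal sketch of the idea. It is not
Mechanize and not how Mechanize is implemented; it uses only the Python
standard library, and the names LinkFinder, find_link and SAMPLE, as well as
the example.com URLs, are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkFinder(HTMLParser):
    """Collect (link text, href) pairs from an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (text, href) pairs
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside a link with an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def find_link(html, text, base_url):
    """Return the absolute URL of the first link whose text matches, else None."""
    finder = LinkFinder()
    finder.feed(html)
    for link_text, href in finder.links:
        if link_text == text:
            return urljoin(base_url, href)
    return None


# Hypothetical sample page; a real scraper would fetch this over HTTP first.
SAMPLE = '<p><a href="/reports?page=2">Next page</a> <a href="/logout">Log out</a></p>'
```

Calling find_link(SAMPLE, "Next page", "http://example.com/reports") resolves
the link labelled "Next page" to an absolute URL that can be fetched next.
Mechanize wraps this kind of lookup together with fetching, cookies and form
submission, which is the part Wget alone does not give you.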
On 10/10/06, der.hans <PLUGd at lufthans.com> wrote:
>
> On 10 Oct 2006, Joshua Zeidner chatted thus:
>
> > Does anyone here have any experience developing Web Scraping
> > applications? any suggestions for tools? I can manage Perl, PHP, Java,
> > Python, and C/C++. Recommendations are welcome.
>
> moin moin Josh,
>
> I used WWW::Mechanize for a project and was quite happy with it. It
> doesn't handle JavaScript, but neither did any of the other projects I
> found. For my project I first had to log in, then scrape multiple pages
> to get the data I wanted.
>
> I believe there's a Firefox module that allows you to automate replaying
> mouse/keyboard events. That's likely region-based rather than parse-based,
> but it might handle JavaScript.
>
> libwww-mechanize-perl - Automate interaction with websites
>
> ciao,
>
> der.hans
> --
> # https://www.LuftHans.com/ http://www.CiscoLearning.org/
> # Join the League of Professional System Administrators! https://LOPSA.org/
> # "If I want my children to work hard, I better be the hardest working
> # person they've ever met. If I want the children to be nice, I better
> # be the kindest human being they've ever met." -- Rafe Esquith
>
--
.0000. communication.
.0001. development.
.0010. strategy.
.0100. appeal.
JOSHUA M. ZEIDNER
IT Consultant
++power; ++perspective; ++possibilities;
( 602 ) 490 8006
jjzeidner at gmail.com