Re: Web Scraping

Author: Joshua Zeidner
Date:  
To: Main PLUG discussion list
Subject: Re: Web Scraping
Hans and Darrin,

I did take a look at Mechanize and I think I will go with it. It seems to be
the most commonly recommended tool for this task, and Perl is the best
language for this kind of thing. Wget will get me the raw HTML, but
Mechanize also lets me fill in HTML forms, follow links by name, etc.

Thanks to you both,
jmz
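Mechanize itself is Perl; in WWW::Mechanize the relevant methods are `follow_link` and `submit_form`. To illustrate what those two conveniences add on top of the raw HTML that wget returns, here is a minimal sketch in Python using only the standard library (`html.parser`, `urllib.parse`). It is not Mechanize's API. The sample page, the field names, and the helpers `link_by_text` and `fill_form` are hypothetical; the sketch assumes one form per page and ignores JavaScript, cookies, and actually sending the request.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlencode


class PageParser(HTMLParser):
    """Collects (text, href) pairs for <a> tags and name/value pairs for <input> tags."""

    def __init__(self):
        super().__init__()
        self.links = []    # list of (visible link text, href)
        self.inputs = {}   # input name -> default value (hidden fields included)
        self._href = None  # href of the <a> tag currently open, if any
        self._text = []    # text chunks seen inside the open <a> tag

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs:
            self._href = attrs["href"]
            self._text = []
        elif tag == "input" and attrs.get("name"):
            # Assumption: all inputs belong to a single form on the page.
            self.inputs[attrs["name"]] = attrs.get("value") or ""

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def link_by_text(html, base_url, text):
    """Return the absolute URL of the first link whose visible text equals `text`."""
    parser = PageParser()
    parser.feed(html)
    for link_text, href in parser.links:
        if link_text == text:
            return urljoin(base_url, href)  # resolve relative hrefs
    return None


def fill_form(html, values):
    """Overlay user-supplied values on the form's defaults and URL-encode the result."""
    parser = PageParser()
    parser.feed(html)
    fields = dict(parser.inputs)  # keeps hidden fields such as session tokens
    fields.update(values)
    return urlencode(fields)


# Usage on a hypothetical login page:
page = """
<a href="/reports/2006">Reports</a>
<form action="/login" method="post">
  <input type="hidden" name="token" value="abc123">
  <input type="text" name="username">
  <input type="password" name="password">
</form>
"""
print(link_by_text(page, "http://example.com/home", "Reports"))
# http://example.com/reports/2006
print(fill_form(page, {"username": "jmz", "password": "secret"}))
# token=abc123&username=jmz&password=secret
```

Carrying the hidden fields through is the part that plain wget makes tedious: a login form often includes a token the server expects back, which is why a form-aware tool is handy for the "log in, then scrape several pages" workflow described in the quoted reply.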


On 10/10/06, der.hans <> wrote:
>
> On 10 Oct 2006, Joshua Zeidner chattered thus:
>
> > Does anyone here have any experience developing web scraping
> > applications? Any suggestions for tools? I can manage Perl, PHP, Java,
> > Python, and C/C++. Recommendations are welcome.
>
> Hi Josh,
>
> I used WWW::Mechanize for a project and was quite happy with it. It
> doesn't handle JavaScript, but none of the tools I found did. For that
> project I had to log in first, then scrape multiple pages to get
> the data I wanted.
>
> I believe there's a Firefox extension that lets you automate replaying
> mouse/keyboard events. That likely works on screen regions rather than by
> parsing the HTML, but it might handle JavaScript.
>
> libwww-mechanize-perl - Automate interaction with websites
>
> ciao,
>
> der.hans
> --
> #  https://www.LuftHans.com/        http://www.CiscoLearning.org/
> #  Join the League of Professional System Administrators! https://LOPSA.org/
> #  "If I want my children to work hard, I better be the hardest working
> #  person they've ever met. If I want the children to be nice, I better
> #  be the kindest human being they've ever met." -- Rafe Esquith
>
> ---------------------------------------------------
> PLUG-discuss mailing list -
> To subscribe, unsubscribe, or to change your mail settings:
> http://lists.PLUG.phoenix.az.us/mailman/listinfo/plug-discuss

--
.0000. communication.
.0001. development.
.0010. strategy.
.0100. appeal.

JOSHUA M. ZEIDNER
IT Consultant

++power; ++perspective; ++possibilities;
( 602 ) 490 8006