Craig Brooksby <azipa@brxb.com> wrote:
>I have already diligently searched google, and now I'm looking for
>recommendations.
>
>Does anyone know of an app for capturing text from screen? (I'm using
>Gnome on RH9). Example: Mozilla provides a very useful display of
>links under View-->Page Info-->Links. But there is no way provided by
>Mozilla to copy and paste, or just "grab" that information (AFAICT). I
>can sort of "select" the rows, but can't seem to copy and paste it.
>
>I want that list of links.
>
I tried messing with that part of Mozilla too. It's quite infuriating
that you can't grab information out of that window. Try Mozilla's DOM
Inspector (you may have to install it as an extra component).
Also, try wget, a standard UNIX tool with lots of options; do a 'man
wget'. Combined with grep or perl, it can accomplish a lot.
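For example, something along these lines (untested, and the pattern is
crude, so it will catch href attributes on more than just 'a' tags):

    wget -q -O - http://example.com/ | grep -o 'href="[^"]*"'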
The killer app for the real sense of the phrase "screen scraping" is
Perl's LWP (libwww-perl). It's how you can compile, and keep up to
date, your own database of 'stuff', whatever that may be. It takes a
while to learn, and even once you know it, writing a script takes a bit
of time. But if you want to build your own data warehouse that checks
for 'stuff' every so often and keeps itself up to date, LWP is the
bomb.
Actually, if you decide to try LWP, this particular job may take you
just a day, with the following information to save you time: the way to
get just the links is to fetch the web page (one line of code), parse
it into a tree with an HTML parser (one more line), then walk the tree
for all the 'a' tags and print the 'href' attribute of each (two more
lines).
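Roughly like this (untested, off the top of my head; it assumes the
LWP::Simple and HTML::TreeBuilder modules from CPAN, and the URL is
just a placeholder):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use LWP::Simple;
    use HTML::TreeBuilder;

    # fetch the page (the one-line get)
    my $html = get('http://example.com/') or die "fetch failed\n";

    # parse it into a tree of elements (one more line)
    my $tree = HTML::TreeBuilder->new_from_content($html);

    # walk the tree for every 'a' tag and print its 'href'
    for my $a ($tree->look_down(_tag => 'a')) {
        print $a->attr('href'), "\n" if defined $a->attr('href');
    }

    $tree->delete;  # HTML::TreeBuilder trees want explicit cleanup

Swap the URL for the page you care about and pipe the output wherever
you like.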
--Alexander