Re: way to search the internet automatically say once a day …

Author: Craig Brooksby
Date:  
To: plug-discuss
Subject: Re: way to search the internet automatically say once a day with no human intervention and store the results
> What I want is way to create a list that will be used to search the
> internet and then have the results stored not the actual web pages
> just the urls and then have a way for the urls to be reviewed.


Yep; using HTTrack (and probably wget) you can feed it a list of URLs
that you want crawled; you can have it "throw away" the pages and just
store the links (the URLs) in a log. In other words, in this case it
harvests links, not pages.
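
For anyone who would rather see the idea than configure a tool, here is a
rough sketch of "harvest links, not pages" using only the Python standard
library. It is not how HTTrack or wget do it internally; the names
extract_links, harvest, and links.log are made up for illustration, and it
only looks at plain <a href="..."> tags.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links and drop any #fragment.
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                # Keep only web links (skips mailto:, javascript:, etc.).
                if absolute.startswith(("http://", "https://")):
                    self.links.append(absolute)


def extract_links(html, base_url):
    """Return the unique http(s) links in an HTML string, in first-seen order."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return list(dict.fromkeys(collector.links))


def harvest(seed_urls, log_path="links.log"):
    """Fetch each seed URL, discard the page, and append its links to a log."""
    with open(log_path, "a", encoding="utf-8") as log:
        for url in seed_urls:
            try:
                with urlopen(url, timeout=30) as response:
                    html = response.read().decode("utf-8", errors="replace")
            except (OSError, ValueError) as exc:  # network error or bad URL
                print(f"skipped {url}: {exc}")
                continue
            # One tab-separated line per link: source page, then the link.
            for link in extract_links(html, url):
                log.write(f"{url}\t{link}\n")
```

Calling harvest(["http://example.com/"]) would append one line per link
found to links.log and keep nothing else from the page. Running it once a
day can be left to whatever scheduler you already use, such as cron.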

If I have a long list of pages (URLs) that I need to sequence through,
I use URLSlideShow from http://slideshow.rockhoward.com/. That's my
fast way to "review" thousands of sites. Just hit "next, next, next."
Beautiful.

That's the hands-on, do-it-yourself approach.

There are also products like http://www.aignes.com/ and online
services like http://www.changedetect.com/ that you can sign up for,
to detect changes on pages and send you an alert. These just do the
above *for* you, and spare you the details. In return, they want $$.
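
If you only need the "has this page changed?" part, the basic idea can be
sketched in a few lines: keep a hash of each page and compare it on the
next visit. This is only my guess at the general principle, not how those
products actually work; detect_change and page_hashes.json are
hypothetical names, and pages with rotating ads or timestamps will look
"changed" on every run.

```python
import hashlib
import json
from pathlib import Path


def fingerprint(content):
    """Return a SHA-256 hex digest of the page bytes."""
    return hashlib.sha256(content).hexdigest()


def detect_change(url, content, state_path="page_hashes.json"):
    """Return True if content differs from what was recorded for url last time.

    The first sighting of a URL is recorded and reported as unchanged.
    """
    path = Path(state_path)
    # Load the previously saved hashes, or start with none.
    state = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    new_hash = fingerprint(content)
    old_hash = state.get(url)
    # Remember the current hash for the next run.
    state[url] = new_hash
    path.write_text(json.dumps(state, indent=2), encoding="utf-8")
    return old_hash is not None and old_hash != new_hash
```

Pass it the raw bytes you downloaded for each URL; a True result is your
cue to send yourself an alert.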

> Is a spider what I am looking for? I have looked at a lot of the
> spider projects and they seem to be for different uses.


Not sure what, specifically, you are stuck on. Spiders are not
mysterious. Feel free to contact me off-list if you have more
questions... My advice is to try anything -- try it the hard way, or
the dumb way, but get moving -- then you will find you can figure out
the rest. That's my brute-force approach to life.

(the other) Craig
---------------------------------------------------
PLUG-discuss mailing list -
To subscribe, unsubscribe, or to change your mail settings:
http://lists.PLUG.phoenix.az.us/mailman/listinfo/plug-discuss