Re: way to search the internet automatically say once a day …

Author: Craig Brooksby
Date:  
To: plug-discuss
Subject: Re: way to search the internet automatically say once a day with no human intervention and store the results
> What I want is way to create a list that will be used to search the
> internet and then have the results stored not the actual web pages
> just the urls and then have a way for the urls to be reviewed.


Yep; using HTTrack (and probably wget) you can feed it a list of URLs
that you want crawled; you can have it "throw away" the pages and just
store the links (the URLs) in a log. In other words, it harvests
links, not pages (in this case).
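If you'd rather script that harvesting step yourself, the same idea (keep the URLs, discard the page) is a few lines of Python. This is a sketch using only the standard library, not how HTTrack or wget does it internally; fetching the pages (e.g. with urllib.request) is left to you:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkHarvester(HTMLParser):
    """Collect href targets from a page; the page text itself is discarded."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's own URL.
                    self.links.append(urljoin(self.base_url, value))

def harvest_links(base_url, html):
    """Return just the URLs found in `html` -- links, not pages."""
    parser = LinkHarvester(base_url)
    parser.feed(html)
    return parser.links
```

Run each page from your seed list through harvest_links and append the results to a log file; you never store the page itself.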

If I have a long list of pages (URLs) that I need to sequence through,
I use URLSlideShow from http://slideshow.rockhoward.com/. That's my
fast way to "review" thousands of sites. Just hit "next, next, next."
Beautiful.
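That "next, next, next" loop is also easy to improvise if you don't want another tool. A hypothetical minimal version: walk your URL list and pop each one up in the default browser when you press Enter (the `opener` and `prompt` parameters are just there so the loop can be driven by something other than a live browser and keyboard):

```python
import webbrowser

def review(urls, opener=webbrowser.open, prompt=input):
    """Step through URLs one at a time; hit Enter for the next one."""
    for url in urls:
        opener(url)        # pops the page up in your default browser
        prompt("next> ")   # Enter to advance, Ctrl-C to quit
```

Feed it the URL log from your crawl and you have a poor man's slideshow.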

That's the hands-on, do-it-yourself approach.

There are also products like http://www.aignes.com/ and online
services like http://www.changedetect.com/ that you can sign up for,
to detect changes on pages and send you an alert. These just do the
above *for* you, and spare you the details. In return, they want $$.
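Under the hood, change detection is something you can sketch yourself: keep a fingerprint (hash) of each page's last-seen content and flag the URL when the fingerprint changes. A minimal sketch, assuming you supply the fetching and the persistent store:

```python
import hashlib

def fingerprint(content):
    """Stable fingerprint of a page body (str or bytes)."""
    if isinstance(content, str):
        content = content.encode("utf-8")
    return hashlib.sha256(content).hexdigest()

def changed(url, content, seen):
    """True if `content` differs from the last fingerprint stored for `url`.

    `seen` is any dict-like store mapping URL -> fingerprint; the first
    sighting of a URL counts as a change.
    """
    new = fingerprint(content)
    if seen.get(url) == new:
        return False
    seen[url] = new
    return True
```

Run it from cron once a day over your URL list and mail yourself the URLs that come back True.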

> Is a spider what I am looking for? I have looked at a lot of the
> spider projects and they seem to be for different uses.


Not sure what, specifically, you are stuck on. Spiders are not
mysterious. Feel free to contact me off-list if you have more
questions... My advice is to try anything -- try it the hard way, or
the dumb way, but get moving -- then you will figure out the
rest. That's my brute-force approach to life.

(the other) Craig
---------------------------------------------------
PLUG-discuss mailing list -
To subscribe, unsubscribe, or to change your mail settings:
http://lists.PLUG.phoenix.az.us/mailman/listinfo/plug-discuss