Re: Web crawling

Author: Craig White
Date:  
To: plug-discuss
Subject: Re: Web crawling
On Mon, 2005-02-28 at 08:32 -0700, Nathan England wrote:
> I asked this question because I can only get wget to use --spider if I provide
> a file for it to follow. Otherwise it only reports back with the index.html and
> then exits... I just can't get it to work.
>

---
yeah - re-reading the man page for wget...
---
--spider
When invoked with this option, Wget will behave as a Web spider, which
means that it will not download the pages, just check that they are
there. For example, you can use Wget to check your bookmarks:

        wget --spider --force-html -i bookmarks.html


This feature needs much more work for Wget to get close to the
functionality of real web spiders.
---
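Given that caveat, one workaround is to skip --spider entirely and let wget
really fetch the pages recursively, but throw them away as it goes with
--delete-after; the log then becomes your list of what was reachable.
Something along these lines might do it (untested here, and the URL is just a
placeholder):

        wget -r -l 2 -nv --delete-after -o crawl.log http://www.example.com/

Anything wget couldn't retrieve shows up as an error in crawl.log.
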
Failing that, you probably need a different 'spider' - freshmeat.net returns 27
links for a search on 'spider' and 19 for 'web spider'.
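
For a sense of what one of those does beyond wget's --spider - fetch a page,
pull out its links, follow them, and report what's broken - here is a rough
sketch in Python (not taken from any of those projects, purely an
illustration; the start URL and depth are placeholders):

#!/usr/bin/env python3
# Rough sketch of a breadth-first link checker: fetch a page, collect its
# <a href> links, follow them up to a fixed depth, and report broken URLs.
# Purely illustrative - the start URL below is a placeholder.
import urllib.request
import urllib.parse
from html.parser import HTMLParser

class LinkParser(HTMLParser):
    # collect href values from <a> tags
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(start_url, max_depth=2):
    host = urllib.parse.urlparse(start_url).netloc
    seen = set()
    queue = [(start_url, 0)]
    while queue:
        url, depth = queue.pop(0)
        if url in seen or depth > max_depth:
            continue
        seen.add(url)
        try:
            with urllib.request.urlopen(url) as resp:
                body = resp.read().decode("utf-8", errors="replace")
        except Exception as err:
            print("BROKEN", url, "-", err)
            continue
        print("OK", url)
        parser = LinkParser()
        parser.feed(body)
        for link in parser.links:
            absolute = urllib.parse.urljoin(url, link)
            # stay on the starting host so the crawl doesn't wander off-site
            if urllib.parse.urlparse(absolute).netloc == host:
                queue.append((absolute, depth + 1))

if __name__ == "__main__":
    crawl("http://www.example.com/", max_depth=2)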

Craig

---------------------------------------------------
PLUG-discuss mailing list -
To subscribe, unsubscribe, or to change your mail settings:
http://lists.PLUG.phoenix.az.us/mailman/listinfo/plug-discuss