Solr looks interesting, thank you.

Why Java?
1) I know Java best, and I can get something working very easily. I just want something I can run continuously with a low footprint on a commodity VM.
2) The rest of the backend is in Java, and I like to have a single development language for any given segment of a project, if possible, for easy maintenance.
3) Mostly the first reason ;)


As for pseudocode of what I could do, I could do something like the following. I am just not happy with how it looks...

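// Assumes urlVector and keywordVector are java.util.List<String>,
// somecount is an int[] of the same length as keywordVector,
// and i is the index of the page currently being scraped.
java.io.BufferedReader in = new java.io.BufferedReader(
        new java.io.InputStreamReader(new java.net.URL(urlVector.get(i)).openStream()));
boolean endOfPage = false;
String line;
// readLine() returns null at end of stream, so stop there as well
while (!endOfPage && (line = in.readLine()) != null)
{
    // k, not i, so the page index is not clobbered
    for (int k = 0; k < keywordVector.size(); k++)
    {
        // indexOf returns -1 when absent; 0 is a valid match position
        if (line.indexOf(keywordVector.get(k)) >= 0)
        {
            // counts lines containing the keyword, not every occurrence
            somecount[k]++;
        }
    }
    if (line.indexOf("</body>") >= 0)
    {
        endOfPage = true;
    }
}
in.close();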

<Please note I am not using an IDE, this is just pseudocode, and no, I did not bother putting in the try/catches or the complex evaluation logic. I am just showing basically how I could scrape a page.>
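
One possible way around the nested loops, as a rough sketch only: walk the page text once, build a word-to-count table, and then look each keyword up in that table. The class name KeywordCounter and both method names are made up for illustration, the tag stripping is a crude regex rather than a real HTML parser, and it assumes each keyword is a single word matched case-insensitively.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of any library mentioned in this thread.
class KeywordCounter {

    // Build a word -> frequency table from the page text in a single pass.
    static Map<String, Integer> wordFrequencies(String pageText) {
        Map<String, Integer> freq = new HashMap<>();
        // Crude tag stripping (assumption: good enough for a sketch).
        String text = pageText.replaceAll("<[^>]*>", " ").toLowerCase(Locale.ROOT);
        for (String word : text.split("[^a-z0-9]+")) {
            if (!word.isEmpty()) {
                freq.merge(word, 1, Integer::sum);
            }
        }
        return freq;
    }

    // Look each keyword up in the table instead of rescanning the page.
    static Map<String, Integer> countKeywords(String pageText, List<String> keywords) {
        Map<String, Integer> freq = wordFrequencies(pageText);
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String keyword : keywords) {
            counts.put(keyword, freq.getOrDefault(keyword.toLowerCase(Locale.ROOT), 0));
        }
        return counts;
    }

    public static void main(String[] args) {
        String page = "<html><body><p>Java and Solr. I like Java.</p></body></html>";
        System.out.println(countKeywords(page, Arrays.asList("java", "solr", "lucene")));
    }
}
```

With hundreds of keyword sets per page, the table can be built once per page and reused for every set, so the cost per keyword is a hash lookup rather than another scan of the page. Multi-word phrases would need a different approach.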


On Sat, Aug 1, 2009 at 7:25 AM, Lisa Kachold <lisakachold@obnosis.com> wrote:
Why java?

Why not a simple javascript search script?

http://stackoverflow.com/questions/141280/whats-the-best-way-to-count-keywords-in-javascript

On 8/1/09, Bryan O'Neal <boneal@cornerstonehome.com> wrote:
> Thought of that, the overhead is worse than scraping, parsing, and
> searching.
>
> On Fri, Jul 31, 2009 at 7:51 AM, Lisa Kachold <lisakachold@obnosis.com> wrote:
>
>> Try using google?
>>
>> On 7/31/09, Bryan O'Neal <boneal@cornerstonehome.com> wrote:
>> > Ok, so I want to, with utmost efficiency, go through a web page and ask
>> > how many of a set of keywords are in that web page. Does anyone know of
>> > a good open source tool for this?
>> > I have hundreds of web pages and a near equal number of keyword sets, so
>> > scraping each page, parsing it to create a vector of strings, and doing a
>> > set of nested for loops to run through each vector and compare it to the
>> > words in the keyword vector is, well, FAR from efficient.
>> > I heard of Apache Velocity, but that seems to be for creating pages on
>> > the fly. I also heard of Apache Lucene, but it appears to be for
>> > implementing your own query engine on your application server (to index
>> > and query your pages).
>> >
>> > Also, if you know of a local ACTIVE Java forum I would love to know
>> > about it. I have subscribed to a half dozen lists and there is nothing
>> > but silence.
>> >
>> > Thanks a bunch :)
>> >
>>
>>
>> --
>>
>> (623)239-3392
>> (503)754-4452 www.obnosis.com
>> ---------------------------------------------------
>> PLUG-discuss mailing list - PLUG-discuss@lists.plug.phoenix.az.us
>> To subscribe, unsubscribe, or to change your mail settings:
>> http://lists.PLUG.phoenix.az.us/mailman/listinfo/plug-discuss
>>
>

