My crusade for web content filtering

Author: Brian Cluff
Date:  
New-Topics: FilterProxy (was: My crusade for web content filtering)
Subject: My crusade for web content filtering
I would grab "FilterProxy" from:
http://draal.physics.wisc.edu/FilterProxy/
I don't know too much about this program, other than it sounds perfect for
what you want to do.

It comes with a bunch of modules that probably won't interest you too much,
but it does come with a skeleton module. You could most likely drop the
filtering code you have been playing with directly into it and immediately
have your filter.

Brian Cluff

> > You avoid the buffering problems because you don't buffer at all. You just
> > examine the words as they are downloaded, and as long as they don't violate
> > your filtering rules, you allow the datastream to continue to the user's
> > browser. When the filter rules are triggered, you just send a string like
> > "<br><br>This page has triggered the filter<p> The Connection has been
> > terminated".
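
A minimal sketch of the pass-through idea quoted above, written in Python as
an illustration only. The names filter_stream and BLOCK_MESSAGE are
hypothetical, and the word matching is a plain case-insensitive substring
test. One departure from "don't buffer at all": the sketch holds back a few
bytes (one less than the longest blocked word) so that a word split across
two chunks is still caught.

```python
from typing import Iterable, Iterator

# Message taken from the quoted description; sent in place of the rest of the page.
BLOCK_MESSAGE = (b"<br><br>This page has triggered the filter<p> "
                 b"The Connection has been terminated")


def filter_stream(chunks: Iterable[bytes],
                  blocked_words: Iterable[bytes]) -> Iterator[bytes]:
    """Pass chunks through until a blocked word appears, then emit
    BLOCK_MESSAGE and stop. Data already released is not recalled."""
    words = [w.lower() for w in blocked_words if w]
    # Bytes to hold back so a word straddling two chunks is still seen.
    keep = max((len(w) for w in words), default=1) - 1
    tail = b""
    for chunk in chunks:
        window = tail + chunk
        if any(w in window.lower() for w in words):
            yield BLOCK_MESSAGE
            return
        cut = len(window) - keep
        if cut > 0:
            yield window[:cut]      # safe to release
            tail = window[cut:]     # held back for the next round
        else:
            tail = window
    if tail:
        yield tail                  # stream ended cleanly, release the rest
```

Anything released before the trigger has already reached the browser, so the
user sees the start of the page followed by the termination message.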
>
> Ah, I see what you mean. Any thoughts on how one would code something like
> that? I'm thinking along the lines of STDIN/STDOUT through a perl script or
> sed or something. For example, I regularly pipe tcpdump output through sed
> to make the output more readable on a character terminal attached to my
> firewall. However, I'm really at a loss on how I could situate my script
> between the requesting client and the responding webserver.
>
> Check this out. I tried a little experiment with netcat:
>
> # echo -e Success'\n'Cool > ./wordlist
> # cat ./wordlist
> Success
> Cool
>
> # echo -e GET / HTTP/1.1 '\n' | nc google.com 80 | grep -f ./wordlist
>
> <p><font size="-1"><a href="link_NPD.html">Survey Says: Google Success
> Continues
> </a></font></p>
> <p><br><center><p><font size="-1"><a href="jobs.html"><font
> color="6f6f6f">Cool
> Jobs</font></a> - <a href=http://directory.google.com><font
> color="6f6f6f">Try o
> ur Web&nbsp;Directory</font></a> - <a href="adv/intro.html"><font
> color="6f6f6f"
> >Advertise with Us</font></a><br>
>
> As you can see, I grabbed the default webpage from Google and filtered it
> through my wordlist without buffering anything (at least I don't think it's
> buffering). Seems like I could instead pipe to a script that uses sed
> and/or awk to search for keywords and replace all text if a hit occurs.
> However, I still don't know how to position it between server and client.
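
One common way to sit between client and server is a small relay: the browser
connects to the relay, and the relay opens its own connection to the web
server. The Python sketch below is an illustration under loud assumptions, not
how FilterProxy works: handle_client, serve and BLOCK_MESSAGE are hypothetical
names, the upstream address is fixed, the whole request is assumed to fit in
one read, and the server is assumed to close the connection when the reply is
done (HTTP/1.0 or "Connection: close"). It also checks each chunk on its own,
so a word split across two chunks would slip through.

```python
import socket
import threading

BLOCK_MESSAGE = (b"<br><br>This page has triggered the filter<p> "
                 b"The Connection has been terminated")


def handle_client(client, upstream_addr, blocked_words):
    """Forward one request upstream and stream the reply back to the
    client, cutting the reply off when a blocked word shows up."""
    words = [w.lower() for w in blocked_words]
    with client, socket.create_connection(upstream_addr) as upstream:
        # Assumption: the full request arrives in a single read.
        upstream.sendall(client.recv(65536))
        while True:
            chunk = upstream.recv(4096)
            if not chunk:                 # server closed: reply finished
                break
            if any(w in chunk.lower() for w in words):
                client.sendall(BLOCK_MESSAGE)
                break                     # drop the rest of the reply
            client.sendall(chunk)


def serve(listen_addr, upstream_addr, blocked_words):
    """Accept connections forever, one thread per client."""
    with socket.create_server(listen_addr) as server:
        while True:
            client, _ = server.accept()
            threading.Thread(
                target=handle_client,
                args=(client, upstream_addr, blocked_words),
                daemon=True,
            ).start()
```

Calling something like serve(("127.0.0.1", 8080), ("example.com", 80),
[b"forbidden"]) and pointing a client at port 8080 would exercise it; the
addresses and word list here are placeholders.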
>
> > If it were me doing such a filter, I would have all filters triggered in
> > that manner put into a check list. Then someone could look over the list
> > once a week or so and then apply the list to a list-based part of the
> > filter.
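
The check-list idea quoted above could be sketched in Python roughly as
follows. Everything here is an assumption made for illustration: the function
names, the tab-separated review file, and the one-URL-per-line format of the
reviewed list.

```python
from datetime import datetime, timezone


def record_hit(review_path, url, word):
    """Append one triggered page to the review list (tab-separated)."""
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    with open(review_path, "a", encoding="utf-8") as f:
        f.write(f"{stamp}\t{url}\t{word}\n")


def load_blocklist(approved_path):
    """Read the reviewed list: one URL per line; blank lines and
    lines starting with '#' are ignored."""
    with open(approved_path, encoding="utf-8") as f:
        return {
            line.strip()
            for line in f
            if line.strip() and not line.lstrip().startswith("#")
        }


def is_blocked(url, blocklist):
    """List-based part of the filter: exact match against reviewed URLs."""
    return url in blocklist
```

The word filter would call record_hit whenever it fires, and whoever reviews
the file weekly would copy confirmed URLs into the approved list that
load_blocklist reads.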
>
> Yes, that is an add-on I had in the back of my mind. I haven't given it
> much thought yet, but it sounds reasonable on the surface, right?
>
> ...Kevin
>
>
> _______________________________________________
> Plug-discuss mailing list -
> http://lists.PLUG.phoenix.az.us/mailman/listinfo/plug-discuss