My crusade for web content filtering

Brian Cluff brian@snaptek.com
Wed, 2 Aug 2000 16:11:21 -0700


I would grab "FilterProxy" from:
http://draal.physics.wisc.edu/FilterProxy/
I don't know too much about this program, other than it sounds perfect for
what you want to do.

It comes with a bunch of modules that probably won't interest you too much,
but it also comes with a skeleton module.  You could most likely drop the
filtering code you have been playing with directly into it and immediately
have your filter.

Brian Cluff

> > You avoid the buffering problems because you don't buffer at all.  You just
> > examine the words as they are downloaded, and as long as they don't violate
> > your filtering rules, you allow the datastream to continue to the user's
> > browser.  When the filter rules are triggered, you just send a string like
> > "<br><br>This page has triggered the filter<p> The Connection has been
> > terminated".
>
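A rough sketch of the "examine the words as they are downloaded" idea, in
Python rather than perl or sed.  The names filter_stream and BLOCK_NOTICE are
made up for illustration.  One deviation from "don't buffer at all": the sketch
holds back a few characters between chunks (one less than the longest banned
word) so that a word split across two reads is still caught.

```python
# Hypothetical names; the notice text is the one quoted in the thread.
BLOCK_NOTICE = ("<br><br>This page has triggered the filter<p> "
                "The Connection has been terminated")


def filter_stream(chunks, banned_words, notice=BLOCK_NOTICE):
    """Yield text from `chunks` until a banned word shows up.

    On a hit, yield `notice` instead of the rest and stop.
    """
    words = [w.lower() for w in banned_words if w]
    # Characters held back between chunks so a word that straddles
    # two chunks is still seen in one piece.
    keep = max((len(w) for w in words), default=1) - 1
    tail = ""
    for chunk in chunks:
        window = tail + chunk
        if any(w in window.lower() for w in words):
            yield notice
            return
        cut = len(window) - keep
        if cut > 0:
            yield window[:cut]   # safe to pass on to the browser
            tail = window[cut:]  # re-examined with the next chunk
        else:
            tail = window
    if tail:
        yield tail               # end of stream, nothing left to compare


if __name__ == "__main__":
    page = ["<p>Hello wor", "ld, this is a Co", "ol page</p>"]
    print("".join(filter_stream(page, ["cool"])))
```

Anything already sent before the hit has reached the browser, which is the
trade-off of filtering the stream instead of buffering the whole page.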
> Ah, I see what you mean.  Any thoughts on how one would code something like
> that?  I'm thinking along the lines of STDIN/STDOUT through a perl script or
> sed or something.  For example, I regularly pipe tcpdump output through sed
> to make the output more readable on a character terminal attached to my
> firewall.  However, I'm really at a loss on how I could situate my script
> between the requesting client and the responding webserver.
>
> Check this out.  I tried a little experiment with netcat:
>
> # echo -e Success'\n'Cool > ./wordlist
> # cat ./wordlist
> Success
> Cool
>
> # echo -e GET / HTTP/1.1 '\n' | nc google.com 80 | grep -f ./wordlist
>
> <p><font size="-1"><a href="link_NPD.html">Survey Says: Google Success
> Continues
> </a></font></p>
> <p><br><center><p><font size="-1"><a href="jobs.html"><font
> color="6f6f6f">Cool
> Jobs</font></a> - <a href=http://directory.google.com><font
> color="6f6f6f">Try o
> ur Web&nbsp;Directory</font></a> - <a href="adv/intro.html"><font
> color="6f6f6f"
> >Advertise with Us</font></a><br>
>
> As you can see, I grabbed the default webpage from google and filtered it
> through my wordlist without buffering anything (at least I don't think it's
> buffering).  Seems like I could instead pipe to a script that uses sed
> and/or awk to search for keywords and replace all text if a hit occurs.
> However, I still don't know how to position it between server and client.
>
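One way to sit between client and server is a small relay: the browser
connects to the relay, the relay connects to the real webserver, and the reply
is checked line by line on the way back, much like the grep experiment.  This
is only a sketch under assumptions, not how FilterProxy does it.  The names
pump_filtered and serve_once are hypothetical, the upstream host is fixed, the
request is assumed to fit in one read, and the server is assumed to close the
connection when the page is done.

```python
import socket

NOTICE = (b"<br><br>This page has triggered the filter<p> "
          b"The Connection has been terminated")


def pump_filtered(upstream, client, banned_words, notice=NOTICE):
    """Copy lines from `upstream` to `client`; stop at the first banned word.

    Returns True if the whole reply was passed on, False if it was cut off.
    """
    words = [w.lower().encode() for w in banned_words]
    with upstream.makefile("rb") as lines:
        for line in lines:               # ends when upstream closes
            if any(w in line.lower() for w in words):
                client.sendall(notice)
                return False
            client.sendall(line)
    return True


def serve_once(listen_port, upstream_host, upstream_port, banned_words):
    """Accept one client, forward its request, filter the reply."""
    with socket.create_server(("127.0.0.1", listen_port)) as server:
        client, _ = server.accept()
        with client, socket.create_connection(
                (upstream_host, upstream_port)) as upstream:
            # Assumption: the whole request arrives in a single read.
            upstream.sendall(client.recv(65536))
            pump_filtered(upstream, client, banned_words)


if __name__ == "__main__":
    # Placeholder upstream; point a client at 127.0.0.1:8080 to try it.
    serve_once(8080, "example.com", 80, ["Success", "Cool"])
```

A real proxy would loop over many clients and read the destination from each
request, which is the plumbing a program like FilterProxy already provides.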
> > If it were me doing such a filter, I would have every page that triggers
> > the filter in that manner added to a check list.  Then someone could look
> > over the list once a week or so and apply it to a list-based part of the
> > filter.
>
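The check list could be as simple as a text file that the filter appends to and
a person reviews later.  A minimal sketch, assuming a tab-separated checklist
file and a one-URL-per-line blocklist file; the function names and file formats
are invented for illustration.

```python
from datetime import date
from pathlib import Path


def record_trigger(checklist, url, word):
    """Append one triggered page to the review file as date, word, URL."""
    with Path(checklist).open("a", encoding="utf-8") as fh:
        fh.write(f"{date.today().isoformat()}\t{word}\t{url}\n")


def pending_urls(checklist):
    """Return the distinct URLs waiting for review, in first-seen order."""
    path = Path(checklist)
    if not path.exists():
        return []
    seen = {}
    for line in path.read_text(encoding="utf-8").splitlines():
        parts = line.split("\t")
        if len(parts) == 3:
            seen.setdefault(parts[2], None)
    return list(seen)


def approve(checklist, blocklist, approved_urls):
    """Add reviewed URLs to the blocklist file and empty the checklist."""
    path = Path(blocklist)
    known = set(path.read_text(encoding="utf-8").split()) if path.exists() else set()
    known.update(approved_urls)
    path.write_text("\n".join(sorted(known)) + "\n", encoding="utf-8")
    Path(checklist).write_text("", encoding="utf-8")
```

The weekly pass is then: look at pending_urls(), decide which ones really
belong on the list, and hand those to approve().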
> Yes, that is an add-on I had in the back of my mind.  I haven't given it
> much thought yet, but it sounds reasonable on the surface, right?
>
> ...Kevin
>
>
> _______________________________________________
> Plug-discuss mailing list  -  Plug-discuss@lists.PLUG.phoenix.az.us
> http://lists.PLUG.phoenix.az.us/mailman/listinfo/plug-discuss