My crusade for web content filtering

Kevin Saling
Wed, 2 Aug 2000 15:38:45 -0700

> You avoid the buffering problems because you don't buffer at all.  You just
> examine the words as they are downloaded, and as long as they don't violate
> your filtering rules, you allow the datastream to continue to the user's
> browser.  When the filter rules are triggered, you just send a string like
> "<br><br>This page has triggered the filter<p> The Connection has been
> terminated".

Ah, I see what you mean.  Any thoughts on how one would code something like
that?  I'm thinking along the lines of STDIN/STDOUT through a Perl script or
sed or something.  For example, I regularly pipe tcpdump output through sed
to make the output more readable on a character terminal attached to my
firewall.  However, I'm really at a loss as to how I could situate my script
between the requesting client and the responding webserver.

Check this out.  I tried a little experiment with netcat:

# echo -e Success'\n'Cool > ./wordlist
# cat ./wordlist
Success
Cool

# echo -e GET / HTTP/1.1 '\n' | nc 80 | grep -f ./wordlist

<p><font size="-1"><a href="link_NPD.html">Survey Says: Google Success
<p><br><center><p><font size="-1"><a href="jobs.html"><font
Jobs</font></a> - <a href=><font
color="6f6f6f">Try our
Web&nbsp;Directory</font></a> - <a href="adv/intro.html"><font
>Advertise with Us</font></a><br>

As you can see, I grabbed the default webpage from Google and filtered it
through my wordlist without buffering anything (at least I don't think it's
buffering).  Seems like I could instead pipe to a script that uses sed
and/or awk to search for keywords and replace all text if a hit occurs.
However, I still don't know how to position it between server and client.
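
Here is a rough sketch of what such a script might look like, using awk
alone.  The function name stream_filter and the wordlist path are made up
for illustration, and it assumes an awk that supports fflush() (gawk and
mawk do).  It matches the keywords as fixed strings rather than as the
patterns grep -f would use, and it does nothing about the positioning
problem; it only covers the "pass lines through until a hit, then cut off"
part.

```shell
#!/bin/sh
# stream_filter WORDLIST
# Copy stdin to stdout one line at a time.  At the first line containing
# any keyword from WORDLIST (one keyword per line), print the notice
# instead of that line and stop reading.
stream_filter() {
    awk -v wordlist="$1" '
        BEGIN {
            # Load the keywords, skipping empty lines.
            while ((getline w < wordlist) > 0)
                if (w != "") words[++n] = w
            close(wordlist)
        }
        {
            for (i = 1; i <= n; i++)
                if (index($0, words[i]) > 0) {
                    print "<br><br>This page has triggered the filter<p> The Connection has been terminated"
                    exit
                }
            print
            fflush()    # push each clean line out right away
        }
    '
}
```

It would drop into the pipeline above in place of the grep, e.g.
"... | stream_filter ./wordlist".  Because it works a line at a time, a
keyword that happens to be split across two lines would slip through.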

> If it were me doing such a filter, I would have all filters triggered in
> that manner put into a check list.  Then someone could look over the list
> once a week or so and then apply the list to a list-based part of the
> filter.

Yes, that is an add-on I had in the back of my mind.  I haven't given it
much thought yet, but it sounds reasonable on the surface, right?
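
A minimal sketch of how that check list might work, assuming two plain
text files.  The file names and the function names (record_trigger,
pending_review, approve, is_blocked) are all hypothetical; nothing here
comes from an existing tool.

```shell
#!/bin/sh
# Hypothetical file locations, chosen only for illustration.
REVIEW_LIST=${REVIEW_LIST:-./review.list}
BLOCK_LIST=${BLOCK_LIST:-./block.list}

# record_trigger URL WORD: note a page that tripped the keyword filter.
record_trigger() {
    printf '%s\t%s\n' "$1" "$2" >> "$REVIEW_LIST"
}

# is_blocked URL: succeed if URL is already on the reviewed block list.
is_blocked() {
    [ -f "$BLOCK_LIST" ] && grep -qxF -- "$1" "$BLOCK_LIST"
}

# pending_review: show each triggered URL once, with its hit count,
# most frequent first, for the weekly look-over.
pending_review() {
    [ -f "$REVIEW_LIST" ] || return 0
    cut -f1 "$REVIEW_LIST" | sort | uniq -c | sort -rn
}

# approve URL: the reviewer adds a URL to the block list (once).
approve() {
    is_blocked "$1" || printf '%s\n' "$1" >> "$BLOCK_LIST"
}
```

The keyword filter would call record_trigger whenever it cuts a page off,
and the list-based part of the filter would call is_blocked before
fetching anything.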
