My crusade for web content filtering
Kevin Saling
networkpro@email.com
Wed, 2 Aug 2000 15:38:45 -0700
> You avoid the buffering problems because you don't buffer at all. You just
> examine the words as they are downloaded, and as long as they don't violate
> your filtering rules, you allow the datastream to continue to the user's
> browser. When the filter rules are triggered, you just send a string like
> "<br><br>This page has triggered the filter<p> The Connection has been
> terminated".
Ah, I see what you mean. Any thoughts on how one would code something like
that? I'm thinking along the lines of STDIN/STDOUT through a Perl script or
sed or something. For example, I regularly pipe tcpdump output through sed
to make the output more readable on a character terminal attached to my
firewall. However, I'm really at a loss on how I could situate my script
between the requesting client and the responding webserver.
Check this out. I tried a little experiment with netcat:
# echo -e Success'\n'Cool > ./wordlist
# cat ./wordlist
Success
Cool
# echo -e GET / HTTP/1.1 '\n' | nc google.com 80 | grep -f ./wordlist
<p><font size="-1"><a href="link_NPD.html">Survey Says: Google Success
Continues
</a></font></p>
<p><br><center><p><font size="-1"><a href="jobs.html"><font
color="6f6f6f">Cool
Jobs</font></a> - <a href=http://directory.google.com><font
color="6f6f6f">Try o
ur Web Directory</font></a> - <a href="adv/intro.html"><font
color="6f6f6f"
>Advertise with Us</font></a><br>
As you can see, I grabbed the default webpage from Google and filtered it
through my wordlist without buffering anything (at least I don't think it's
buffering). Seems like I could instead pipe to a script that uses sed
and/or awk to search for keywords and replace all text if a hit occurs.
However, I still don't know how to position it between server and client.
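To make the "replace all text if a hit occurs" idea concrete, here is a rough sketch of what such a script might look like, using awk. The function name filter_stream is made up for illustration, and matching is plain substring matching, one line at a time. It says nothing about how to sit between the client and the server.

```shell
#!/bin/sh
# Hypothetical helper (the name is an assumption): copy stdin to stdout line
# by line and stop at the first line containing any word from the word list
# file given as $1.
filter_stream() {
    awk -v list="$1" '
        BEGIN {
            # Load one keyword per line, skipping blank lines.
            while ((getline w < list) > 0)
                if (w != "") words[w] = 1
            close(list)
        }
        {
            for (w in words)
                if (index($0, w) > 0) {
                    # Send the notice instead of the offending line, then stop.
                    print "<br><br>This page has triggered the filter<p> The Connection has been terminated"
                    exit
                }
            print
            fflush()   # pass each clean line on right away
        }'
}
```

It would drop in where grep sits in the experiment above:

    # echo -e GET / HTTP/1.1 '\n' | nc google.com 80 | filter_stream ./wordlist

Note that this holds back one line at a time, so a keyword that is split across a line break would slip through.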
> If it were me doing such a filter, I would have all filters triggered in
> that manner put into a check list. Then someone could look over the list
> once a week or so and then apply the list to a list based part of the
> filter.
Yes, that is an add-on I had in the back of my mind. I haven't given it
much thought yet, but it sounds reasonable on the surface, right?
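Just to pin the idea down, the check list could be as simple as a tab-separated text file that the filter appends to whenever it triggers. The helper names, the file layout, and the example URL below are all assumptions for illustration, not anything that exists yet.

```shell
#!/bin/sh
# Hypothetical helpers for the review list; names and file format are assumptions.

# Append one record per triggered page: timestamp, URL, matched word.
# $1 = check list file, $2 = URL, $3 = word that triggered the filter
log_hit() {
    printf '%s\t%s\t%s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$2" "$3" >> "$1"
}

# Print each logged URL once, so a person can look the list over.
review_candidates() {
    cut -f2 "$1" | sort -u
}
```

For example, "log_hit ./checklist http://example.com/page Cool" would add one line to ./checklist, and "review_candidates ./checklist" would print the distinct URLs for the weekly review. Whatever the reviewer approves could then be copied into the list-based part of the filter.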
...Kevin