Catching cross-postings via procmail

Author: Dale Farnsworth
Date:  
Subject: Catching cross-postings via procmail
On Fri, Feb 20, 2004 at 08:12:12AM +0000, Deepak Saxena wrote:
> I am on various Linux mailing lists and every once in a while
> a thread will get cross posted so I will get it twice (sometimes
> three times if I happen to be CC:ed) and it will all end up in whatever
> procmail rule catches it first. Anyone know of a way to catch
> cross-posted messages via procmail and have it only store the first
> copy of it? Maybe do an MD5 hash of something that is cross posted
> and every time that same set of To:, From: and Cc: headers is seen,
> discard the message if the hashes match and increment some counter.
> If the counter == number of lists I am on that would receive this
> message, delete it.
>
> This sounds incredibly complicated and requires keeping state across
> procmail instances so I'm thinking there must be a _MUCH_ easier
> method. Anyone?


From the procmailex man page:
> If you are subscribed to several mailinglists and people cross-post to
> some of them, you usually receive several duplicate mails (one from
> every list). The following simple recipe eliminates duplicate mails.
> It tells formail to keep an 8KB cache file in which it will store the
> Message-IDs of the most recent mails you received. Since Message-IDs
> are guaranteed to be unique for every new mail, they are ideally suited
> to weed out duplicate mails. Simply put the following recipe at the
> top of your rcfile, and no duplicate mail will get past it.
>
>               :0 Wh: msgid.lock
>               | formail -D 8192 msgid.cache
>
> Beware if you have delivery problems in recipes below this one and
> procmail tries to requeue the mail, then on the next queue run, this
> mail will be considered a duplicate and will be thrown away. For those
> not quite so confident in their own scripting capabilities, you can use
> the following recipe instead. It puts duplicates in a separate folder
> instead of throwing them away. It is up to you to periodically empty
> the folder of course.
>
>               :0 Whc: msgid.lock
>               | formail -D 8192 msgid.cache
>
>               :0 a:
>               duplicates


I gateway my mailing-list messages to a local news server, so I
haven't tried the above recipe myself. It should work, though.
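
On the worry about keeping state across procmail instances: the recipe
gets away with a single small file of recently seen Message-IDs. Here is
a rough Python sketch of that idea, only to show the logic. It is not
formail's actual implementation; the function name is_duplicate and the
one-ID-per-line cache format are made up for illustration, and it does no
locking (the recipe's msgid.lock handles that part).

```python
import email
import os


def is_duplicate(raw_message, cache_path, max_bytes=8192):
    """Return True if this Message-ID is already cached; otherwise record it."""
    msg_id = email.message_from_string(raw_message).get("Message-ID", "").strip()
    if not msg_id:
        return False  # nothing to key on, so let the mail through

    seen = []
    if os.path.exists(cache_path):
        with open(cache_path) as fh:
            seen = fh.read().splitlines()

    if msg_id in seen:
        return True

    seen.append(msg_id)
    # Forget the oldest IDs until the cache is roughly within max_bytes.
    while sum(len(s) + 1 for s in seen) > max_bytes and len(seen) > 1:
        seen.pop(0)
    with open(cache_path, "w") as fh:
        fh.write("\n".join(seen) + "\n")
    return False
```

Because the cache is bounded, a copy that arrives after its ID has been
pushed out of the cache would be treated as new again.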

-Dale Farnsworth