Batch similar cron mails
Currently, a single failure condition (typically the Mercurial server being unreachable) can easily generate several hundred cron mails. This is counterproductive: important issues easily get lost in the flood. For any incident, only the first report is interesting; subsequent reports of the same issue only add noise.
What to change
One of our servers (server4 or a new server) should take care of batching mails; we can call it mailman.adblockplus.org. All cron mails should go to email@example.com (as a side effect, this will also take care of Google's overeager anti-spam policies). Mails to that address should be processed by a script with the following functionality:
- Remember the subjects of all mails received in the past 5 minutes, with the server identifier removed.
- If the mail subject hasn't been seen in the past 5 minutes: forward the mail to the admins immediately.
- If the mail subject has been seen in the past 5 minutes: batch the mail.
- Optional: reject mails where the claimed server name doesn't match the sender's IP address (this takes care of mails sent out by test VMs).
- Optional: log all mails in a way that allows looking them up/searching later.
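A minimal sketch of the forward-or-batch decision, assuming subjects follow the "Cron <user@host> command" format used by Vixie cron/cronie, so the "@host" part is treated as the server identifier. The names `normalize_subject`, `Batcher` and `WINDOW` are hypothetical. State is kept in memory for illustration only; a script invoked once per incoming mail would need to persist `last_seen` and `batches` (e.g. in a file or database). The optional IP check and logging are left out.

```python
import re
import time
from collections import defaultdict

# Assumption: subjects look like "Cron <user@host> command"; "@host" is the
# server identifier that gets stripped before comparing subjects.
_SERVER_ID = re.compile(r"^(Cron <[^@>]+)@[^>]+>")

WINDOW = 5 * 60  # seconds a subject is remembered


def normalize_subject(subject):
    """Return the subject with the server identifier removed."""
    return _SERVER_ID.sub(r"\1>", subject.strip())


class Batcher:
    def __init__(self, window=WINDOW):
        self.window = window
        self.last_seen = {}                # normalized subject -> time of last mail
        self.batches = defaultdict(list)   # normalized subject -> [(time, subject, body)]

    def handle(self, subject, body, now=None):
        """Return "forward" to send the mail to the admins now, "batch" otherwise."""
        now = time.time() if now is None else now
        key = normalize_subject(subject)
        seen = self.last_seen.get(key)
        # Every mail refreshes the timestamp, so a steady stream stays batched.
        self.last_seen[key] = now
        if seen is None or now - seen >= self.window:
            return "forward"
        self.batches[key].append((now, subject, body))
        return "batch"
```

With this approach the same failure reported by server4 and server5 maps to one key, so only the first report is forwarded immediately.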
In addition, there should be a cron job that looks at the batches and sends each batch to the admins as a single mail as soon as either condition is met:
- Last mail in the batch was received at least 5 minutes ago.
- First mail in the batch was received at least 20 minutes ago.
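The two flush conditions could be checked roughly as follows. This is a sketch under assumptions: batches are a dict mapping a normalized subject to a list of `(received_at, subject, body)` tuples, oldest first, and the names `flush`, `is_due`, `QUIET_PERIOD` and `MAX_DELAY` are hypothetical. Actually sending the digest (e.g. via `smtplib`) is left out since the admins' address is not specified here.

```python
import time

QUIET_PERIOD = 5 * 60   # flush if the last mail is at least this old
MAX_DELAY = 20 * 60     # flush if the first mail is at least this old


def is_due(mails, now):
    """mails: list of (received_at, subject, body), oldest first."""
    first, last = mails[0][0], mails[-1][0]
    return now - last >= QUIET_PERIOD or now - first >= MAX_DELAY


def flush(batches, now=None):
    """Remove due batches and return one (subject, body) digest per batch."""
    now = time.time() if now is None else now
    digests = []
    for key in list(batches):
        mails = batches[key]
        if not mails or not is_due(mails, now):
            continue
        body = "\n\n".join(f"--- {subj}\n{text}" for _, subj, text in mails)
        digests.append((f"[{len(mails)} mails] {key}", body))
        del batches[key]
    return digests
```

The 20-minute cap matters for an ongoing outage: without it, a stream of mails arriving less than 5 minutes apart would keep the batch open indefinitely.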