Multipart tutorial: a centralized mail log parser

Thomas Gelf

Wednesday, 9 September 2009

Anyone running one or more instances of Postfix and Amavis will sooner or later wonder how to produce nice statistics, provide easy log access and so forth. So did I, and I'm pretty excited about what I have been able to build.

Those who attended the 4. Mailserver-Konferenz 2009 in Berlin and heard my talk probably know what I'm talking about. Everyone else can get an idea from the slides of my presentation, The Big Picture: Der OSS-Mailcluster von Raiffeisen Online (The OSS mail cluster of Raiffeisen Online).

To give you a quick idea: we are running a small mail filtering cluster handling peaks of slightly more than 6 million delivery attempts a day. Based on this setup I was able to:

create an astonishingly fast central realtime log search for our customer support team
provide the same tool, with a prettier presentation, strong security measures and filters, to our VIP customers for realtime mail log access in their web backend
and finally, just to prove how amazing it is, display all rejected, quarantined and delivered mail in realtime in our customers' webmail backend, for each of our 30,000-40,000 mailboxes

While I don't consider these numbers particularly impressive, I'm fairly confident that such a system would also scale well, even to hundreds of millions of delivery attempts a day.

One of the most important components of this system is its central log parser. My strategy was as follows:

each Postfix and Amavis instance writes its syslog messages to a central syslog server
this central syslog server feeds all log lines (Postfix and Amavis instances mixed) into a pipe read by a log-parsing daemon
traditional log files are still written directly to disk and stored securely, as your government requires (here in Italy especially, those laws change regularly)
aggregation happens in "real time": the daemon builds one in-memory object per mail, adds whatever information it learns line by line, and stores the object in the database once it is sure it has received all log lines related to that message
note that there may be hundreds of thousands of lines between the first and the last line related to a mail, and events that occurred later may appear earlier, since multiple cores on multiple hosts work on a single mail
the most complicated part of this challenge was writing a daemon with "let's wait and see whether the missing line arrives" logic
the daemon should also catch most errors and have bulletproof garbage collection
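The aggregation strategy above can be sketched in Python roughly as follows. This is a minimal illustration under assumptions, not the existing PHP daemon: it only understands Postfix-style lines of the form `postfix/<service>[<pid>]: <QUEUEID>: <message>`, treats the `removed` message logged by the queue manager as the end of a mail's life, and then waits a configurable grace period for late, out-of-order lines before handing the record on. `MailAggregator`, `grace`, `max_age` and `follow` are hypothetical names. Records that never complete are dropped after `max_age`, which plays the role of the garbage collection.

```python
import re
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Optional

# Hypothetical pattern for Postfix syslog lines such as
# "Sep  9 01:45:01 mx1 postfix/qmgr[200]: 4A3B2C1D0E: removed".
# Amavis, Perdition etc. would need their own patterns.
POSTFIX_LINE = re.compile(
    r"postfix/(?P<service>[\w/-]+)\[\d+\]: (?P<queue_id>[0-9A-F]+): (?P<message>.*)$"
)


@dataclass
class MailRecord:
    queue_id: str
    first_seen: float
    lines: List[str] = field(default_factory=list)
    completed_at: Optional[float] = None  # set once "removed" has been seen


class MailAggregator:
    """Builds one in-memory record per queue ID and flushes it when complete."""

    def __init__(self, on_complete: Callable[[MailRecord], None],
                 grace: float = 30.0, max_age: float = 3600.0):
        self.on_complete = on_complete  # e.g. a function storing to the database
        self.grace = grace              # seconds to wait for late lines after "removed"
        self.max_age = max_age          # seconds before an incomplete record is dropped
        self.pending: Dict[str, MailRecord] = {}

    def feed(self, line: str, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        m = POSTFIX_LINE.search(line)
        if m is None:
            return  # not a line this sketch understands
        qid, message = m.group("queue_id"), m.group("message")
        rec = self.pending.setdefault(qid, MailRecord(qid, first_seen=now))
        rec.lines.append(message)
        if message == "removed":
            rec.completed_at = now

    def collect(self, now: Optional[float] = None):
        """Flush records whose grace period is over; drop stale incomplete ones."""
        now = time.time() if now is None else now
        flushed, expired = [], []
        for qid, rec in list(self.pending.items()):
            if rec.completed_at is not None and now - rec.completed_at >= self.grace:
                self.on_complete(rec)
                flushed.append(qid)
            elif now - rec.first_seen >= self.max_age:
                expired.append(qid)
        for qid in flushed + expired:
            del self.pending[qid]
        return flushed, expired


def follow(lines: Iterable[str], aggregator: MailAggregator) -> None:
    """Feed every line from an iterable (e.g. an opened named pipe) into the aggregator."""
    for line in lines:
        aggregator.feed(line.rstrip("\n"))
```

In a real daemon, `follow()` would consume the pipe written by the syslog server (`with open(pipe_path) as fifo: follow(fifo, aggregator)`), while `collect()` runs periodically in another thread; in that case both need to share a lock around `pending`.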

And please don't laugh: my current implementation is written in PHP. Nonetheless it works quite well, and I have also found a workaround for a nasty memory leak (supposedly fixed in PHP 5.3, which I haven't tried yet). On our live system the daemon is watched by a monitoring tool: if its memory footprint exceeds a certain limit, it receives a kill signal, dumps its current data to disk and restarts itself. Scary, but it works.
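A kill-dump-restart workaround like this could look roughly as follows in Python. This is a sketch under stated assumptions: the monitor sends a catchable signal such as SIGTERM (SIGKILL cannot be caught, so no dump would be possible), the pending state is JSON-serializable, and an external supervisor restarts the process. `dump_state`, `load_state` and `make_dump_handler` are hypothetical names.

```python
import json
import os
import signal
import sys
import tempfile
from typing import Any, Callable, Dict


def dump_state(state: Dict[str, Any], path: str) -> None:
    """Write the in-memory state to `path` atomically (temp file + rename)."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as fh:
        json.dump(state, fh)
    os.replace(tmp_path, path)  # same directory, so the rename is atomic on POSIX


def load_state(path: str) -> Dict[str, Any]:
    """Return the state dumped by a previous run, or {} on a clean start."""
    try:
        with open(path) as fh:
            return json.load(fh)
    except FileNotFoundError:
        return {}


def make_dump_handler(get_state: Callable[[], Dict[str, Any]], path: str):
    """Build a signal handler that dumps the current state and exits cleanly."""
    def handler(signum, frame):
        dump_state(get_state(), path)
        sys.exit(0)  # the supervisor is expected to start a fresh process
    return handler
```

At startup the daemon would call `pending = load_state(state_path)` and then register the handler with `signal.signal(signal.SIGTERM, make_dump_handler(lambda: pending, state_path))`, where `state_path` is whatever location the deployment chooses. Writing to a temporary file first means a crash during the dump never leaves a half-written state file behind.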

As this daemon is far from perfect, I decided a long time ago to rewrite it from scratch as an open source project. My employer gave me permission to do so, as everyone here agrees that this is the best way to get the most out of this valuable component.

Choosing a programming language

The daemon needs to be written in a language that supports threads (for garbage collection and other periodic tasks); current candidates are Perl, Python and C#. My personal favorite is Python: I stopped writing Perl a long time ago (though I still use it from time to time), and being a C# evangelist seems a little risky these days.
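Since thread support is the main requirement, here is roughly what a background garbage-collection thread could look like in Python with only the standard `threading` module. `PeriodicTask` is a hypothetical name; in the daemon, `task` would be the cleanup routine that expires stale in-memory mail records. Because the parser thread and this thread share that state, both should hold the same lock while touching it.

```python
import threading
from typing import Callable


class PeriodicTask(threading.Thread):
    """Run `task()` every `interval` seconds in a daemon thread until stop()."""

    def __init__(self, task: Callable[[], None], interval: float, lock=None):
        super().__init__(daemon=True)
        self.task = task
        self.interval = interval
        self.lock = lock or threading.Lock()  # share this lock with the parser thread
        self._stop_event = threading.Event()

    def run(self) -> None:
        # Event.wait() returns True as soon as stop() sets the event,
        # so shutdown does not have to wait for a full interval.
        while not self._stop_event.wait(self.interval):
            with self.lock:
                self.task()

    def stop(self) -> None:
        self._stop_event.set()
```

Using `Event.wait()` instead of `time.sleep()` keeps shutdown responsive, and marking the thread as a daemon thread means it never keeps the process alive on its own.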

I did some first tests with threads in Python a while ago; those were my very first steps with Python. Even though I managed to do what I wanted, it didn't fully convince me. I had always heard that Python is sooo fantastic, especially for OOP; my first impression was different. Ever tried to implement patterns such as Singleton & Co. in Python? Ever tried to use protected/private variables and methods in a way that someone extending your class cannot violate the rules? To tell the truth, I was disappointed. Really disappointed. I wish I could have Python's everything-is-an-object philosophy combined with what OOP looks like in PHP 5.3 (which finally provides late static bindings!). Should I really give C# a chance?
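For readers who haven't run into these issues, this small Python sketch (all class names are hypothetical) shows the usual `__new__`-based Singleton and why "private" members are only a convention: a double-underscore attribute is merely name-mangled, so a subclass can still reach it.

```python
class Registry:
    """Classic Singleton via __new__: every call returns the same object."""
    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance


class Parser:
    def __init__(self):
        self._protected = 1  # "protected" by naming convention only
        self.__private = 2   # name-mangled to self._Parser__private


class Subclass(Parser):
    def peek(self):
        # Nothing is enforced: the mangled name is an ordinary attribute.
        return self._protected, self._Parser__private
```

Note that `__init__` still runs on every `Registry()` call. The more common Python idiom is simply a module-level instance, since a module is imported only once per process.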

Design principles

It's nearly 2:00 AM and too late to write down everything that's going through my mind, which is why I decided to make this a multipart tutorial. I'll start by asking for feedback, then plan the daemon and code it step by step. A similar series has been started on this blog for an RFC-compliant mail autoresponder (in German).

Some first thoughts, and then I'll go to bed:

the main parser should decide which of the available parsers to use for the current line
so far I have written such parsers for Postfix, Amavisd-new and Perdition
I'd like to implement the observer pattern for data storage, so that multiple data backends can react to the "mail information is complete" event
anyone who wants to could, for example, write a dedicated backend feeding a custom reputation database
I'd like to provide at least two example data backends, one suitable for very large distributed setups and the other for small setups on single hosts
web backends and other tools are not planned right now; the daemon will be useful solely for realtime mail log correlation
...
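The parser dispatch and the observer-based storage described above could be wired together roughly as follows. This Python sketch assumes classic BSD-syslog lines and dispatches on the syslog program name; `MainParser`, `CompletionSubject`, `SQLiteBackend` and the table layout are hypothetical, and `SQLiteBackend` merely stands in for the "small setup on a single host" backend. Any callable that accepts a finished mail record can be attached as an additional backend.

```python
import re
import sqlite3
from typing import Callable, Dict, List, Optional

# Hypothetical BSD-syslog layout: "Sep  9 01:45:01 mx1 postfix/qmgr[200]: message"
SYSLOG_LINE = re.compile(
    r"^\w{3} [ \d]\d [\d:]{8} (?P<host>\S+) (?P<program>[^\[:]+)(?:\[\d+\])?: (?P<msg>.*)$"
)

SubParser = Callable[[str, str], Optional[dict]]  # (host, message) -> fields or None
Observer = Callable[[dict], None]


class MainParser:
    """Decides which registered sub-parser handles a line, by syslog program name."""

    def __init__(self) -> None:
        self.parsers: Dict[str, SubParser] = {}

    def register(self, program_prefix: str, parser: SubParser) -> None:
        self.parsers[program_prefix] = parser

    def parse(self, line: str) -> Optional[dict]:
        m = SYSLOG_LINE.match(line)
        if m is None:
            return None
        for prefix, parser in self.parsers.items():
            if m.group("program").startswith(prefix):
                return parser(m.group("host"), m.group("msg"))
        return None  # e.g. cron or sshd lines: no parser is interested


class CompletionSubject:
    """Observer pattern for the "mail information is complete" event."""

    def __init__(self) -> None:
        self._observers: List[Observer] = []

    def attach(self, observer: Observer) -> None:
        self._observers.append(observer)

    def notify(self, mail: dict) -> None:
        for observer in self._observers:
            observer(mail)


class SQLiteBackend:
    """Example backend for small single-host setups (hypothetical schema)."""

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute("CREATE TABLE IF NOT EXISTS mail (queue_id TEXT, status TEXT)")

    def __call__(self, mail: dict) -> None:
        self.db.execute("INSERT INTO mail VALUES (?, ?)",
                        (mail["queue_id"], mail.get("status")))
        self.db.commit()
```

Matching on a program-name prefix means that additional Postfix instances with their own syslog names (for example `postfix-out/smtp`) still reach the Postfix parser, and adding a Perdition parser is a single `register()` call. Backends only ever see finished records, so none of them has to deal with half-correlated data.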

Folks, that's enough for today; I really need to get some sleep! Feedback is more than welcome; in particular, I'd like to ask you:

are you aware of other similar projects (I'd rather not reinvent the wheel)?
do you have suggestions on the hot what's-the-best-programming-language topic (no flame wars, please!)?
what would you like this parser to be able to do (please have a look at the slides mentioned at the beginning to get an idea of what it can do right now)?

All other related questions / concerns / suggestions are obviously more than welcome!

Posted in Mailserver, Programming at 01:45 | Comments (2)


Comments

I'd be interested in working with you on this, if you're still interested in the project.

Tom

#1 Tom Johnson on 04.03.2010 01:25

http://thomas.gelf.net/blog/