rsyslog TCP stream compression

I have begun to work on a way to “stream-compress” syslog messages over plain TCP syslog protocol, with the intent to support it over standard syslog as well if the idea works out.

Traditionally, rsyslog does message-level compression. That is each single message is compressed and if there is sufficient compression gain, the message is transmitted in compressed form. This works perfectly with UDP and TCP syslog, but the compression ratio is limited. The problem is that a single message does not offer much repetition to be shrinked. This mode still works surprisingly well.

However, we are now doing one step further: for TCP, we have a session, and so we are able to not only compress single messages but rather the full stream of them. That offers considerably larger compression potential. In its extreme end, it can be compared to gzip’ing a log file. Those of you who already did this note that we usually have very high compression ratios 5-to-1 or even 10-to-1 are not uncommon.

To gain these ratios, we need to run the compressor in a mode where it outputs data only when it decides it is ready to do so. This means that upon transaction completion, we may still have some data unsent (possibly even all data!). At the expense of compression ratio, this can be “solved” but forcing the compressor to flush at transaction end. This will degrade compression.

I have now done a first PoC to check the validity of the idea. It is implemented in omfwd and imptcp (NOT imtcp) only. Flushing at transaction end is currently not supported. We are right now practice testing this, and I hope to have some results when I am back from my trip to Tallinn.