TLS for librelp

If you followed librelp’s git, you have probably already noticed that there is increased activity. This is due to the fact that TLS support is finally being added! Thanks to some unnamed sponsor, we could invest “a bit” of time to make this happen.

We have decided to base TLS support on GnuTLS. It has matured considerably, is preferred by Debian, is fully supported by Red Hat, and does not have the GPL licensing issues that OpenSSL has (plus, the sponsor also preferred it). We build TLS support directly into librelp: we assume it will become very popular, so an abstraction layer would not make much sense, especially given that GnuTLS is nowadays installed by default almost everywhere. And remember that an abstraction layer always adds code complexity and an (albeit limited) runtime overhead.

Librelp 1.1.0 will be the first version with basic TLS support. By “basic” we mean that this is a full TLS implementation, but some useful additional features are not yet present. Most importantly, this version will not support certificates but will instead work with anonymous Diffie-Hellman key exchange. This means that while the integrity and privacy of the session can be guaranteed as far as the network is concerned, this version does not guard against man-in-the-middle attacks. The reason simply is that there is no way to mutually authenticate peers without certificates. We still think it makes a lot of sense to release that version, as it greatly improves the situation.

Obviously, we plan to add certificate support in the very near future. This also means we will add ways to do mutual authentication, much like in rsyslog’s RFC 5425 implementation. It is not yet decided whether we will support all authentication options RFC 5425 offers (some may not be very relevant in practice). We currently strongly consider starting with fingerprint-based authentication, as this permits mutual authentication without the need to set up a full-blown PKI. Also, most folks know fingerprint authentication: this is what ssh does when it connects to a remote machine.
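To make the fingerprint idea a bit more concrete, here is a minimal Python sketch. It is not librelp code: the function names, the choice of SHA-1, the colon-separated format and the placeholder certificate bytes are all assumptions made for illustration. The point is only that each side keeps a list of fingerprints it trusts and compares the peer's certificate against it, so no certificate authority is involved.

```python
import hashlib
import hmac


def fingerprint(cert_der: bytes) -> str:
    """Colon-separated, upper-case SHA-1 hex digest of the raw certificate bytes."""
    digest = hashlib.sha1(cert_der).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def peer_is_permitted(cert_der: bytes, permitted: set) -> bool:
    """Accept the peer only if its fingerprint is one we were configured with."""
    actual = fingerprint(cert_der)
    return any(hmac.compare_digest(actual, known) for known in permitted)


# Hypothetical stand-in for the certificate bytes a peer presents in the handshake.
peer_cert = b"example certificate bytes"
permitted_peers = {fingerprint(peer_cert)}

print(peer_is_permitted(peer_cert, permitted_peers))        # a known peer
print(peer_is_permitted(b"someone else", permitted_peers))  # an unknown peer
```

For mutual authentication, both client and server would run the same check against their own list of permitted fingerprints.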

So stay tuned to librelp development, many more exciting things are coming up. Please note that rsyslog 7.5.0 will be the first version to utilize the new librelp features – but that’s something for a different blog posting.

[This is also cross-posted to the librelp site]

rsyslog vs. systemd journal?

I gave an invited talk on this topic at LinuxTag 2013 in Berlin. I was originally asked to talk about “rsyslog vs. journal”, but requested that a question mark is added: “rsyslog vs. journal?”. This title much better reflects our current thinking in regard to the journal project.

Rather than elaborating on our position, I thought it easier to just share the slide deck and the full paper I have written on it. In a nutshell, both answer the questions of what we currently think of the journal, where we see each technology deployed, and which cool things rsyslog can do to enhance enterprise logging. There is also a very interesting history lesson included. But enough of that, on to the real things:

The paper should definitely have all the details you ever want to know (well… ;)) and is a good read if you want to dig deeper:

Rsyslog vs Systemd Journal (Paper) from Rainer Gerhards

Note: the PDF can be downloaded directly from slideshare (use the “Save” button right on top of the paper).

LinuxTag 2013

I gave a talk on “rsyslog vs journal?” at LinuxTag 2013 in Berlin (slides and paper now available at “rsyslog vs. journal?” blog post). It was a great event, and I had quite some good discussions with rsyslog users. As it looks, the v7 config is very well received and many folks are moving toward that version.

Of course, I also learned (not surprisingly) that there is desire for better doc. In some discussions, the idea of small video tutorials came up, and I have to admit that I like this idea. It looks like it is quicker for me to do than writing full-blown tutorials and yet is probably very useful, especially for folks who look for a very specific target. So I hope to find time to do some experimenting. I’ll probably start with some extracts from my talk, first doing the theoretic thing and then showing how things actually work, in 5 minute shots. So stay tuned.

In the meantime, here is a quick glimpse at the LinuxTag social event, which I also enjoyed very much (it’s actually rather short, because I wasn’t so much into just filming ;)).

Moving to github?

I am re-evaluating my development environment. One idea that pops up is whether I should move the rsyslog project over to github. Initially, I was rather sceptical about using a third party for the git repository (after all, a git server is not rocket science…), but github seems to have gained momentum in the past years. So far, though, it is more or less my gut feeling that migrating over to it may make sense.

So I am looking for feedback from my users and fellow developers: what are the pros and cons on moving to github in your opinion? Please be subjective, that’s what I am looking for. So there is no need to be shy.

Please comment and let me know your thoughts!
Rainer

Log Anonymization with rsyslog

Starting with version 7.3.7, rsyslog natively supports anonymizing log records. This is done with the help of a new module called mmanon. In short, the module inspects the message and replaces IP addresses with an anonymized alternative representation.

This functionality is implemented via the (message modification module) action interface. Thus, the user has full control over when the anonymization happens. While it generally is desirable to anonymize as soon as possible, there can be situations where some data must be collected or processed un-anonymized. In that case, the anonymization can be deferred until after that processing.

The mmanon module has two operation modes. The first (“simple”) is faster but somewhat less secure and flexible: specific octets are overwritten by user-configurable characters. For example, in this mode the IP address “10.1.65.123” could be replaced by “10.1.xx.xxx”. Note that the size of the overwritten octets is preserved.

In the other mode (“rewrite”), which is the default, parts of the IP address are always zeroed out and the result is written in normalized form. The previous example would become “10.1.0.0” in this mode. This also means that the message size may shrink. In rewrite mode, the exact number of bits that shall be anonymized can be specified. The default is 16, but any other value may be selected. If, in the above example, 12 bits had been selected, the output would have been “10.1.64.0”. This provides great flexibility to meet corporate and legal requirements. Note that this form still permits the use of some coarse-grained analysis tools, for example Geo-IP lookups (of course, depending on the number of removed bits).
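The two modes can be illustrated with a short Python sketch. This is not how mmanon is implemented; the regular expression, the function names and the `octets` parameter of the simple variant are assumptions made for illustration. The rewrite variant clears the requested number of low-order bits and prints the address in normalized form.

```python
import ipaddress
import re

# Rough IPv4 matcher for the sketch; candidates are validated in rewrite mode.
IPV4_RE = re.compile(r"\b(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\b")


def anonymize_simple(msg: str, octets: int = 2, char: str = "x") -> str:
    """Overwrite the digits of the last `octets` octets, keeping their length."""
    def repl(m):
        parts = list(m.groups())
        for i in range(4 - octets, 4):
            parts[i] = char * len(parts[i])
        return ".".join(parts)
    return IPV4_RE.sub(repl, msg)


def anonymize_rewrite(msg: str, bits: int = 16) -> str:
    """Zero the lowest `bits` bits and write the address in normalized form."""
    def repl(m):
        try:
            addr = int(ipaddress.IPv4Address(m.group(0)))
        except ValueError:
            return m.group(0)  # not a valid address (e.g. 999.1.1.1): leave as-is
        mask = (0xFFFFFFFF << bits) & 0xFFFFFFFF
        return str(ipaddress.IPv4Address(addr & mask))
    return IPV4_RE.sub(repl, msg)


line = "login from 10.1.65.123 port 22"
print(anonymize_simple(line))            # login from 10.1.xx.xxx port 22
print(anonymize_rewrite(line))           # login from 10.1.0.0 port 22
print(anonymize_rewrite(line, bits=12))  # login from 10.1.64.0 port 22
```

With 12 bits, only the lower four bits of the third octet are cleared (65 becomes 64), which is why part of the third octet survives.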

Currently, mmanon supports IPv4 only. However, support for IPv6 is planned; we are just waiting for some feedback before going further. The new module is available immediately and can be found both in the source tarball and in the Adiscon-provided rsyslog RPMs and rsyslog Ubuntu Packages.

rsyslog TCP stream compression

I have begun to work on a way to “stream-compress” syslog messages over plain TCP syslog protocol, with the intent to support it over standard syslog as well if the idea works out.

Traditionally, rsyslog does message-level compression. That is, each single message is compressed, and if there is sufficient compression gain, the message is transmitted in compressed form. This works perfectly with UDP and TCP syslog, but the compression ratio is limited. The problem is that a single message does not offer much repetition to be shrunk. Even so, this mode works surprisingly well.
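As a rough illustration of the message-level approach, here is a minimal Python sketch using zlib. The one-byte marker in front of compressed data and the function name are assumptions for this sketch, not a description of the wire format.

```python
import zlib


def encode_message(msg: bytes, level: int = 9) -> bytes:
    """Send the compressed form only if it is really smaller than the original."""
    compressed = b"z" + zlib.compress(msg, level)  # "z" marker: assumed for the sketch
    return compressed if len(compressed) < len(msg) else msg


short_msg = b"<13>host app: ok"
long_msg = b"<13>host app: " + b"GET /index.html 200 " * 20

print(len(short_msg), len(encode_message(short_msg)))  # no gain, sent as-is
print(len(long_msg), len(encode_message(long_msg)))    # repetition pays off
```

The short message goes out unchanged because the compressor's fixed overhead outweighs any saving, which is exactly the limitation of looking at one message at a time.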

However, we are now going one step further: for TCP, we have a session, and so we are able to compress not only single messages but rather the full stream of them. That offers considerably larger compression potential. In its extreme end, it can be compared to gzip’ing a log file. Those of you who have already done this know that we usually get very high compression ratios: 5-to-1 or even 10-to-1 are not uncommon.

To gain these ratios, we need to run the compressor in a mode where it outputs data only when it decides it is ready to do so. This means that upon transaction completion, we may still have some data unsent (possibly even all data!). This can be “solved” by forcing the compressor to flush at transaction end, but only at the expense of compression ratio.
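The trade-off can be tried out with a small Python sketch based on zlib. The sample messages, the batch size of 10 per "transaction" and the function names are made up for the illustration; the actual numbers will differ with real traffic.

```python
import zlib

# Hypothetical sample traffic: 200 short, similar-looking syslog lines.
messages = [
    f"<13>May  1 12:00:{i % 60:02d} host app[{1000 + i}]: request {i} served\n".encode()
    for i in range(200)
]


def per_message_size(msgs):
    """Bytes sent when each message is compressed on its own
    (falling back to the plain message when compression does not help)."""
    return sum(min(len(m), len(zlib.compress(m))) for m in msgs)


def stream_size(msgs, batch=10, flush_at_txn_end=False):
    """Bytes sent when one compressor is kept for the whole session.
    Every `batch` messages form one transaction."""
    comp = zlib.compressobj()
    sent = 0
    for start in range(0, len(msgs), batch):
        for m in msgs[start:start + batch]:
            sent += len(comp.compress(m))  # the compressor may hold data back
        if flush_at_txn_end:
            sent += len(comp.flush(zlib.Z_SYNC_FLUSH))
    return sent + len(comp.flush())        # final flush at session close


print("raw bytes:              ", sum(len(m) for m in messages))
print("per-message compression:", per_message_size(messages))
print("stream, flush per txn:  ", stream_size(messages, flush_at_txn_end=True))
print("stream, no flush:       ", stream_size(messages))

# After a sync flush the receiver can decode everything sent so far.
comp, decomp = zlib.compressobj(), zlib.decompressobj()
first_txn = b"".join(messages[:10])
wire = comp.compress(first_txn) + comp.flush(zlib.Z_SYNC_FLUSH)
print(decomp.decompress(wire) == first_txn)
```

Flushing keeps the compressor's history across transactions, so it still beats per-message compression by a wide margin; it just cannot reach the ratio of an unflushed stream.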

I have now done a first PoC to check the validity of the idea. It is implemented in omfwd and imptcp (NOT imtcp) only. Flushing at transaction end is currently not supported. We are currently testing this in practice, and I hope to have some results when I am back from my trip to Tallinn.

rsyslog output plugin wrangling…

For some hours, I have been fighting with parts of the rsyslog design from around 2006 (or so): initially, we thought that all actions would be terminated by canceling their thread when they do not shut down within the queue shutdown timeout. Then, we saw that it was better to at least try it cooperatively (cancellation is still required if that does not work). Now, with imrelp, I have a situation where I need to pass some information down to librelp when it comes to termination. Supposedly a very simple thing to do (a single call). Unfortunately, the interface does not provide access to the action in question.

Hopefully, I finally found a work-around, via a “terminate immediately” pointer so far in use internally for the action engine. We’ll see…

Should I use rsyslog’s new or old config style?

I got a very interesting question on the rsyslog support forums, and I thought I would share it, together with the answer, here at a more prominent spot:

After over a decade of using stock bsd syslog, I finally have a need to do some more complicated processing of logs (splitting off Postgres query logs from general Postgres logs), and after looking at other options (basically syslog-ng), I think rsyslog looks like a better fit. I’m mainly in it so I can use regex matching, but thinks like the log queueing and being able to easily move to db storage in the future look good.
Since I’m new, I’d considered that I might get a jump on things by sticking with the newest config syntax. But after doing some googling for examples and looking at the examples in the rsyslog wiki, it seems like maybe the newest syntax might be a bit too new for a beginner – I learn best by example.
Are there any serious downsides to NOT going with the most current syntax?

The answer is that the old syntax is still fully supported by current versions and will probably remain so for quite a while (with very few exceptions, which we couldn’t carry over for good reasons; these are documented in the compatibility docs on the web site). Some parts of it are considered so important that they most probably never will go away. Actually, if you want to do simple things, the old syntax has its advantages. The more complex your processing gets, the more you benefit from the new syntax. But you can mix and match new and old style in almost all cases.

So my suggestion would be to get started using the old syntax, and as soon as you begin to do more complex things, you can switch over to the new style. That’s actually the way it is designed ;) A good indicator of when it would be beneficial to move to the new style is when you begin to use a lot of directives beginning with $, especially if they modify an action. Also, if you move to action queues, I would strongly suggest using the new style. It is far more intuitive and less error-prone.

To provide a bit more background information, there is an important non-technical reason why the classical syntax will remain for a long time: the basic syslog.conf format is extremely well known, covered in a lot of text books, taught in numerous courses and used in a myriad of Internet tutorials. So if we abandoned it, we would trash a lot of people’s knowledge and help resources. In short: we would make it much harder for folks than it actually needs to be. This has never been the rsyslog philosophy. Providing the ability to change gradually and with growing needs is a core goal.

multi-character field delimiters

On the rsyslog mailing list, the ability to use multiple characters as field delimiters was requested recently. Today, I took some time off my schedule and implemented that functionality. It is probably very useful in a number of cases. An important one is in combination with control character escaping, where rsyslog by default expands a single character into a four-byte escape “#ooo”, with ooo being the octal character code (so e.g. US ASCII HT [horizontal tab] becomes “#011”).
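To see why a four-byte delimiter comes up so naturally, here is a tiny Python sketch of that escaping step. The function name is hypothetical, and treating exactly the characters below US ASCII 32 as control characters is an assumption of the sketch.

```python
def escape_control_chars(msg: str, prefix: str = "#") -> str:
    """Replace each control character with prefix + three-digit octal code."""
    return "".join(
        f"{prefix}{ord(ch):03o}" if ord(ch) < 32 else ch
        for ch in msg
    )


# A tab-separated record turns into one delimited by the string "#011".
print(escape_control_chars("val 1\tval 2\tval 3"))  # val 1#011val 2#011val 3
```

A message that was tab-delimited when it was sent therefore arrives with “#011” between its fields, and a single-byte delimiter can no longer split it.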

The new functionality is available for the RainerScript field() function. I do not intend to add it to template strings.

Some quick usage sample:

The following is the traditional way of using single-byte delimiters, here with the comma character (US ASCII decimal code 44):

set $!usr!field2 = field($msg, 44, 2);
template (name="fld" type="string" string="'%$!usr!field2%' -- msg: %msg%\n")
action(type="omfile" file="/path/to/logfile" template="fld")

And this is the same with the string “#011” as delimiter:

set $!usr!field2 = field($msg, "#011", 2);
template (name="fld" type="string" string="'%$!usr!field2%' -- msg: %msg%\n")
action(type="omfile" file="/path/to/logfile" template="fld")

Note that the field number (index) need not be fixed. It can be derived from an appropriately formatted message. Here the first field contains the number of the field to extract; the delimiter is “#011” again:

set $!usr!idx = field($msg, "#011", 1);
set $!usr!field = field($msg, "#011", $!usr!idx);
template (name="fld" type="string" string="'%$!usr!field%' -- msg: %msg%\n")
action(type="omfile" file="/path/to/logfile" template="fld")

In that last sample the $msg of

“3#011val 1#011val 2#011val 32#val 4”

would return

“val 2”

Keep in mind that the first field is the field index, so the actual data fields start at 2 (field 1 is “3”, field 2 is “val 1”, field 3 “val 2” and so on).
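For readers who want to play with the semantics outside of rsyslog, the lookup can be emulated in a few lines of Python. This is only a sketch of the behaviour described above, not the RainerScript implementation; in particular, raising an error for a missing field is an assumption of the sketch.

```python
def field(msg: str, delimiter, number: int) -> str:
    """Return the 1-based field `number` of `msg`.

    `delimiter` is either a decimal character code (single byte, the
    traditional form) or a string (the new multi-character form).
    """
    if isinstance(delimiter, int):
        delimiter = chr(delimiter)
    parts = msg.split(delimiter)
    if not 1 <= number <= len(parts):
        raise ValueError(f"field {number} not found")  # sketch-only behaviour
    return parts[number - 1]


msg = "3#011val 1#011val 2#011val 32#val 4"

idx = int(field(msg, "#011", 1))  # first field carries the index: 3
print(field(msg, "#011", idx))    # val 2
print(field("a,b,c", 44, 2))      # b
```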

This functionality is already present in git master head and will be released as part of 7.3.7 in the not so distant future. Some more details can be found inside the RainerScript documentation page.