Main Advantages of rsyslog v7 vs. v5

A lot of work has been done since the days of rsyslog v5. In this  post, I will provide the top 5 advantages of the v7 engine over the previous version. Please note that I do not talk about v6, as it is very close to the v5 engine, just with the improved config language. V6 also provides some experimental support for structured logging (project lumberjack), which has been fully matured in v7 (I strongly recommend to use v7 if you are serious about structured logging).

So, here are the main points:

  • greatly improved configuration language – the new language is much more intuitive than the legacy format. It will also prevent some typical mistakes simply be not permitting these invalid constructs. Note that legacy format is still fully supported (and you can of course do the same mistakes as before if you use legacy format).
  • greatly improved execution engine – with nested if/then/else constructs as well as the capability to modify variables during processing.
  • full support for structured logging and project lumberjack / CEE – this includes everything from being able to create, interpret and handle JSON-based structured log messages, including the ability to normalize legacy text log messages.
  • more plugins – like support for MongoDB, HDFS, and ElasticSearch as well as for the kernel’s new structured logging system.
  • higher performance – many optimizations all over the code, like 5 to 10 times faster execution time for script-based filters, enhanced multithreaded TCP input plugin, DNS cache and many more.

Of course, there are many more improvements. This list contains just the most important ones. For full details, check the file ChangeLog.

rsyslog plugin for structured Linux kernel logs

Thanks to Milan Bartos, there now is imkmsg, which supports structured kernel logging. This module will become available as part of rsyslog 7.1.5 later this week. The module was created with the help of Red Hat as part of the lumberjack effort for a better and structured logging infrastructure.

Note that the rsyslog v7 engine supports numerous enhancements for simple processing of lumberjack strucutred logs. So imkmsg is a natural fit for that version. As Milan told me, there is still room for improvement inside the module and some nits may exist and be fixed within the next weeks. However, we would appreciate feedback from early adoptors.

setting variables in rsyslog v7

Starting with rsyslog 7.1.3, variables can be manipulated inside rsyslog.conf. Note that the native message properties (like $msg) cannot be modified, but the CEE/lumberjack-type properties (those $!name variables) can. As variable modification is primarily an aid for template generation and modification, we do not consider this a restriction, as native message properties can simply be copied to the CEE tree if this is needed.

Note that CEE/lumberjack properties, as implemented in rsyslog,  can be hierarchical and levels are delimited by the bang sign (based on lumberjack recommendations). So “!uid” is the uid field in the CEE root, whereas “!usr!uid” is the uid field inside the usr container. Nesting can be as deep as desired. Currently, all parts of the CEE tree can be accessed. In later versions, this may require the setting of a global option. If that will happen depends on the feedback we receive.

So: how does it work?

To set a variable, simply use
set varname = expression;
Where expression can be an arbitrary complex expression, just like in an “if” statement. Note the semicolon at the end: this is unfortunately necessary and a diversion from the other config statements. However, this is the price we need to pay to remain upward compatible with the old style config format.

Concrete examples:

set $!usr!level2!var1 = “test”; 
set $!usr!level2!var1 = $msg; # update variable with native MSG field
set $!usr!level2!var2 = 1+1; # set !usr!level2!var2 = 2
set $!usr!level2 = $fromhost; # invalid

The last example is invalid and will not be executed because it tries to replace a complete container (!usr!level2) with a single value. This is rejected and not executed. Note that this problem is not detected during config read time but rather during execution time (in less trivial samples, it can not be reliable detected without execution).

Note that string concatenation is currently NOT supported, but it will be soon in the next releases. Also, full JSON containers cannot yet be copied. If this is tried to, the resulting variable will receive a string representation, which most probably is not what you wanted (and you will get a different result in future versions).

There is also an accompanying “unset” statement to remove a variable that is no longer needed. This is primarily meant to restructure a CEE container. It’s syntax simply is:

unset varname;

Again, note the semicolon at the end. A concrete example is

unset !usr!level2!var1;

which removes a single element. But full containers can also be removed:

unset !usr!level2;

Note that all variables are assigned to the message currently being processed. There currently is no way to set global variables (but this may become available based on feedback I see).

Which command line tools would you like to see for logging?

As part of project lumberjack, I am considering which tools would be vital to have in your logging infrastructure. Assume that you have a database that contains security relevant events, like syslog data or events from the Windows Event Log. Which tools would you like to have to access (and modify?) the database? Note that I am asking for small things, not something like “I’d like to have a full-blown SIEM” (of course you do). I am more focussed on tools like auditctl, tools that can be made the building blocks of your security system.

Current state of ommongodb default schema

Rsyslog’s ommongodb provides a default schema which is used for syslog data if no other is specified. It tries to align with the lumberjack project, so the schema may change in the next weeks, as hopefully a standard field set is defined there. I originally started with a very small set of fields (based on early cee/lumberjack schema), but it turned out to be too small to be really useful for real-world applications. So I have added a couple of fields today. The currently supported fields are:

  • sys – name of the system the message originated from (STRING)
  • time – timestamp from the syslog message (UTC_DATETIME)
  • time_rcvd – timestamp when the rsyslog instance received the message (UTC_DATETIME)
  • msg – the free-form message text (STRING)
  • syslog_fac – the syslog facility in numerical form, see RFC5424 to decode (INT32)
  • syslog_sever – the syslog severity in numerical form, see RFC5424 to decode (INT32)
  • syslog_tag – the traditional syslog tag (STRING)
  • procid – the name of the process that emitted the message (STRING)
  • pid  – the process id of the the process that emitted the message (STRING)
  • level – a severity level based on the lumberjack schema definition (STRING)

Please also see my previous blog post on cee/lumberjack schema mapping, which most importantly describes the current level mapping. 

Note that the default schema currently does NOT contain data obtained by parsing cee-enhanced syslog JSON part of the message. Current thinking is that we probably best include this as a sub-elements, maybe together with other structured data like RFC5424 structured data. This is currently being worked on. It’s less missing time to implement but the desire to avoid re-doing things as the spec changes. Anyhow, I’ll probably have a “timeout” after which I will implement at least some idea (after all, not too much work will be lost if things change).

If you use this schema, please keep in mind that it is experimental. At this stage I will not try to remain backwards compatible when I do changes. So excpect that newer versions may break your things!

rsyslog / CEE base schema mapping

I am giving a first shot at a mapping of the CEE base schema (as currently described in project lumberjack, NOT on the CEE site!) to rsyslog properties. The core idea is to use this mapping as the default for ommongodb. Then, rsyslog shall be able to write this schema, while logtools (and others) can rely on it. For obvious reasons “rely” is not to be treated literally, as the whole thing currently is a moving target.

So I would deeply appreciate feedback for improving this mapping.

In the following mapping, the cee field name is first, the rsyslog property second.

Fields we can always map:

  • srchost -> hostname
  • time -> timestamp (rsyslog currently populates subseconds, what seems not to be supported in lumberjack)
  • msg -> msg (initially used rawmsg, but decided against this)
  • pid -> procid (may not actually be a Linux process ID)
  • proc -> app-name
  • level -> generated based on syslog severity (value mapping see below)
Translation of syslog severity to lumberjack level (not bijective, syslog first, number in parenthesis denotes numerical value):
  • emergency(0) -> FATAL
  • alert(1), critical(2), error(3) -> ERROR
  • warning(4) -> WARN
  • notice(5), informational(6) -> INFO
  • debug(7) -> DEBUG
  • (never mapped) -> TRACE
Fields we do not currently provided, but could be in some cases:
Note that these fields may or may not be present inside a JSON/BSON document.
  • ppid -> parent process ID (SCM_CREDENTIALS, local only?)
  • uid ->  (SCM_CREDENTIALS, local only?)
  • gid ->  (SCM_CREDENTIALS, local only?)
  • tid -> thread ID (questionable, can probably not provided with current logging API)
As I said, feedback and suggestions are highly welcome. This list ist work in progress and can change in any instant. I’ll provide notification when the interface has stabilized. Do not expect this soon.

next steps for ommongodb

I just wanted to give you a heads-up on my work on ommongodb. During the past couple of days I have converted it to libmongo-client, which gives us a much more solid basis. I have also refactored it to some degree and adopted it to the new v6 config interface. Also, ommongodb will not be supported on pre-v6 platforms. This enables me to use the v6-exclusive features I am building now, especially great JSON and CEE support. Right now, ommongodb uses a very limited field set, and this set is hardcoded (so you can change it, but that means you need to change code).

My next step is to make ommongodb support the base event (as currently being discussed in project lumberjack). I will also provide a capability to add “extra” information from the cee field set. That’s probably not a perfect solution, but the goal is to get ready for some command line tools that are able to extract data from mongodb and thus make the system mimic it is a traditional flat-file syslog format. I have also asked Andre, the lead behind Adiscon LogAnalyzer to consider adding support for MongoDB to loganayzer. I have not yet heard back from him and don’t know exactly about his schedule, but I hope we will be able to make this happen very soon.

Only after that – somewhat hardcoded – work is done I’ll go back and look at JSON and templates in a more native way (very probably also looking at the contributed JSON string generator in more depth).

JSON and rsyslog templates

Rsyslog already supports JSON parsing and formatting (for all cee properties). However, the way formatting currently is done is unsatisfactory to me. Right now, we just take the cee properties as they are and format them into JSON format. In this mode, we do not have any way to specify which fields to use and we also do not have a way to modify the field contents (e.g. pick substrings or do case conversions). Exactly these are the use cases rsyslog invented templates for.

One way to handle the situation is to have the user write the JSON code inside the template and just inject the data field where desired. This almost works (and I know Brian Knox tries to explore that route). IT just works “almost” as there is currently no property replacer option to ensure proper JSON escaping. Adding this option is not hard. However, I don’t feel this approach is the right route to take: making the admin craft the JSON string is error-prone and very user-unfriendly.

So I wonder what would be a good way to specify fields that shall go into a JSON format. As a limiting factor, the method should be possible within the limits of the current template system – otherwise it will probably take too long to implement it. The same question also arises for outputs like MongoDB: how best to specify the fields (and structure!) to be passed to the output module?

Of course, both questions are closely related. One approach would be to solve the JSON encoding and say that to outputs like MongoDB JSON is passed. Unfortunately, this has strong performance implications. In a nutshell, it would mean formatting the data to JSON, and then re-parsing it inside the plugin. This process could be be somewhat simplified by passing the data structure (the underlaying tree) itself rather than the JSON encoding. However, this would still mean, that a data structure specific for this use would need to be created. That obviously involves a lot of data-copying. So it would probably be useful to have a capability to specify fields (and replacement options) that are just passed down to the module for its use (that would probably limit the required amount of data copying, at least in common cases). Question again: what would be a decent syntax to specify this?

Suggestions are highly welcome. I need to find at least an interim solution urgently, as this is an important building block for the MongoDB driver and all work that will depend on it. So please provide feedback (note that I may try out a couple of things to finally settle on one – so any idea is highly welcome ;)).

finally… rsyslog agent for windows released

It’s done! We have finally released the rsyslog Agent for Windows, a nice piece of software that enables easy integration of Windows Event Logs into a rsyslog backend system. Ideas for this tool floated around for roughly four to five month, and we had lots of internal discussions. It is important to note that we at Adiscon already have the necessary technology as part of our Windows products (actually, we invented this whole event-log-to-syslog type of software…), so it was just a matter of fine-tuning the code and selecting some useful default settings and policies.

The release is important because it makes clear that there actually is a Windows component (while I tried to convey that several times, people most often did not realize it due to name differences – something with “rsyslog” inside the name was expected). It is also important for me at Adiscon internally: the rsyslog Agent is a commercial product and license sales will make clear that this business is driven by rsyslog. And, obviously, I hope that this will help fund the project without need to resort to other things like premium plugins. This is also why the Agent is so important for the rsyslog project as whole: it will hopefully help to stabilize the funding situation even more.

EDIT: I should probably mention that the Windows Agent was more or less ready when I held its release in order to integrate support for cee-enhanced syslog into it. I am glad I could convince my folks at Adiscon, so that we now have this exciting feature actually available.

CEE-enhanced syslog defined

CEE-enhanced syslog is an upcoming standard for expressing structured data inside syslog messages. It is a cross-platform effort that aims at making log analysis (and log processing in general) much more easy both for log producers and consumers. The idea was originally born as part of MITRE’s CEE effort. It has been adopted by a larger set of logging stakeholders in an initiative that was named “project lumberjack“. Under this project, cee-enhanced syslog, and a framework to make full use of it, is being openly advanced. It is hoped (and planned) that the outcome will flow back to the CEE standard.

In a nutshell cee-enhanced syslog is very simple and powerful: inside the syslog message, a special cookie (“@cee:”) is followed by a JSON representation of the data. The cookie tells processors that the format is actually cee-enhanced. If you are interested in a more technical coverage, have a look at my cee-enhanced syslog howto presentation.

Adiscon is one of the main supporters of project lumberjack and CEE enhanced syslog. Since February 2012, Adiscon products offer basic support for cee-enhanced syslog, being among the first tools to do so.