rsyslog templates & json

Today I added a simpler method for specifying JSON-encoded fields inside rsyslog templates. It still looks a bit ugly, but if you look closely, you’ll quickly notice that it no longer needs “quoting magic” and is thus far easier to work with.

Previously, you had to define a template like this:

$template tpl, "{\"message\":\"%msg:::json%\",\"fromhost\":\"%HOSTNAME:::json%\",\"facility\":\"%syslogfacility-text%\",\"priority\":\"%syslogpriority-text%\",\"timereported\":\"%timereported:::date-rfc3339%\",\"timegenerated\":\"%timegenerated:::date-rfc3339%\"}"

The template given above is the default for ElasticSearch. With the new code, this can be replaced by:

$template tpl,"{%msg:::jsonf:message%,%HOSTNAME:::jsonf:fromhost%,%syslogfacility-text:::jsonf:facility%,%syslogpriority-text:::jsonf:priority%,%timereported:::date-rfc3339,jsonf%,%timegenerated:::date-rfc3339,jsonf%}"

It’s a bit shorter, but most importantly the JSON field is now generated by the property itself. This is triggered by the “jsonf” (json field) option. If it is given, a field in the form

"fieldname":"value"

is automatically generated. The field name is either the fourth parameter (see “message” in the msg property in the example above) or, if not given, defaults to the property name. If I hadn’t insisted on specific field names, I could have written the sample like this:

$template tpl,"{%msg:::jsonf%,%HOSTNAME:::jsonf%,%syslogfacility-text:::jsonf%,%syslogpriority-text:::jsonf%,%timereported:::date-rfc3339,jsonf%,%timegenerated:::date-rfc3339,jsonf%}"

Note that the commas to separate the various JSON fields must still be included inside the template as literal text.
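To illustrate, here is roughly the kind of record the jsonf-based template above would emit (hostname, timestamps and message text are of course invented for this sketch):

```json
{"message":"su root failed on /dev/pts/8","fromhost":"mymachine","facility":"auth","priority":"crit","timereported":"2012-04-10T11:46:33.108+02:00","timegenerated":"2012-04-10T11:46:33.108+02:00"}
```

The point is that each “fieldname”:“value” pair comes from the property replacer itself; only the braces and the separating commas are literal template text.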

This new system works pretty well within the current template system. The config format will probably become a bit more intuitive when it is moved over to the new config language, but the exact details of that are still to be decided. In fact, I think this string-based way of specifying templates is not so bad. In any case, I am also considering a template option which would take a pure field list and generate proper JSON out of it. But that’s work for another day (or so ;)).

This is to be released with 6.3.9 and is already available via the git master branch.

rsyslog & elasticsearch: async replication and timeout

Today I have added the capability to specify asynchronous replication support and timeout settings in omelasticsearch. Code-wise it’s a small change – it just passes the required parameters to ElasticSearch via the proper REST URL parameters. By default, both parameters are not set, which means the default timeout and synchronous replication are used.

To set parameters, use

*.*     action(type="omelasticsearch"
           … other params …
           asyncrepl="on" timeout="1m")

If you leave “asyncrepl” out or set it to “off”, synchronous replication is used. For greatest flexibility, the value of the “timeout” parameter is not checked. While this enables you to use anything ElasticSearch supports, invalid values cannot be detected by omelasticsearch and thus will cause all inserts to fail (or misbehave). Note that some basic integrity checking is done, but we do not go to great lengths here. So use with care.
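Under the hood, these settings end up as URL parameters on the indexing request. A sketch of what such a request line might look like, assuming the ElasticSearch index API of that era (the index name “system” and type “events” are made up, and the exact wire format is an illustration, not a guarantee):

```
POST /system/events?replication=async&timeout=1m
```

If the parameters are left unset, the URL simply carries no such query string and ElasticSearch applies its own defaults.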

rsyslog & ElasticSearch: using dynamic index and type

Today, I have added support for dynamically deriving the search index and type from the message itself. It works much like dynafiles do. If dynamic mode is selected (via config parameters), the index name and type given in the configuration are template names instead of literal ones. That means the rsyslog core will generate the actual name at runtime, based on actual message content. These dynamically generated values will then be used inside the request.

Here is a quick howto: you first need to define the templates, like this:

$template srchidx,"%hostname%"
$template srchtype,"%programname%"

In this example, the hostname (from the message) is used as name for the search index, and the programname (usually part of the syslog tag) is used as searchtype. The full power of templates, including all properties, can be used, so this is highly flexible.

To actually use these dynamic values, omelasticsearch must be called like this (line breaks do not matter):

*.*     action(type="omelasticsearch"
           searchIndex="srchidx" dynSearchIndex="on"
           searchType="srchtype" dynSearchType="on")

The “dynSearch…” parameters tell the engine whether the corresponding name is dynamic. If set to “on”, as above, the value given is a template name and the actual name will be generated dynamically. If set to “off”, the literal value will be used (“off” is the default, so the parameters need not be specified if “off” is desired). For example:

*.*     action(type="omelasticsearch"
           searchIndex="srchidx"   searchType="srchtype")

will use the index “srchidx” and type “srchtype” literally. No template-based name expansion will happen in this case. Please note that it is ok for only one parameter to be dynamic. It’s not “all or nothing”.
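A sketch of such a mixed setup: here the index name is generated at runtime from the “srchidx” template defined earlier, while the type is the literal string “events” (a name made up for this example):

```
*.*     action(type="omelasticsearch"
           searchIndex="srchidx" dynSearchIndex="on"
           searchType="events")
```

Because dynSearchType is not given, it defaults to “off” and “events” is taken verbatim.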

I hope this is a useful addition to omelasticsearch. Several folks have indicated that this method can increase throughput in a couple of use cases, for example to split off indexes based on host names, host functions and the like.

new rsyslog v5-beta released

I just released rsyslog 5.9.6, now a beta and no longer a development version. Today I did a big merge of bug fixes and small enhancements from the v5 branch and combined all of that into 5.9.6. The plan is to mature this version as quickly as possible, which could really mean “rather quick”. For one, 5.9.5 has been out for three months now and does not seem to have broken much. Secondly, there are only very few new features, and those that have been added already got their practice drill in some large environments (and as such seem to be safe for wider consumption).


I also sincerely hope that this will conclude the v5-devel branch for the foreseeable future. Having two active development branches is always a big burden, and merges are both time-consuming and error-prone. So if at all possible, I’ll try to refrain from doing any larger-scale development work in v5. It remains to be seen whether I can hold to that when it comes to paid custom work. On the other hand, the v6 engine has also become both rather stable AND much enhanced feature-wise. So it probably is a good idea to begin asking people to move up to v6. But that’s another story, to be written in the not so distant future ;)

rsyslog group on Facebook

I just created an rsyslog group on Facebook. I hope it is useful. It’s a very interesting experiment for me. Are the tech folks present there as well? Will it work better than, e.g., the rsyslog forum itself? I guess time will (quickly) tell. At best, this could become a nice new channel to get news out and motivate people to try out new versions.

Using different templates with rsyslog’s ElasticSearch plugin

Recently, an experimental ElasticSearch plugin has been added to rsyslog, omelasticsearch. Like all other output plugins, it comes with a canned template, which specifies a default “schema”. However, the template engine provides capabilities to use a completely different set of fields. In this blog post, I’ll briefly describe how this is done.

Note: all work is based on the current (April 10th, 2012) implementation of both omelasticsearch and the rsyslog core. Future implementation changes are expected in an effort to make things more intuitive. This may even break what I describe here. So if you come to this blog post at a later time, it is probably best to check whether things have changed by “now” (especially if the procedure does not work ;)).

The current implementation ties into the template system. The default template looks as follows (all on one line, broken for readability):

$template JSONDefault, "{\"message\":\"%msg:::json%\",\"fromhost\":\"%HOSTNAME:::json%\",\"facility\":\"%syslogfacility-text%\",\"priority\":\"%syslogpriority-text%\",\"timereported\":\"%timereported:::date-rfc3339%\",\"timegenerated\":\"%timegenerated:::date-rfc3339%\"}"

The ‘\"’ sequence is needed to represent a quote character inside the template. To format JSON, this is pretty ugly, but that’s the way the template processor currently works (and that is one reason why it is under review). As you can see, the JSON is actually “hand-crafted”, with the “json” option specifying that the property text needs to be properly escaped to be well-formed JSON. If that option is specified, the template processor does the necessary escaping. Note that not all properties have the “json” option. This is purely for performance reasons. For example, timestamps never include characters that need to be escaped. Consequently, the “json” option is not used there (but could be, e.g. “date-rfc3339,json”).

So now let’s define a different set of fields to be used. Let’s say we just want to have the date from the syslog message and the MSG part of the message itself. That would be as follows:

$template miniSchema, "{\"message\":\"%msg:::json%\",\"timereported\":\"%timereported:::date-rfc3339,json%\"}"

Note: I requested JSON escaping for the timestamp in this example just to prove the point – don’t do that in a real deployment, as it is nonsense that costs CPU cycles ;)
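For a sample message, the miniSchema template would produce a record roughly like this (message text and timestamp invented for illustration):

```json
{"message":"this is a test message","timereported":"2012-04-10T11:46:33.108+02:00"}
```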

Now I need to use the template within the omelasticsearch action. This is done as follows:

*.*     action(type="omelasticsearch" template="miniSchema")

Note: the “all” (“*.*”) filter of course can be replaced with a different type of filter.

That’s all that needs to be done. Please note that you can add several templates and use them in several different elasticsearch output actions, just as with any other action type. Also keep in mind that the property replacer permits access to a wide range of message properties. Most importantly, normalized properties and cee-enhanced syslog properties can be accessed via the CEE family of property names (essentially “$!cee-name” style). Just be sure to include the “json” option in any property that may contain unsafe characters (which means almost all of the fields). This is not done automatically by the current engine, and invalid characters can lead to strange problems, even aborts of ElasticSearch itself!
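As a sketch of the cee-enhanced case, a template could reference “$!” properties like this (the field names “user” and “action” are purely hypothetical examples, not part of any defined schema; note the “json” option on both, since their content is not under our control):

```
$template ceeSchema, "{\"user\":\"%$!user:::json%\",\"action\":\"%$!action:::json%\"}"
```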

In the future, a more intuitive syntax is planned for JSON template definitions. Nevertheless, the current code permits full customization but requires taking care of the details.

Which command line tools would you like to see for logging?

As part of project lumberjack, I am considering which tools would be vital to have in your logging infrastructure. Assume that you have a database that contains security relevant events, like syslog data or events from the Windows Event Log. Which tools would you like to have to access (and modify?) the database? Note that I am asking for small things, not something like “I’d like to have a full-blown SIEM” (of course you do). I am more focussed on tools like auditctl, tools that can be made the building blocks of your security system.

Current state of ommongodb default schema

Rsyslog’s ommongodb provides a default schema which is used for syslog data if no other is specified. It tries to align with the lumberjack project, so the schema may change in the next weeks, as hopefully a standard field set is defined there. I originally started with a very small set of fields (based on early cee/lumberjack schema), but it turned out to be too small to be really useful for real-world applications. So I have added a couple of fields today. The currently supported fields are:

  • sys – name of the system the message originated from (STRING)
  • time – timestamp from the syslog message (UTC_DATETIME)
  • time_rcvd – timestamp when the rsyslog instance received the message (UTC_DATETIME)
  • msg – the free-form message text (STRING)
  • syslog_fac – the syslog facility in numerical form, see RFC5424 to decode (INT32)
  • syslog_sever – the syslog severity in numerical form, see RFC5424 to decode (INT32)
  • syslog_tag – the traditional syslog tag (STRING)
  • procid – the name of the process that emitted the message (STRING)
  • pid – the process ID of the process that emitted the message (STRING)
  • level – a severity level based on the lumberjack schema definition (STRING)

Please also see my previous blog post on cee/lumberjack schema mapping, which most importantly describes the current level mapping. 
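Putting the field list together, a document stored by ommongodb under this schema would look roughly like the following (all values invented; the UTC_DATETIME fields are shown in MongoDB shell notation, and the severity/level pair follows the mapping from that previous post, where severity 3/error maps to ERROR):

```
{
  "sys" : "mymachine.example.com",
  "time" : ISODate("2012-04-10T09:46:33Z"),
  "time_rcvd" : ISODate("2012-04-10T09:46:34Z"),
  "msg" : "su root failed on /dev/pts/8",
  "syslog_fac" : 4,
  "syslog_sever" : 3,
  "syslog_tag" : "su[1234]:",
  "procid" : "su",
  "pid" : "1234",
  "level" : "ERROR"
}
```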

Note that the default schema currently does NOT contain data obtained by parsing the cee-enhanced syslog JSON part of the message. Current thinking is that we probably best include this as a sub-element, maybe together with other structured data like RFC5424 structured data. This is currently being worked on. It’s not so much missing time to implement it, but rather the desire to avoid re-doing things as the spec changes. Anyhow, I’ll probably set a “timeout” after which I will implement at least some idea (after all, not too much work will be lost if things change).

If you use this schema, please keep in mind that it is experimental. At this stage I will not try to remain backwards compatible when I make changes. So expect that newer versions may break your setup!

Adiscon LogAnalyzer now supports MongoDB

I just wanted to share the good news that Andre, LogAnalyzer‘s development lead, today finished implementing a logstream driver for MongoDB. So this nice tool can now also be used to access MongoDB-based data. Andre’s lab data was created by rsyslog’s ommongodb output module (you currently need the rsyslog git master branch to make this work). The logstream driver is not yet really optimized and we do not make full use of the NoSQL capabilities (like different schemas inside a single collection and all that). However, there is lots of exciting stuff on the todo list, and I thought I’d mention a first successful step – and probably a quite important one if you “just want to use that thing” ;) So: good news for a Friday!

rsyslog / CEE base schema mapping

I am giving a first shot at a mapping of the CEE base schema (as currently described in project lumberjack, NOT on the CEE site!) to rsyslog properties. The core idea is to use this mapping as the default for ommongodb. Then, rsyslog shall be able to write this schema, while logtools (and others) can rely on it. For obvious reasons “rely” is not to be treated literally, as the whole thing currently is a moving target.

So I would deeply appreciate feedback for improving this mapping.

In the following mapping, the cee field name is first, the rsyslog property second.

Fields we can always map:

  • srchost -> hostname
  • time -> timestamp (rsyslog currently populates subseconds, which seem not to be supported in lumberjack)
  • msg -> msg (initially used rawmsg, but decided against this)
  • pid -> procid (may not actually be a Linux process ID)
  • proc -> app-name
  • level -> generated based on syslog severity (value mapping see below)

Translation of syslog severity to lumberjack level (not bijective; the syslog severity is listed first, and the number in parentheses denotes its numerical value):
  • emergency(0) -> FATAL
  • alert(1), critical(2), error(3) -> ERROR
  • warning(4) -> WARN
  • notice(5), informational(6) -> INFO
  • debug(7) -> DEBUG
  • (never mapped) -> TRACE

Fields we do not currently provide, but could in some cases:
Note that these fields may or may not be present inside a JSON/BSON document.
  • ppid -> parent process ID (SCM_CREDENTIALS, local only?)
  • uid ->  (SCM_CREDENTIALS, local only?)
  • gid ->  (SCM_CREDENTIALS, local only?)
  • tid -> thread ID (questionable, can probably not be provided with the current logging API)
As I said, feedback and suggestions are highly welcome. This list is work in progress and can change at any instant. I’ll provide notification when the interface has stabilized. Do not expect this soon.