MongoDB, BSON and templates…

After improving the template system yesterday, I have begun to think again about the integration of custom templates (actually field lists) with ommongodb (rsyslog’s mongodb output plugin). A “problem” with the mongo interface is that it does not support native JSON but rather BSON, its binary equivalent. So the textual JSON representation needs to be converted to BSON before it can be stored in MongoDB. Given that the JSON representation must be built with the property replacer, this looks like a waste of encoding and decoding. Assuming that I would take JSON and transform it to BSON (all this in ommongodb), the workflow would be as follows:

text properties -> encode JSON -> decode JSON -> generate BSON
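As a purely illustrative sketch of that round trip (rsyslog itself is written in C, and the property names here are made up), the redundancy looks like this in Python:

```python
import json

# hypothetical property values as rendered by the template processor
props = {"sys": "host1", "msg": "test message"}

# step 1: the template processor would serialize the properties to JSON text ...
json_text = json.dumps(props)

# step 2: ... only for ommongodb to parse that text right back,
# before finally generating BSON from the decoded values
decoded = json.loads(json_text)
```

The round trip adds encoding and decoding cost but no information, which is exactly the waste described above.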

The “encode JSON” step would happen inside the template processor, the “decode JSON” part in ommongodb. In essence, this is a quite flexible, but rather slow approach. After all, it would serialize to JSON just for interim needs. What I am actually looking for is this workflow:

text properties -> generate BSON

Here we would replace the JSON format with some internal format. That internal format already exists, in a way: array passing mode. In this mode, the property text is passed in via an array. As a side-note, some transformations are necessary and desired even in an internal format, as the property replacer permits using not only the raw properties themselves but also substrings, case conversions, regexes and the like. The problem with array passing mode is that it provides just the plain values. However, for BSON (and MongoDB) we also need to know the field name – and type information. The latter is probably easy, as rsyslog usually deals with text only, so we could stick to strings except maybe for dates. The field name is available since yesterday inside the template structure. However, there currently is no way for a plugin to access this information.
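To see why field names and types matter, here is a minimal hand-rolled BSON encoder for string-only documents. This is an illustration of the BSON wire format only, not rsyslog or libmongo-client code:

```python
import struct

def bson_string_element(name: str, value: str) -> bytes:
    # element: type byte 0x02 (UTF-8 string), cstring field name,
    # int32 byte length of value (incl. NUL), value bytes, NUL
    v = value.encode("utf-8") + b"\x00"
    return (b"\x02" + name.encode("utf-8") + b"\x00"
            + struct.pack("<i", len(v)) + v)

def bson_document(pairs):
    # document: int32 total length (incl. itself), elements, trailing NUL
    body = b"".join(bson_string_element(n, v) for n, v in pairs)
    return struct.pack("<i", len(body) + 5) + body + b"\x00"

doc = bson_document([("sys", "host1"), ("msg", "test")])
```

Every element carries a type byte and the field name, which is precisely the information that plain array passing mode does not deliver.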

So it looks like the decent thing is to create a new interface that passes in a (description,value) pair to the plugin. The description most probably could be the template structure (or some abstraction if we feel bad about tying things too deeply together). That will prevent the detour via JSON, but still provide otherwise full capabilities. The bad thing, however, is that some complex interface gets yet another option (maybe it is time for a general cleanup?).
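A rough sketch of what such a (description, value) interface could look like; all names here are assumptions of mine, not the actual rsyslog API:

```python
from typing import List, NamedTuple, Tuple

class FieldDescr(NamedTuple):
    """Hypothetical abstraction of rsyslog's template-entry structure."""
    name: str   # field name, available in the template since the recent change
    dtype: str  # "string" for most properties, perhaps "date" for timestamps

def do_action(fields: List[Tuple[FieldDescr, str]]) -> dict:
    # a real ommongodb would generate BSON from the pairs directly;
    # a plain dict stands in for the resulting document here
    return {descr.name: value for descr, value in fields}

doc = do_action([(FieldDescr("sys", "string"), "host1"),
                 (FieldDescr("msg", "string"), "test")])
```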

Feedback on this issue is appreciated.

Current state of ommongodb default schema

Rsyslog’s ommongodb provides a default schema which is used for syslog data if no other is specified. It tries to align with the lumberjack project, so the schema may change in the coming weeks, as hopefully a standard field set is defined there. I originally started with a very small set of fields (based on the early cee/lumberjack schema), but it turned out to be too small to be really useful for real-world applications. So I have added a couple of fields today. The currently supported fields are:

  • sys – name of the system the message originated from (STRING)
  • time – timestamp from the syslog message (UTC_DATETIME)
  • time_rcvd – timestamp when the rsyslog instance received the message (UTC_DATETIME)
  • msg – the free-form message text (STRING)
  • syslog_fac – the syslog facility in numerical form, see RFC5424 to decode (INT32)
  • syslog_sever – the syslog severity in numerical form, see RFC5424 to decode (INT32)
  • syslog_tag – the traditional syslog tag (STRING)
  • procid – the name of the process that emitted the message (STRING)
  • pid – the process id of the process that emitted the message (STRING)
  • level – a severity level based on the lumberjack schema definition (STRING)
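For illustration, a document following this schema could look like the following; all values are made up:

```python
from datetime import datetime, timezone

# an illustrative document per ommongodb's default schema (invented values)
sample = {
    "sys": "host1.example.net",
    "time": datetime(2012, 4, 20, 12, 0, 0, tzinfo=timezone.utc),
    "time_rcvd": datetime(2012, 4, 20, 12, 0, 1, tzinfo=timezone.utc),
    "msg": "accepted connection",
    "syslog_fac": 4,    # security/authorization, per RFC 5424
    "syslog_sever": 6,  # informational, per RFC 5424
    "syslog_tag": "sshd[12345]:",
    "procid": "sshd",
    "pid": "12345",
    "level": "info",
}
```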

Please also see my previous blog post on cee/lumberjack schema mapping, which most importantly describes the current level mapping. 

Note that the default schema currently does NOT contain data obtained by parsing the cee-enhanced syslog JSON part of the message. Current thinking is that we probably best include this as a sub-element, maybe together with other structured data like RFC5424 structured data. This is currently being worked on. It’s less a lack of time to implement than the desire to avoid redoing things as the spec changes. Anyhow, I’ll probably have a “timeout” after which I will implement at least some idea (after all, not too much work will be lost if things change).

If you use this schema, please keep in mind that it is experimental. At this stage I will not try to remain backwards compatible when I make changes. So expect that newer versions may break your setup!

Adiscon LogAnalyzer now supports MongoDB

I just wanted to share the good news that Andre, LogAnalyzer‘s development lead, today finished implementing a logstream driver for MongoDB. So this nice tool can now also be used to access MongoDB based data. Andre’s lab data was created by rsyslog’s ommongodb output module (you currently need rsyslog git master branch to make this work). The logstream driver is not yet really optimized and we do not make full use of the NoSQL capabilities (like different schemas inside a single collection and all this). However, there is lots of exciting stuff on the todo list and I thought I’d mention a first successful step – and probably a quite important one if you “just want to use that thing” ;) So: good news for a Friday!

next steps for ommongodb

I just wanted to give you a heads-up on my work on ommongodb. During the past couple of days I have converted it to libmongo-client, which gives us a much more solid basis. I have also refactored it to some degree and adapted it to the new v6 config interface. Note that ommongodb will not be supported on pre-v6 platforms. This enables me to use the v6-exclusive features I am building now, especially the much improved JSON and CEE support. Right now, ommongodb uses a very limited field set, and this set is hardcoded (so you can change it, but that means you need to change code).

My next step is to make ommongodb support the base event (as currently being discussed in project lumberjack). I will also provide a capability to add “extra” information from the cee field set. That’s probably not a perfect solution, but the goal is to get ready for some command line tools that are able to extract data from MongoDB and thus make the system mimic a traditional flat-file syslog format. I have also asked Andre, the lead behind Adiscon LogAnalyzer, to consider adding support for MongoDB to LogAnalyzer. I have not yet heard back from him and don’t know exactly about his schedule, but I hope we will be able to make this happen very soon.
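To illustrate the flat-file idea, here is a sketch of the formatting step such a command line tool could perform. The field names follow ommongodb’s default schema, but the function itself is hypothetical (a real tool would fetch the documents from MongoDB):

```python
from datetime import datetime

def format_as_syslog(doc: dict) -> str:
    # traditional flat-file syslog layout: "Mmm dd hh:mm:ss host tag msg"
    ts = doc["time"].strftime("%b %d %H:%M:%S")
    return f"{ts} {doc['sys']} {doc['syslog_tag']} {doc['msg']}"

line = format_as_syslog({
    "time": datetime(2012, 4, 20, 12, 0, 0),
    "sys": "host1",
    "syslog_tag": "sshd[12345]:",
    "msg": "accepted connection",
})
```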

Only after that – somewhat hardcoded – work is done will I go back and look at JSON and templates in a more native way (very probably also looking at the contributed JSON string generator in more depth).