CEE-enhanced syslog defined

CEE-enhanced syslog is an upcoming standard for expressing structured data inside syslog messages. It is a cross-platform effort that aims at making log analysis (and log processing in general) much more easy both for log producers and consumers. The idea was originally born as part of MITRE’s CEE effort. It has been adopted by a larger set of logging stakeholders in an initiative that was named “project lumberjack“. Under this project, cee-enhanced syslog, and a framework to make full use of it, is being openly advanced. It is hoped (and planned) that the outcome will flow back to the CEE standard.

In a nutshell cee-enhanced syslog is very simple and powerful: inside the syslog message, a special cookie (“@cee:”) is followed by a JSON representation of the data. The cookie tells processors that the format is actually cee-enhanced. If you are interested in a more technical coverage, have a look at my cee-enhanced syslog howto presentation.

Adiscon is one of the main supporters of project lumberjack and CEE enhanced syslog. Since February 2012, Adiscon products offer basic support for cee-enhanced syslog, being among the first tools to do so.

cee-enhanced event log to syslog forwarding

As many know, we at Adiscon also work hard at Windows Event Log to syslog forwarding software. During the past days we have taken the time to implement cee-enhanced syslog format inside these products as well. It is currently a proof of concept stage, but mostly because the relevant specs are also at PoC. This effort nicely integrated with the new project lumberjack, which aims at providing structured logging. New releases of the relevant Windows products (EventReporter and MonitorWare Agent) will be released very soon. With these releases, we are again the first-ever folks to release something never seen before, this time CEE support for windows logging ;)

But how does it work? Basically, it is a message format option of the “format syslog” option. If you select cee-enhanced syslog, messages will be emitted in that format. Most importantly, they will included nice name/value pairs of the Windows events (if Windows provided names, else the previous “Paramn” replacement names will be used). For example, a security event is described as follows:

@cee: {“source”: “machine.local”, “nteventlogtype”: “Security”, “sourceproc”: “Microsoft-Windows-Security-Auditing”, “id”: “4648”, “categoryid”: “12544”, “category”: “12544”, “keywordid”: “0x8020000000000000”, “user”: “N\A”, “SubjectUserSid”: “S-1-5-11-222222222-333333333-4444444444-5555”, “SubjectUserName”: “User”, “SubjectDomainName”: “DOMAIN”, “SubjectLogonId”: “0x5efdd”, “LogonGuid”: “{00000000-0000-0000-0000-000000000000}”, “TargetUserName”: “Administrator”, “TargetDomainName”: ” DOMAIN “, “TargetLogonGuid”: “{00000000-0000-0000-0000-000000000000}”, “TargetServerName”: “servername”, “TargetInfo”: ” servername “, “ProcessId”: “0x76c”, “ProcessName”: “C:\Windows\System32\spoolsv.exe”, “IpAddress”: “-“, “IpPort”: “-“, “catname”: “Logon”, “keyword”: “Audit Success”, “level”: “Information”}

Note that we currently focus on cee-enhanced syslog format. We did not yet try to map the Windows field names to the CEE dictionary/profile terms. Probably the most important reason for this focus is that we do not yet have any definite spec to write to. Obviously, once the spec is out, it is fairly easy to upgrade the implementation to support these other names.

A co-worker is right now doing some more testing with rsyslog, which is able to understand that new format. I’ll update you with the findings, and procedures, once they are ready.

syslog normalization

I am working on syslog normalization for quite some years now. A couple of days ago, David Lang talked to me about syslog-ng’s patterndb, an approach to classify log messages and extract properties from it.

I have looked at this approach, and it indeed is promising. One ingredient, though, is missing, that is a directory of standard properties (like bytes sent and received in traffic logs). I know this missing ingredient very well, because we also forgot it until recently.

The aim to normalize log data is far from being new. Actually, I think it is one of the main concerns in log analysis. Probably one of the first folks who thought seriously about it was Marcus Ranum, who coined the concept of “artificial ignorance”, meaning that we can remove those messages from a big pile of logs that we know to be uninteresting. But in order to do that correctly, you need to know how exactly they look. And this is where log normalization comes in. I have written an in-depth paper in 2004, title “On the nature of syslog data“. The version officially published claims “work in progress”, but it still has all the juicy details.

Internally, we implemented this approach in our MonitorWare products a little bit later. For example, it is used inside the “Post Process Action” in WinSyslog (Michael also wrote a nice article on how to parse log messages with this action). While this was a great addition (and is used with great success), I failed to get enough community momentum to build a larger database of log messages that could be used as a basis for large scale log normalization. One such – largely failed for syslog – approach is the event knowledge base.

However, I did not give up on the general idea and proposed it wherever appropriate. The last outcome of this approach is the soon-to-be-released Adiscon LogAnalyzer v3, which uses so-called message parsers to obtain useful information from log entries. Here, I hope we will be able to gain more community involvement. We already got two message parsers contributed. Granted, that’s not much, but the ability to have them is so far little known. With the release of v3, I hope we get more and more momentum.

The syslog-ng patterndb approach brings an interesting idea to this space: as far as I have heard (I generally do NOT look at competing code to prevent polluting my code with things that I should not use), they use radix trees to parse the log messages. That is a clever approach, as it provides a solution for much quicker parsing large amounts of parse templates. This makes the approach suitable for real-time normalization of an incoming stream of syslog data.

Adiscon LogAnalyzer, by contrast, uses a regex-based approach, but that primarily for simplicity in an effort to invite more contributions (WinSyslog has a far more sophisticated approach). In Adiscon LogAnalyzer we began to become serious with identifying what a property actually means. While we have a fixed set of properties, with fixed semantics, in both WinSyslog, MonitorWare Agent and rsyslog, this set is rather limited. The Windows product line supports ease of extension of the properties, but does not provide standard IDs for those properties.

In Adiscon LogAnalyzer, we have fixed IDs for a larger set of properties, now about 50 or so. Still, that set is very small. But we created it with the intention to be able to map various “semantic objects” from different log entries to a single identity. For example, most firewall logs will contain a source and destination IP address, but almost all firewalls will use different log message formats to do that. So we need to have different analyzers to support these native formats, for example in reports. In Adiscon LogAnalyzer, we can now have a message parser “normalize” these syslog entries and map the vendor-specific format to the generic “semantic object”. Thus the upper layers (like views and reports) then work on these normalized semantic objects and do not need to be adopted to each firewall. This needs only be done at the parser level.

Such a directory of semantics objects would be very useful in my humble opinion. We are currently working on making it publicly available, all this in the hope for a community to involve itself ;) If we manage to get a large enough number of log and/or parser contributions, we may potentially be able to make Adiscon LogAnalyzer an even better free tool for system administrators.

And as there is hope that this will finally succeed, I have begun to think about a potential implementation inside rsyslog. It doesn’t sound very hard, but still requires careful thinking. One thing I would like to see is a unified approach that covers at least rsyslog and Adiscon Loganalyzer, and hopefully the Windows tools as well.

Another very good thing is that there already is a standard for providing standard semantical objects: during the IETF syslog standardization effort, I pressed hard for so-called structured data elements. I managed to get them into the final RFC. These structured data elements are now the key for conveying the log information once it is normalized: the corresponding name-value pairs can easily be encoded with it.

I hope we will finally able to succeed on this road, because I think this would be of tremendous benefit for the syslog community.

Introducing rainerscript ;) – and some rsyslog futures

What the heck is this? “rainerscript”? An acute case of megalomania?

Wait a moment… What’s this all about? It all starts with expression support in rsyslog and the notorious question on what the future configuration file format for rsyslog should be. From the technical perspective, I think I am finally approaching a conclusion, which will probably look quite a bit like a script language. A script language that will be specifically tailored (and well suited) to processing (system) events.

Heck, another new term… Well, if you know Adiscon’s MonitorWare family, its not new to you. In MonitorWare (where I consider rsyslog still to be a part of), we always thought about generic system events, and not syslog messages, event logs, snmp traps and all the like. The same concept somewhat naturally will apply to rsyslog and it begins to manifest with multiple input plugins. The imfile plugin is probably the first extension that brings this spirit to life. While all other plugins so far process syslog messages, imfile processes plain text files. Of course, they convert easily to syslog messages. But from a high-level design perspective you can think of event messages – some being syslog, some being file lines, some other being… use your imagination (or visit the MonitorWare site to get an idea where else events can come from).

The bottom line is that rsyslog has evolved and is further evolving into something that of course can process syslog messages. But it’s not limited to that. It can process whatever system event occurs. It’s just a matter of the input plugins. Of course, the rest of the engine must also support event classes other than syslog. That’s, among others, what I am currently working on.

To be a really useful tool, its configuration must be advanced from what it currently is. Most specifically, we must be able to change event (read: message) properties on the fly, supply different parsing capabilities, maybe have some user-supplied logic during the event flow… and so on.

The most natural solution is to provide some (more or less powerful) scripting capabilities. I am convinced that’s the right route, especially now that I have done the initial work on rsyslog expressions. It’s clean, its very flexible, its extensible and — it’s easy to use. And in order to provide freedom of choice, I’ll most probably introduce a very similar concept into Adiscon’s commercial products. And finally, over time others may find it appealing to, so I can’t outrule there will be other implementations of the same idea.

This made me come to the conclusion that I needed a better name than “rsyslog.conf” for this concept. Initially, a name of “syslogscript” came up my mind. But as I said, it’s not just about syslog. The same applies to “logscript”, we are *not* just talking about logs (at least in the future). I begun to think and after some time arrived at “EventScript”. A perfect match – but also a third party trademark (even a registered one!). So forget about it. “rscript” – also a trademark. I ran out of good names and I also became impatient. I would also like to make sure that nobody again comes along to cause grief to me, just as Huwai did with it recent junk syslog patent. So I remembered a trick a German TV journal publisher used when his competitors filed law suit after law suite over name violations of his journal when he tried to start a new publication in a field that was obviously monopolized by a few insiders: he simply used his own name in the journal title. Of course, that doesn’t make a name totally immune against legal action, but it seams to be a quite strong weapon against all these patent and tradmark trolls out there.

And so I finally arrived at the idea that a good name for the event scripting language may be “rainerscript”. Some folks (not me! ;-) ) may even abbreviate it to rscript, fishing in trademark-mined territory. Don’t do that or at least do it at your own risk ;)

I hope you find the choice of name acceptable — and the overall direction we are taking a good one. As a side-note please let me assure that rsyslog’s easy and simple sysklogd-like configuration will not go away. Rainerscript will fully support that syntax, as well as all currently existing $-configuration lines. While the previous configuration format will become part of rainerscript, you actually do not need to know anything about the scripting language unless you intend to do some pretty advanced things. Rainerscript is not something available immediately. It will evolve over time. A first glimpse at it will be available in 3.12.0 in form of the rsyslog expression support. With it, you can do things like (all of the below is on one line, which is important):

if $message contains 'test' and $facility == local7 then -/var/log/local7test.log

Notice the new if and the old style action specifier (“-/var/log…”). The script code is just a natural extension to existing configuration format.