journald and rsyslog

I was made aware of the proposed new Linux logging interface via journald by a couple of questions I received today. I have to admit that I was not aware of this effort. I follow the systemd development mailing list, but as far as I can see (and search the archives), journald was never mentioned there.
This is meant as a first comment on the relationship between the journald project and the rsyslog project. I have obviously not done any in-depth analysis of the proposed new logging system. I have just quickly skimmed through Lennart’s paper in which he introduces journald. As such, I do not intend to talk about the technical details of journald and rsyslog, but rather about the bigger picture of how journald affects rsyslog (and probably the syslog community at large).
In a nutshell, the systemd/journald logging system looks much like the Windows Event Log to me.  This is not necessarily bad news, because the Microsoft system is not bad, at least with the recent enhancements made. As some of you probably know, I have worked with the Windows Event Log quite a bit and even invented the first-ever (and still best ;)) eventlog to syslog tool. This, however, shows that a local event log alone is typically not sufficient. Such a system is excellent for a local desktop, but it needs a network component for centralized administration. Lennart wrote that journald will be a local component in the first iteration but this may change in the future. In Windows, the event log evolved into such a network-aware system and still Adiscon (my company) has many customers who need agents for integrating the proprietary log format into a standardized format — that being syslog. MonitorWareAgent and EventReporter are still heavily used for that purpose.
Coming back to journald and looking at Lennart’s reasoning: some of his arguments in regard to syslog are technically wrong, but can be considered true if one looks at current practice. Let me take the timestamp as an example. Lennart claims that syslog does not contain a timezone and mentions that journald will provide much finer resolution. Actually, the timestamp is a source of deep frustration to me. Ages ago (2006?) I implemented high-precision timestamps (including TZ info) in rsyslog, and RFC5424 has brought them to the on-the-wire protocol. As far as I know, syslog-ng has supported them for quite a while as well (but I am not a syslog-ng expert ;)). HOWEVER, all distributions turn high-precision timestamps off and set the dumb old format, as this is a requirement to keep old tools working. Initially Michael Biebl was brave enough to keep high-precision timestamps active in Debian’s rsyslog package, but was forced by complaints to go back to imprecise ones (here is an example). Nobody seems to be really interested in updating the other tools (and lots of custom programs).
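To make the difference concrete, here is a minimal Python sketch that renders the same instant in both styles. The function names are made up for illustration, and this is not how rsyslog builds its timestamps internally. It only shows what the traditional format drops (year, sub-second fraction, timezone) compared to an RFC5424-style stamp.

```python
from datetime import datetime, timedelta, timezone

# English month abbreviations, hard-coded so the output does not depend on locale.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def legacy_timestamp(ts: datetime) -> str:
    """Traditional syslog style: no year, no fraction, no timezone."""
    # The day of month is space-padded to two characters, e.g. "Nov  8".
    return f"{MONTHS[ts.month - 1]} {ts.day:2d} {ts:%H:%M:%S}"


def rfc5424_timestamp(ts: datetime) -> str:
    """RFC5424 style: full date, microseconds, numeric UTC offset."""
    if ts.tzinfo is None:
        raise ValueError("a timezone-aware datetime is required")
    return ts.isoformat(timespec="microseconds")


# Illustrative instant in a UTC+1 zone (the values are arbitrary).
sample = datetime(2011, 11, 8, 14, 3, 7, 123456,
                  tzinfo=timezone(timedelta(hours=1)))
print(legacy_timestamp(sample))
print(rfc5424_timestamp(sample))
```

A tool that only understands the first form cannot parse the second one, which is exactly the compatibility problem that keeps distributions on the old format.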
If I understood Lennart correctly, he will not only write a new log API and log store, but also new tools for log processing, a completely new log management subsystem. This may not be a bad idea. Apple has done the same in OS X. It may even be the only way to force people to switch to a newer and better system. The gradual approach I took with rsyslog and my other implementations was possibly a wrong path. Backward compatibility may actually not be that important on a typical desktop system. However, in an enterprise environment such harsh moves cannot be made. Even though Linux has become quite important, we still need to integrate various log sources, and syslog is still an excellent tool for doing so. The good news is that journald will not prevent the integration. For those in need, a syslogd can run alongside journald. This is exactly what we do on Windows, where EventReporter runs alongside the Windows event log and reformats Windows events into standard syslog format for consumption by some central system.

Will journald succeed and replace the current logging system? It is hard to say with the little information I have at hand. But I’d say that chances are not bad it will on most systems. Thinking about home desktop machines, laptops and a myriad of other personal computers: rsyslog runs on (almost) all of them, and nobody knows it does ;) The folks operating these machines are not at all interested in logging, so I think it is a valid assumption that none of them will care which logging system is running. Thinking about resources, Red Hat funds the journald development (I wonder how it plays with auditd, btw – will they merge?). If journald makes its way via systemd into Fedora (and I guess it will), other users of systemd will probably follow. Using this chain of arguments, I’d say it is likely that journald will replace local syslogging. I have to say that this concerns me a bit, because the systemd/journald relationship looks so close that it will probably be hard to get some healthy competition in this regard. After all, this concern was a big argument for me to start the rsyslog project. Read my 2007 blog post “Why does the world need another syslogd?” and think of its arguments in regard to journald. I am happy to say that rsyslog helped make syslog-ng a much better choice by the competition it introduced. I am unsure if there can be real competition to journald (but, to be honest, one can question if my concern is worth the effort…).

So let’s assume journald will wipe out the rest of the Linux logging tools. What does that mean for the rsyslog project? Well, it gives it a somewhat different twist. I don’t think that rsyslog (or syslog-ng) will completely go away. Replacing the local logging system on a desktop is one story, but replacing heterogeneous network logging is a totally different one. Of course, nothing is made for eternity, but I envision syslog to be healthy for at least the next 10 to 20 years. But there will be a shift inside the user base. Today, rsyslog tries hard to be a platform useful both for low-end, home systems as well as enterprise environments. With journald, non-enterprise environments will probably disappear from the picture. This also puts rsyslog in a purely commercial context, and this is definitely something we have to think about. What is the point of open source software, if only commercial entities benefit from it, but not the authors? Today, we receive motivation (and some money-worth arguments!) from the fact that there is a very large installed base. Losing that motivation would of course have an effect. At least, it would be pointless to work on non-enterprise features. Why put a lot of effort into something that nobody uses?

So is the arrival of journald good or bad for the community? Given my bias, you would probably expect me to say “it’s bad”. But I am not sure. It has good points as well. Maybe Lennart really manages to set a new, better standard that application developers will utilize in a useful way. Maybe forcing projects like rsyslog to a high-end, commercial focus brings much more improvement in that area (just think about all those restrictions that I maintain purely for low-end systems or backward compatibility). I really don’t know if it is good or bad. There are risks, yet there are also chances. I will try to get some more details about journald and will probably post a couple of technical remarks on the claims Lennart makes. Other than that, I’ll probably just stand by as an interested observer. There is no urgent need to respond, maybe a little fiddling with feature priorities to not waste too much time. But other than that, I think I can just safely see how things progress. And rsyslog users can do so, too. If you don’t have any strong opinion on the situation, there is really no need to involve yourself.

Update: I have now had a deeper look at the Linux Journal and journald, and there are a couple of things I don’t like. I suggest reading this post in addition to my first reaction here.

How to display XML data in Adiscon LogAnalyzer?

Log files usually do not contain XML data. However, this does not mean logs are necessarily non-XML. A prominent example is IHE, which transports XML documents inside syslog messages. My post on Adiscon LogAnalyzer 3.3 drew some interesting comments from John Moerke, who sees use for it in an IHE environment.

I have now discussed with Andre how to integrate such functionality into the log analyzer. There are obviously a couple of questions to address, but a core question is how to deal with the hierarchical structure that XML offers. Traditionally, log files contain flat name-value pairs, so they can easily be mapped into a two-dimensional array (which is what you see when you look at Adiscon LogAnalyzer). The application is built around this concept. So a fundamental question is how to make sense of an XML stream. An obvious answer is that we may display some fields in a flat overview, but display the full structure in a detail view. This makes sense, but there are ample complexities in things like queries. Plus, it would probably require big changes to the engine.
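To illustrate the flat-overview idea, here is a minimal Python sketch of one possible mapping: every leaf element and attribute becomes a name-value pair whose name is the path through the document. This is only an assumption about how such a mapping could look, not something implemented in Adiscon LogAnalyzer, and the element names in the sample are invented (they are not taken from any IHE schema).

```python
import xml.etree.ElementTree as ET


def flatten_xml(xml_text: str, sep: str = ".") -> dict:
    """Map each leaf element and attribute to a path-like field name."""
    fields = {}

    def walk(elem, path):
        # Attributes become fields of their own, marked with "@".
        for name, value in elem.attrib.items():
            fields[f"{path}{sep}@{name}"] = value
        children = list(elem)
        if not children:
            fields[path] = (elem.text or "").strip()
            return
        # Count siblings per tag so that repeated elements stay distinct.
        totals = {}
        for child in children:
            totals[child.tag] = totals.get(child.tag, 0) + 1
        position = {}
        for child in children:
            child_path = f"{path}{sep}{child.tag}"
            if totals[child.tag] > 1:
                position[child.tag] = position.get(child.tag, 0) + 1
                child_path += f"[{position[child.tag]}]"
            walk(child, child_path)

    root = ET.fromstring(xml_text)
    walk(root, root.tag)
    return fields


# Hypothetical payload, for illustration only.
sample = ('<event id="7"><user>alice</user><action>logon</action>'
          '<host><name>srv1</name></host></event>')
for name, value in flatten_xml(sample).items():
    print(name, "=", value)
```

Such a mapping loses nothing for display purposes, but it also shows where the query complexity comes from: repeated elements and deep nesting produce field names that differ from message to message. Text that sits directly inside a non-leaf element is ignored in this sketch.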

Putting implementation effort aside for the moment, the big question is how users (you!) would like to work with XML data in a tool like Adiscon LogAnalyzer. Feedback is most appreciated!

log annotation with liblognorm

I have recently written about the concept of event (log) annotation and liblognorm. During the past days I have made my mind up and have begun implementing some stuff today. In essence, rule bases will receive a new rule type “annotate”, which contains the annotation part. Here is a sample from my lab environment:


rule=logon:<%-:number%>1 %timestamp:date-rfc5424% %src-id:word% ...
annotate=logon:+action="login"

Note the “logon” text that follows “rule=” and “annotate=”. This is a liblognorm tag (not to be confused with a CEE tag!). This rule base tells the normalizer to append, according to the target format, the fields that are given in the annotate statement to any events that have the tag in question (“logon” in our case).
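To show how the parts of the annotate line relate to each other, here is a small Python sketch that splits such a line into tag, field name and value. The syntax is inferred only from the single sample above, so the real rule base parser may well accept more (or different) forms. The function name is made up for illustration.

```python
import re

# Pattern derived from the one sample line: annotate=<tag>:+<field>="<value>"
ANNOTATE_RE = re.compile(
    r'^annotate=(?P<tag>[^:]+):\+(?P<field>[^=]+)="(?P<value>.*)"$'
)


def parse_annotate(line: str):
    """Return (tag, field, value) for an annotate line of the sample's shape."""
    match = ANNOTATE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not an annotate rule of the assumed form: {line!r}")
    return match["tag"], match["field"], match["value"]


print(parse_annotate('annotate=logon:+action="login"'))
```

The tag is what binds the annotation to the recognition rule. The field and value are what gets appended to the event.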

Today, I am extending the rule base parser to support the annotate rule. Within the next days, I’ll update the rest of the system. When this is done, I’ll probably release that version so that you can try out the new functionality in your own environment.

Some sample Adiscon LogAnalyzer Reports…

I thought I would give you a glimpse of the reports Adiscon LogAnalyzer can generate. There are some interesting summary reports, like the Windows Event Log Summary Report and the Syslog Summary Report. Of course, you can customize these reports based on the usual filtering capabilities. As an example, have a look at the syslog summary report just for “today”. You can play with these options live at the Adiscon LogAnalyzer demo site.

Please note that we will be working on more reports in the months to come. Also, if you miss some report, you may consider sponsoring its development. This can be quite cost-effective compared to the many quite expensive solutions you otherwise need to use — or your programming time ;-)

Potential Blog Unreliability

Hi all, as you probably know, my blog’s design hasn’t changed in ages (yup, I’m a conservative guy). However, it finally is time to update things, so I’ll look at some new design (and maybe software) options. That means that the blog may be a bit under construction during the  next couple of days. Please pardon any problems associated with that — they will be temporary.

Adiscon LogAnalyzer 3.3.0 beta is out

Adiscon’s open source log analysis frontend LogAnalyzer has grown with some exciting new features. Most importantly, report generation speed has been much increased. This was made possible via tighter integration of the report logic with the actual log source (database or file). As a result, all reports are generated in considerably less time and require far fewer system resources to complete. Along the same lines, Adiscon LogAnalyzer now offers suggestions for indexing database sources. If it finds room for improvement, new indexes are automatically suggested. This results in overall improved speed throughout the application.

Also, a long-overdue user interface improvement has finally been made: previously, users needed to go through the admin panel to access the reporting feature. This was kind of well-hidden and cumbersome. In 3.3.0, reports are directly available from Adiscon LogAnalyzer’s main panel. With this change, some users may even discover the reporting feature for the first time. The screenshot below gives you a sneak preview of the new interface.

Best of all, the new version has brought some under-the-hood improvements that we will utilize in the future to generate some really exciting new reports. Stay tuned, there is much more to come…

And finally, let me say that I work with the LogAnalyzer team to improve integration with rsyslog and Adiscon’s Windows logging components. We are trying very hard to provide an easy-to-use, integrated solution.

thinking about a rsyslog client for Windows…

I have had a series of interesting talks during the past weeks. We at Adiscon have seen that there is high demand for closely integrating Windows machines into an rsyslog enterprise logging infrastructure. Of course, there are various ways to do that, and probably the best is using the other members of Adiscon’s MonitorWare product line. However, we can obviously go one step further and provide even tighter integration. For that reason, we will most probably soon create a special software package, the rsyslog for Windows client. It will provide

  • Event Log Forwarding
  • Log File Forwarding
  • Syslog Relay

capabilities, probably in different editions so that users can cover exactly their needs. While event log and file forwarding seem natural, syslog relay functionality may be a bit surprising, given the fact that rsyslog is available as a direct receiver. This feature is primarily targeted towards larger enterprises which may have no Linux machines in remote offices, but do have equipment there that they need to monitor via syslog. The core idea here is that we provide that functionality on a Windows box, which can then talk to the central rsyslog server in a reliable way.

We are currently discussing the details of this plan. I hope we will be able to show first results soon.

liblognorm event annotation … and CEE

As you probably know, CEE is an effort driven by MITRE to support a common event expression format. Liblognorm is a log normalizer library (aka “network event normalizer”). One of its primary target formats is CEE.

For pure normalizing needs, liblognorm extracts data fields from semi-structured log messages. The extracted fields are available inside a (basically) name/value property list. Liblognorm also permits classifying messages, e.g. as being a logon or logoff message. For this classification, liblognorm provides so-called “tags”. These are simple words (strings of characters) which can be specified by the user. Tags reside in a special property called “tags”, but otherwise occupy a flat space (tags can easily be structured via punctuation).

CEE takes a slightly different approach: while it shares the tag concept (actually liblognorm inherited tags from an earlier version of CEE), CEE classifies tags into different tag types. For example, “logon” may be a tag, but can only be used to describe an action(-field). As such, “logon” cannot be present by itself in a CEE log record; it must be given as the value of the action field (‘action=”logon”‘). Also, CEE requires some other fields which may not be present explicitly in the original message even though the information may implicitly be present inside it. To express such information entities (and tags in the CEE way), liblognorm needs the capability to add additional fields to an extracted event. Let us call this set of fields the “annotation” for easier future reference. Liblognorm needs to annotate the event so that the target format’s (CEE) requirements are met. While I was talking about CEE so far, I assume (and know from previous experience) that other formats may have similar requirements, albeit with different fields that need to be annotated.

The question is now: how to implement this in liblognorm? The initial idea was to include the annotation inside the normalization rule itself. That has a major drawback: if a rule base is to be used for CEE and some other format, the annotation may be different, and thus the same rule base cannot be used. These two rule bases would differ in just the annotation. So it seems more natural, and easier to maintain, to split the recognition rule from the annotation rule. In that setting, the message is recognized and classified by recognition rules and the annotation is based on (different) annotation rules. So a single set of recognition rules can be shared by multiple sets of annotation rules. Only the latter need to be redefined for different target formats (or systems).

This split-rule method is the way I intend to go. In essence, the current “rule=” rule and its format will remain untouched. It will be augmented by “annotate=” rules, which contain the full annotation. The binding between these two will be done via classification (liblognorm tags): in the first step, the message is recognized, data extracted and tags assigned, just like it is currently done. Then a second step will be added. It traverses the tags and adds all annotations that are defined for the message’s tag set. So the binding is on the tag set. Finally, it is probably necessary to add a third step that can remove unwanted fields. This step is probably target-format specific. For example, this step could eliminate the liblognorm tag set from an event if CEE compliance is desired, because CEE does not support, and does not even permit, an extra tag set.
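The three steps can be sketched in a few lines of Python. This is a conceptual illustration only, not liblognorm code: the event is modeled as a plain dictionary, step one is assumed to have already happened, and the second annotation table (with its “event_type” field) is a made-up stand-in for some non-CEE target format.

```python
def annotate(event: dict, annotations: dict) -> dict:
    """Step 2: add the fields defined for every tag in the event's tag set."""
    result = dict(event)
    for tag in event.get("tags", []):
        result.update(annotations.get(tag, {}))
    return result


def strip_fields(event: dict, unwanted: set) -> dict:
    """Step 3: drop fields that the target format does not permit."""
    return {name: value for name, value in event.items()
            if name not in unwanted}


# Step 1 result: fields extracted and tags assigned by a recognition rule.
recognized = {"src-id": "host1", "tags": ["logon"]}

# One annotation table per target format, both keyed by liblognorm tag.
CEE_ANNOTATIONS = {"logon": {"action": "login"}}
OTHER_ANNOTATIONS = {"logon": {"event_type": "authentication"}}  # hypothetical

# CEE does not permit the extra tag set, so step 3 removes it.
cee_event = strip_fields(annotate(recognized, CEE_ANNOTATIONS), {"tags"})
other_event = annotate(recognized, OTHER_ANNOTATIONS)

print(cee_event)
print(other_event)
```

Both target formats are served from the same recognized event. Only the annotation table and the set of fields to strip differ, which is the whole point of the split.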

Feedback on this approach is appreciated. It is my hope to be able to implement this in the near future.

filler fields in log normalization

When looking at some real-world rule bases for liblognorm, I noticed that it is often required to check for the presence of a specific field, but the value is actually not needed. This leads to fields named e.g. “filler”, “dummy”, “dummy<n>” with n being an increasing number. This is both clumsy and requires unnecessary processing power. For that reason, I have introduced “-” (dash) as a field name. When this special name is used, the field is parsed as usual, but immediately discarded after the successful parse. So while we need to parse and extract in order to get the parse logic right, we save the effort of keeping a copy of this unneeded data. This also means that output log records produced by the normalizer tool are cleaned up. I hope this is a useful addition.
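The effect can be shown with a toy parser in Python. It is in no way liblognorm’s parser: it only handles whitespace-separated words, and the function name and sample message are invented. What it demonstrates is the behavior described above: a “-” field must still match for the parse to succeed, but its value never reaches the output record.

```python
def parse_words(field_names, message):
    """Match one word per field; keep every value except those named "-"."""
    words = message.split()
    if len(words) != len(field_names):
        return None  # a field is missing or extra, so the parse fails
    return {name: word
            for name, word in zip(field_names, words)
            if name != "-"}


# "login" and "from" must be present, but only user and host are kept.
print(parse_words(["-", "user", "-", "host"], "login alice from srv1"))
# Too few words: the filler fields still count for matching.
print(parse_words(["-", "user", "-", "host"], "alice srv1"))
```

Because several fields can share the name “-”, there is also no longer any need to invent unique dummy names.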

Paper on LogNormalization

I wanted to make all of you aware that I have posted a paper on log normalization. This was originally done in regard to CEE, but I noticed that the classification of different log sources and the way to handle them is of broader use. I hope you find the paper useful.