Announcing Project Lumberjack

Two weeks ago, along with the Fedora Developer’s Conference in Brno, Czech Republic, a couple of logging and auditing folks from Red Hat, Balabit (syslog-ng), the MITRE Corporation, and Adiscon (me) stuck their heads together to talk about the future of structured logging. It quickly became clear that extending syslog in the CEE spirit is the right thing to do.

We observed that almost all technology is present to provide a rich framework to support structured logging. Actually, both syslog-ng and rsyslog provide the necessary plubming since long (for, example, as part of the RFC5424 effort), but that functionality is relatively seldom explored actively by other developers. A core problem in that regard is that most applications rely on the good old syslog() API, which does not provide structured logging by itself. Also, there is no common log storage database available, which tools could be based on.

In order to evolve syslog, we defined a three-layer architecture, with applications and logging libraries/APIs being the top layer, the syslogd the middle layer and the datastore the bottom layer. Multiple APIs must be supported as noone can expect projects to change their existing logging infrastructure. Also, existing frameworks like log4j or log4j and even glibc’s syslog() will stay around for a while longer. New libraries (like ELAPI) will  probably become more dominant for new applications. So how to tie these different libraries to the syslogd subsystem (the second layer)?

The solution is rather simple: we use what we already achieved in CEE and support cee-enhanced syslog on the system log socket. The core idea is very simple: we use the regular syslog message part, but include JSON-encoded structured data with it. To signify to the syslog system that this is actually cee-enhanced, a cookie string (“@cee:”) is used in front of the JSON data. It is then easy to decide for the syslogd which message format it deals with: if the cookie is present and the rest of the message is a valid JSON representation, the message is cee-enhanced. If one of the two conditions fails, it is traditional syslog. As both conditions are checked together, it is highly unlikely that a legacy syslog message will ever fit into that criteria (and if it really does, nothing is lost: after all, the syslogd has correctly understood that format). It must be noted that the necessary parsing and internal plumbin is available both in syslog-ng as well as rsyslog (I committed the missing JSON parser, held back awaiting a more final CEE standard, yesterday).

The interface to the log database layer is currently not as well defined and needs to be worked on. Note that both syslog-ng and rsyslog support multiple datastores, so there already exist solutions. The group as whole was of the opinion that some unified API for a log data store would be useful and something that should be looked at as a longer-term target.

After reaching this rough consensus, we were delighted to see that most of the base technology is already and place and just needs to be tied correctly together. It is more an effort of doing detail implementations and documenting the various pieces (and how they work exactly together) than creating a totally new system (aka “can be quickly done”). We agreed that it probably is best to reach for the low-hanging fruit first: get structured logging integrated first, then do the other steps. So an initial milestone will be making sure cee-enhanced syslog is supported by all of the subsystem and only after this is done reach for the other things.

One of these next things definitely is a dictionary of field names (and exact structure) to be used to describe events in a standard way (for example a logon event). While the whole effort is highly inspired by CEE, it probably is best to try out initial efforts outside of the formal CEE framework. That will enable rapid development, discussion and the capability to check what works in practice. The experience gained in such PoC can than be feed back to the formal CEE process (along the old IETF mantra “running code and rough consensus first”).

We agreed that such an effort is best be done in a tranparent and flexible open source process. With that, project lumberjack was born: an effort to provide better structured logging for Linux, being supported by many major players in that arena. We agreed that it would be a good idea if Red Hat provided some of the project infrastructure. This is why you find project lumberjack now at fedorahosted.org (note that the project will probably contain mostly specs and less code, which is kept in the individual project’s repositories).

parsing JSON-enhanced syslog

Strucuted logging is cool. A couple of month ago, I added support for log normalization and the 0.5 draft CEE standard to rsyslog. At last weeks Fedora Developer’s Conference, there was a huge agreement that CEE-like JSON is a great way to enhance syslog logging. To follow up on this concept, I have integrated a JSON decoder into libee, so that it can now decode JSON with a single method call. It’s a proof of concept, and for serious use performance optimization needs to be done. Besides that, it’s already quite solid.

Also, I just added the mmjsonparse message modification module to rsyslog (available now in git master branch!). It checks if the message contains an “@JSON: ” cookie and, if so, tries to parse the resulting string as JSON. If that succeeds, we obviously have a JSON-enhanced message and the individual name/value pairs are stored and can be used both in filters and output templates. This provides some really great opportunities when it comes to processing the structured data. Just think about RESTful interfaces and such!

Right now, everything is at proof of concept level, but works well enough for you to try it. I’ll smoothen some edges but will release the versions rather soon. Probably the biggest drawback is that the JSON processor currently flattens the event, with structure being conveyed via field names. That means if you have a JSON object “SUPER” containing a number of fields “field1” to “fieldn”, the current implementation will be a single level and the names are “SUPER.field1”,… I did this in order to have a quick solution and one that fits into the existing framework. I’ll work on creating real structure soon. It’s not really hard, but I probably do some other PoCs first ;)

I considered several approaches, among them moving over to libcollection (part of ding-libs) or a pure JSON parser. The more I worked with the code, the more it turned out that libee already has a lot of the necessary plumbing and could simply been enhanced/modified under the hood. The big plus in that approach is that is immediately plugs in into rsyslog and the other solutions that already built on it. This even enables to use the new functionality in the v6 context (I originally thought I’d need to move on to rsyslog v7 for the name-value pair changes). Now that I have written mmjsonparse, this really seems to work out. No engine change was required, and I expect little need for change even for the final version. As such, I’ll proceed in that direction. Actually, what I now use is kind of a hybrid approch: I use a lot of philosophy of libcollection, which showed me the right route to take. Then, I use cJSON, which is a really nice JSON parser. In the proof of concept, I use both cJSON’s object model and libee’s own. I expect to merge them, actually tightly integrating cJSON. The reason is that CEE has evolved quite a bit in the mean time, and many complex constructs are no longer required. As such, I can streamline the library as well, what not only reduces complexity but speeds up the whole process.

I would like to express my sincere thank to Dmitri Pal, Keith Robertson and Bill Heinbockel, which provided great advise and excellent discussion. And the best is that this part of the effort is just the beginning… Stay tuned for more!

Java Exceptions and syslog, anyone?

Anyone out there having problems receiving Java exceptions via syslog? If so, please let me know. We have begun to look into log4j and potential formatting problems. However, we need someone who is really bugged by this in order to see real-world cases so that we can think about real-world cures. So if you have issues, please let me know.

imuxsock rate limiting now opt-in feature

I added rate-limiting capabilities directly into rsyslog‘s imuxsock some time ago. These were turned on by default and blocked processes from writing more than 200 messages within a 5-second interval (accounted on a per-pid level). This default, however, caused quite some confusion. I monitored reactions since the feature was introduced. Last week, I noticed Chris Gaffney, too, telling about his bad experience on twitter and suggesting to change that feature from opt-out to opt-in. That was the missing complaint ;) So I did exactly what Chris suggested, and starting with 5.9.7 that feature requires an explicit opt-in. To do so, I have simply changed the default. Now, you need to tell it the rate limiting interval, which must be non-zero, in order to activate rate limiting. The default is zero, what keeps it deactivated. Currently available via git. It will take some time, though, for this changed default to migrate to the stable branch.

rsyslog will remain GPLv3 licensed

Licensing is though topic. I tried to explain some of the upcoming rsyslog license changes with yesterday’s blog post. While I tried to cover all aspects, I have probably manged to create some confusion. I try to cleanup this mess today. In doing so, I will leave out some of the fine details but focus on the prime visible facts. 

First of all, the rsyslog project as whole will stay under GPLv3. If you dig down into the details, the current version is licensed under GPLv3, with a large body of code (the rsyslog runtime) being licensed under LGPLv3. In straight words, this means that rsyslog as whole can not be included in a commercial product while the rsyslog runtime can be. So if someone intends to provide rsyslog-like functionality, he or she can use the runtime and rewrite the rest of the system (but not copy it). Also, it is possible to create commercial rsyslog plugins with the current system, but then one relies either on a “creative” interpretation of the GPL or use message passing e.g. via pipes between core and plugin.

With the intended changes these basic facts, in regard to rsyslog as whole, are not changed at all. However, the details change: the body of code that is licensed under a permissive license (allowing use inside commercial applications) will be increasing. Still, some key files will remain under GPLv3, and so will the overall rsyslog project. Also, in the future the permissive license used by rsyslog will probably be the Apache software license (ASL 2.0). This open source license is used by a myriad of well-known software products, with the Apache http server being a prime example. It is not sure if ASL 2.0 will totally replace LGPLv3 inside the rsyslog runtime, this depends on contributor reactions. For many of the same reasons, it is also not yet clear what exactly the GPLv3 core of rsyslog will be.

Why is this change benefitial to the project? Maintaining and developing rsyslog is costly. In the typical open source business model, these costs are covered by the sales of project-related services. Unfortunately, the rsyslog project received relatively little funding via this way and is still heavily sponsored by Adiscon, which can not bear the majority of the cost for all time to come.

One problem with receiving funding is that some potential customers – especially large ones who could considerably contribute to funding – do not like to license under GPLv3 for one reason or the other. One “solution” to that problem would have been to dual-license rsyslog in its current form. We actually considered that (blog posting) but stepped back from the initial approach after discussion with key community members. As described in the mentioned blog posting, it would not have been very hard for Adiscon as the main copyright holder to change the licensing model. However, this would probably have meant that a commercial and a non-commercial fork of rsyslog would have been created with potentially large differences in the code base.

The move of more code under a permissive license prevents this problem. With the new model, only relatively few key files would need to be dual-licensed. This prevents large diversity inside the code base just for licensing reasons. Also, the ASL is far more appealing to many large users, so we hope to gain additional deployments – and thus potential customers – from that move.

Finally, this model facilitates the ability to provide commercial plugins. Commercial plugins were always OK with the project, and as said above, can even be written and distributed under the current licensing scheme. The new licensing scheme makes it easier to support such plugins and encourages technical superior solutions. How exactly the licensing in this regard will be is not yet fully thought out. One solution might be to add a special exemption to the then-smaller GPLv3 core, that explicitely permits plugins (getting the wording right may be somewhat tricky). Another one is that someone who intends to ship commercial plugins must rewrite the rsyslog GPLv3 core, which is no longer that hard even for external entities as the GPLv3 core is smaller (one may argue that Adiscon as the main contributor has an advantage here over others; I can’t decline it but I don’t find it unfair either – after all, Adiscon has spent considerable effort on rsyslog, so why not reap a small benefit in this situation?). These solutions are the extreme ends of the solution spectrum – it could probably also be anything in between.

Why is this useful to the community? Obviously, the changes help fund the project, and thus help to keep it not only maintained but well-enhanced as well (there are many cool things I have on my mind, but the current time constraints do not permit me to work on them). This is even more important as the journald project will create a new, mostly commercial, environment for rsyslog. In the future, rsyslog needs to compete with syslog-ng primarily in that part of the Linux ecosystem. Balabit, which traditionally dual-licenses syslog-ng under a proprietary license has a big advantage regarding this customer base from that fact alone. So far, rsyslog could make up its disadvantage by the fact that it was installed on each system, an advantage that journald will very probably remove from rsyslog. The other benefit is that the more permissive licensing model will probably attract additional software vendors and maybe additional parties, especially if we can move the full rsyslog runtime to ASL. There seems to be a movement inside the industry towards this type of licenses, at least for projects playing in the enterprise environment. If all works out well, we may even get some more contributors and thus the ability to include additional features into the project.

In short, the licensing change will not affect that much of what actually can be done with rsyslog code, but it provides rsyslog with some additional options that benefit both the project and the community.

I hope I have expressed myself clearly enough this time. Again, it has become a long post, even though I omitted some detail information given in yesterday’s post. Licensing obviously is tough ;) I hope to soon be able to use my time for more productive things, again…

rsyslog licensing update

As regular readers know, there has been discussion on rsyslog‘s licensing model for quite some time, I guess much like the beginning of the project. Many things influence its licensing, community support and the ability to fund the project are among these. Over the years, we had several license changes, so things tend to evolve. The whole discussion was restarted in 2011 after the introduction of the systemd journal idea in November. A prime reason is that journald pushes rsyslog far more into the commercial-only space than we originally anticipated, as journald will probably push rsyslog out of many low-end systems (for details, see my original argument on rsyslog vs. journald, which I still support).

This lead to a lot of discussion with my peers at Adiscon. This resulted in a blog post on potential dual-licensing of rsyslog (to get the fine details, read that post, I don’t repeat them here; the post is still valid in regard to arguments and problems). That post, not unexpectedly, lead to a larger discussion on the rsyslog mailing list as well as quite some private conversations. In these discussions, some folks expressed they like an open core strategy more than dual licensing, while others expressed it right the other way around. Thankfully, some understanding for the need to fund the project was expressed as well ;-). Of course, no solution was reached and nothing was changed.

I have now taken the time to digest the discussion, Adiscon needs, and some overall trends. And I have briefly approached my peers at Adiscon as well as some contributors with a plan to go ahead in the most unintrusive way. So far, the idea was not killed (but also not discussed in-depth), so we may proceed with it. The core idea is that we must receive some funding for the work done. This is especially important in regard to the new environment that journald will create in the medium to long term. One thing that we learned during the past years is that commercial vendors are often hesitant to put GPLv3 code into their systems. Even larger key users slightly associated with software are hesitant (almost everyone falls into this category nowadays, think of embedded systems) for this reason. Ultimately, this somewhat blocks rsyslog’s use inside commercial entities. And by doing so, it also limits our funding opportunities because it is hard to sell support and services to someone who does not use the project.

To avoid this obstacle, the idea is to put a larger body of rsyslog under a more permissive license. Looking at current trends, the Apache Software License (ASL 2.0 to be precise) seems to be a good fit. Actually, one thing that Adiscon already decided is to put code written by Adiscon under this license. I have already converted some files after careful examination of contributors, original sources and development history (not a job I really like to spend time on, but…). We also hope to gain support by contributors to place all plugins under the ASL. In order to clean up licensing, it would also be useful (though somewhat optional) to put the rsyslog runtime, currently licensed under LGPLv3, under the ASL as well. We need to see how this works out with contributors, though (this is obviously the largest bunch of code). It is important to note that while LGPL is not “GPL-free” and may still cause some friction with our future primary user base, it is pretty similar to ASL in regard to what it permits. Having all runtime code under the ASL would probably help a bit with those “GPL-skeptic” enterprise customers.

Note that the ASL permits unlimited commercial use to anyone. I would still like to keep some leechers off the code (my original motivation to use GPLv3, among many others; my 2007 blog post on that is probably an interesting read in addition to this posting here – some things seem not to have worked out ;-)). My idea is to keep some files under GPLv3. This is not broadly discussed yet and details need to be worked out. Obviously, that would be counterproductive to what we intend to do with ASL 2.0. One could, however, optionally release these parts under a different license and thus somewhat “solve” the situation via relatively little modifications and different code. I agree, it boils down to some dual licensing, but in a smaller magnitude. Most importantly, everyone would be free to do his own implementation of the GPL parts and use the rest of rsyslog (the majority) with that freshly written code. [Please note that even though I changed some files to ASL, others are still under GPLv3, simply because I can not change licenses where Adiscon is not the sole copyright holder. So this is no indication that my idea is currently being implemented.]

In any case, if we mange to complete the re-licensing -with or without GPLv3 parts- everyone would be free to extend rsyslog with extra functionality. This obviously includes Adiscon as well. It would give us some extra funding opportunities, as we could create some plugins targeted at large commercial entities. One thing that immediately pops up my mind is a new queue subsystem (the queue driver is internally “plugin-ish”), which could integrate many of the improvements I have on my mind. While I would really love to write that beast, it is a considerable effort that Adiscon declines to fund and no other party has yet been willing to fund it either — but some expressed interest to provide limited financial support for it. If Adiscon could provide such a piece under a commercial license (at least for a while), I could probably persuade my peers to try this out and fund the effort. I honestly think this situation would be a win for everyone. With the current policies, it looks like this new subsystem will probably never materialize, so what is to lose? Highly specific input and output plugins also create similiar opportunity.

The ability to use commercial and non-commercial plugins is something that I always considered a valid use of rsyslog technology (see “writing rsyslog plugins” right at the bottom, under “licensing”). I just never cared to dig into the licensing implications, which make it somewhat hard to provide plugins under a GPL-incompatible license (contrary to what I originally intended). Note that in any case it would even be possible with a totally GPLv3 codebase (even if the rsyslog runtime would not be LGPL, what is is licensed under). The only difference is that with the current system, it would either require a creative interpretation of the GPL (something many open source software vendors have done before) or a somewhat technically inferior approach that avoids direct linking (using pipes, which is really not that bad, so it is a vital option and would probably be the one I’d choose). Again, with the intended re-licensing, we could remove some friction associated with “GPL-phobic” companies, thus increasing our funding abilities.

All in all, I think the proposal to re-license large parts of rsyslog under the Apache license is of benefit both the project as well as the community. It hopefully helps to provide the necessary funding for development and at the same time may even grow the community involvement as it becomes more attractive for other software vendors to participate in the rsyslog development (I found this article by Brian Proffitt a very interesting read in that regard; if you read it, be sure to follow its links for details). I would really appreciate if we could find support for this movement and, yes, I have to admit I consider it rather vital, especially with the new role rsyslog will play in the systemd dominated world. Let me conclude this posting with one more personal opinion: I think that any feature developed thanks to commercial licensing should immediately be available, for free, to non-commercial entities, like private parties, educational institutions and charities. For the same reasons, it should probably reside inside the public code repository and be free to explore by anyone.

PS: back in 2009, I wrote about my philosophy about the “free” in free and open source software. It may be worth a read in addition to this posting, as it probably explains some of the mindset behind my ideas.

UPDATE: This blog posting created some confusion if the rsyslog project will stay under GPLv3. In fact, it will! I wrote on new blog post clarifying the situation, please check it out.

rsyslog v6 stable released

Happy new year! After being back from vacation, I started the new year with finally releasing the first stable rsyslog v6. While it was ready for a while, I was hesitant to release it when there was so much going on and time left for any quirks that may show up (I know I all too often overlook a thing or two with such a release and, if so, it is good to react fast). So today was finally the day. Let’s see and wait if I failed somewhere.

The new release is a very important one for me. First of all, because it helps me to get down to a decent set of versions I need to support. Doing all that v4..6 stuff in parallel with multiple branches really became time-consuming (but manageable thanks to git!). V4 has now been retired, and the number of branches is reduced to a minimum.
 
Very important is the new log classification capability. While it is in the beta (and devel) for quite a while, I think people were hesitant to try it out. At the same time, this prevented us from developing it any further. With it now being part of a stable release, I hope that it will get momentum.

Besides software availability, rule bases for common syslog messages are also important. Adiscon will assist with creating these rule bases upon request. I hope that people will use that facility. All they need to do is provide sufficient information about what they want to normalize and some sample messages for us to work with. We will than see how to create the required rulebase. This whole effort is based on liblognorm, so the technology can also be used by other projects (sagan is a good example that actually currently has more interest in that feature that rsyslog has).

Serious syslog problems?

In the paper introducing journald/Linux Journal a number of shortcommings in current syslog practice are mentioned. The authors say:

Syslog has been around for ~30 years, due to its simplicity and ubiquitousness it is an invaluable tool for administrators. However, the number of limitations are substantial, and over time they have started to be serious problems:

I have now taken some time to look at each of these claims in depth. But before I start, I need to tell that I am working in the IT logging field for nearly 15 years, have participated in a number of standards efforts and written a lot of syslog-related software with rsyslog being a prime example (some commercial tools I have been involved with can be found here). So probably I have a bias and my words need to be taken with a grain of salt. On the other hand, the journald authors also have a bias, so I guess that’s a fair exchange of arguments ;). 

In my analysis, I compare the journald effort with what rsyslog currently provides and leave closed source software out. It is also important to note that there is a difference between syslog, the protocol, a specific syslog application (like rsyslog) and a system log message store. Due to tradition, these terms are often used for different things and one must deduce from context, what is meant. The paper applies the same sloppiness in regard to terms. I use best effort to extract the proper meaning. I quote the arguments as they originally appeared inside the paper. However, I rearrange them a bit in order to put related things closer together. I retain the original numbering so that you can compare to the original paper. I also tried to be similar brief with my arguments. Now proof-reading the post, I see that I failed with that. Sorry, but that’s as brief as I can provide serious counterargument. I broadly try to classify arguments in various levels of “True” vs “Wrong”, so you may take this as an ultra-short reply. 

So let’s start with Arguments related to the log storage system. In general, the paper is right that there is no real log storage system (like, for example, the Windows Event Log). Keeping logs only in sequential text files definitely has disadvantages. Syslog implementations like rsyslog or syslog-ng have somewhat addressed this by providing the ability to use databases as storage backends (the commercial syslog-ng fork also has a proprietary log store). This has some drawbacks as well. The paper proposes a new proprietary indexed syslog message store. I kind of like this idea, have even considered to write something like this as an optional component for rsyslog (but had no time yet to actually work on it). I am not convinced, though, that all systems necessarily need such a syslog storage subsystem.

With that said, now let’s look at the individual arguments:

5. Reading log files is simple but very inefficient. Many key log operations have a complexity of O(n). Indexing is generally not available.

True. It just needs to be said that many tools inside the tool chain only need sequential access. But those that need random access have to pay a big price. Please note, however, that it is often only necessary to “tail” log files, that is act on the latest log entries. This can be done rather quickly even with text files. I know both the problems and the capabilities, because Adiscon LogAnalyzer, in which I am involved, is a web-based analysis and reporting tool capable of working on log files. Paging is simple, but searching is slow with large files (we recommend databases if that is often required). Now that I write that, a funny fact is that one of the more important reasons for creating rsyslog was that we were unhappy with flat text files (see rsyslog history doc). And so I created a syslogd capable of writing to databases. Things seem to be a bit cyclic, though with a different spin ;)

8. Access control is non-existent. Unless manually scripted by the administrator a user either gets full access to the log files, or no access at all.

Mostly True and hard to make any argument against this (except, of course, if you consider database back ends as log stores, but that’s not the typical case).

10. Automatic rotation of log files is available, but less than ideal in most implementations: instead of watching disk usage continuously to enforce disk usage limits rotation is only attempted in fixed time intervals, thus leaving the door open to many DoS attacks.

Partly True, at least in regard to current practice. Rsyslog, for example, can limit file sizes as they are written (“outchannel action”), but this feature is seldomly used and due to be replaced by a better one. The better one is partly implemented but received no priority because nobody in the community flagged this as an urgent requirement. As a side-note: Envision that journald intends to shrink the log and/or place stricter restrictions on rate-limiting when disk space begins to run low. If I were an attacker, I would simply begin to fill the disk then, and make journald swipe out the log store for me.

11. Rate limiting is available in some implementations, however, generally does not take the disk usage or service assignment into account, which is highly advisable.

It needs to be said what “rate limiting” means. I guess it means preventing an application from spamming the logs with frequently repeated messages. This feature is available  in rsyslog. It is right that disk usage is not taken into account (see comment above on implications). I don’t know what “service assignment” means in this context, so I don’t comment on that one. Rate limiting is more than run-away or spamming processes. It is a very complex issue. Rsyslog has output rate limiting as well, and much more is thinkable. But correct, current rate limiting looks at a number of factors but not the disk assignment. On the other hand, does that make sense, if e.g. a message is not even destined to go to the disk?

12. Compression in the log structure on disk is generally available but usually only as effect of rotation and has a negative effect on the already bad complexity behaviour of many key log operations.

Partly True. Rsyslog supports writing in zip format for at least one and a half year (I am too lazy to check the ChangeLog). This provides huge savings for those that turn on the feature. Without doubt, logs compressed in this way are much harder to process in real-time.

7. Log files are easily manipulable by attackers, providing easy ways to hide attack information from the administrator

Misleadingly True. If thinking of a local machine, only, this is true. However, all security best practices tell that it is far from a good idea to save logs on a machine that is publicly accessible. This is the reason that log messages are usually immediately sent do some back end system. It is right that this can not happen in some setup, especially very small ones.

My conclusion on the log store: there definitely is room for improvement. But why not improve it within the existing frameworks? Among others, this would have the advantage that existing methods could be used to decide what needs to be stored inside the log store. Usually, log contain noise events that administrators do not want to log at all, because of the overhead associated with them. The exists best practices for the existing tool chain on how to handle that.

Now on to the other detail topics:

1. The message data is generally not authenticated, every local process can claim to be Apache under PID 4711, and syslog will believe that and store it on disk.

9. The meta data stored for log entries is limited, and lacking key bits of information, such as service name, audit session or monotonic timestamps.

Mostly wrong. IMHO, both make up a single argument. At the suggestion of Lennart Poettering, rsyslog can force the pid inside the TAG to match the pid of the log message emitter – for quite a while now. It is also easy to add additional “trusted properties”. I made an experimental implementation in rsyslog yesterday. It took a couple of hours and the code is available as part of rsyslog 5.9.4. As a side-note, the level of “trust” one wants to have in such properties needs to be defined – for truly trusted trusted properties some serious cryptography is needed (this is not specified in the journald proposal nor currently implemented in rsyslog).

2. The data logged is very free-form. Automated log-analyzers need to parse human language strings to a) identify message types, and b) parse parameters from them. This results in regex horrors, and a steady need to play catch-up with upstream developers who might tweak the human language log strings in new versions of their software. Effectively, in a away, in order not to break user-applied regular expressions all log messages become ABI of the software generating them, which is usually not intended by the developer.

Trivial (I can’t commit myself to a “True” or “Wrong” on such a trivial finding). Finally, the authors have managed to describe the log analysis problem as we currently face it. This is not at all a syslog problem, it is problem of development discipline. For one, syslog has “solved” this issue with RFC5424 structured data. Use it and be happy (but, granted, the syslog() API currently is a bit problematic). The real problem is the missing discipline. Take, for example, the Windows Event Log. The journald proposal borrows heavily on its concepts. In Windows Event Log, there is a developer-assigned unique ID within the application’s reserved namespace available. The combination of both app namespace (also automatically created) and ID together does exactly the same thing as the proposed UUID. In Windows Event Log, there are also “structured fields” available, but in the form of an array (this is a bit different from name-value pairs but far from totally different). This system has been in place since the earliest versions of Windows NT, more than 15 years ago. So it would be a decent assumption that the problem described as a syslog problem does not exist in the Windows world, right (especially given the fact that Windows purposefully does not support syslog)? Well, have a look at the problems related to Windows log analysis: these are exactly the same! I could also offer a myriad of other samples, like WELF, Apache Log Format, … The bottom line is that developer discipline is not easy to achieve. And, among others, a taxonomy is actually needed to extract semantic meaning from the logged event. It probably is educating to read the FAQ for CEE, a standard currently in development that tries to somewhat solve the logging mess (wait a moment: before saying that CEE is a bunch of clueless morons, please have a look at the CEE Board Members first).

3. The timestamps generally do not carry timezone information, even though some newer specifications define support for it.

Partly Wrong. High-Precision timestamps are available for many years and default in rsyslog. Unfortunately, many distros have turned them off, because they break existing tools.  So in current practice this is a problem, but it could be solved by deleting one line in rsyslog.conf. And remember that if that causes trouble to some “vital” tool, journald will break that tool even more. Note that some distros, like Gentoo, already have enabled high precision timestamps.

4. Syslog is only one of many log systems on local machines. Separate logs are kept for utmp/wtmp, lastlog, audit, kernel logs, firmware logs, and a multitude of application-specific log formats. This is not only unnecessarily complex, but also hides the relation between the log entries in the various subsystems.

Rhetorically True – but what why is that the failure of syslog? In fact, this problem would not exist if developers had consistently used syslog. So the problem is not rooted in syslog but rather in the fact that syslog is not being used. Lesson learned: even if standards exist, many developers simply ignore them (this is also an interesting argument in regard to problem number #2, think about it…).

13. Classic Syslog traditionally is not useful to handle early boot or late shutdown logging, even though recent improvements (for example in systemd) made this work.

True – including that fact that systemd already solved that problem.

14. Binary data cannot be logged, which in some cases is essential (Examples: ATA SMART blobs or SCSI sense data, firmware dumps).

Wrong, the short answer is: it can be logged, but must be properly encoded. In the IETF syslog working group we even increased the max message sizes for this reason (actually, there is no hard limit anymore).

The longer, and more correct, answer is that this is a long-standing discussion inside the logging world. Using that view, it is hard to say if the claim is true or false; it often even is argued like being a religion. Fact is that the current logging toolset does not work well for binary data (even encoded). This is even the case for the Windows Event Log, which supports binary data. In my view, I think most logging experts lean towards the side that binary data should be avoided and, if unavoidable, must be encoded in a text-friendly way. A core problem with the usefulness of binary data is that it often is hard to decode, and even more to understand, on the non-native platform (remember that the system used during analysis is often not the system where the event was initially recorded).

6. The syslog network protocol is very simple, but also very limited. Since it generally supports only a push transfer model, and does not employ store-and-forward, problems such as Thundering Herd or packet loss severely hamper its use.

Wrong,  missing all improvement made in the past ten years. There is a new RFC series which supports TLS-secured reliable transmission of syslog messages and which permits to place fine-grain access control on who can talk with whom inside a relay chain. UDP syslog is still available and is so for good reason. I cannot dig into the details here, part of that reasoning is on the same grounds why we use audio more often over UDP than TCP. Using UDP syslog is strictly optional and there are few scenarios where it is actually needed. And, if so, the “problem” mentioned is actually a “solution” to a much more serious problem not even mentioned in the journald paper. For a glimpse at these problems, have a lock at my blog post on the “reliability problem”. Also, store-and-foward is generally available in rsyslog via action queues, failover handling and a lot of other things. I admit that setting up a complex logging system, sometimes requires an expert. On the “loss issue”, one may claim that I myself say that plain TCP syslog is not totally lossless. That claim is right, but the potential loss Window is relatively small. Also, one can use different protocols that solve the issue. In rsyslog, I introduced proprietary RELP for that very reason. There is also completely lossless RFC3195, which is a great protocol (but without future). It is supported by rsyslog, but only extremely few other projects implement it. The IETF (including me) assumes that RFC3195 is a failure – not from technical fact but in the sense that it was too far away from the usual logging practice to be picked up by enough folks. [Just to avoid mis-intepretation: contrary to RFC3195, RELP is well alive, well-accepted and in widespread use. It is only RFC3195 that is a failure.]

Concluding my remarks, I do not see anything so broken in syslog that it can only be fixed by a total replacement of technology. Right contrary, there is a rich tool set and expertise available. Existing solutions, especially in the open source world, are quite modular and can easily be extended. It is not even necessary to extend existing projects. A new log store, for example, could also be implemented by a new tool that imports a decent log format from stdin to a back end data store. That would be easily usable not only from rsyslog but from any other tool that is part of the current log tool chain. For example, it may immediately consume Apache or other application logs (of course, such a tool would require proper cryptography to be used for cryptographic tasks…). There is also need for a new logging API – the catch-all syslog() call is clearly insufficient (an interesting detail fact is that journald promises to retain syslog() as a first-class logging interface — that means journald can solve none of the issues associated with that API, especially in regard to claim #2).

So extending existing applications, or writing new ones that tightly integrate into the existing toolset is the right thing to do. One can view journald as such an extension. However, this extension is somewhat problematic as its design document tells that it intends to replace the whole logging system. Especially disturbing is that the reasoning, as outlined above, essentially boils down to a new log store and various well-known mostly political problems (with development discipline for structured formats right at the top of them). Finally, the proposal claims to provide more security, but fails to achieve at least the level that RFC5848 syslog is able to provide. Granted, rsyslog, for example, does not (yet) implement RFC5848. But why intends journald to implement some home-grown pseudo security system when a standard-based method designed by real crypto experts is available? I guess the same question can be applied to the reasoning for the journald project at large.

Let me conclude this posting with the same quote I started with:

Syslog has been around for ~30 years, due to its simplicity and ubiquitousness it is an invaluable tool for administrators. However, the number of limitations are substantial, and over time they have started to be serious problems:

Mostly Wrong. But it is true that syslog is an invaluable tool,especially in heterogeneous environments.

Trusted Properties in rsyslog

Today, I implemented “trusted (syslog) properties” inside rsyslog’s imuxsock module. The term “trusted” refers to the fact that these properties can not be faked by the logging application, creating an additional layer of log integrity protection. The idea is rooted in the journald proposal, where they are called “metadata” and “trusted fields”. Actually I liked the idea implied by “trusted”, but thought “property” would be a better name than “field”.

The concept is not totally new. Actually, for some month rsyslog can patch the PID field of the syslog TAG with the correct pid, so that this cannot be mangled with. This was based on an idea from Lennart Poettering, which I found nice and implemented quickly (I met him at Linux Tag 2010 in Nürnberg, Germany where we discussed this and other things). The core idea is to use SCM_CREDENTIALS so that the OS itself records pid, gid and uid. With the new feature, this is taken one step further. Now, we also query the /proc virtual file system for additional information like the location of the logging application’s binary. Undoubtedly, this provides some extra protection against faked messages. On the downside, it has some obvious overhead. A simple and immediate solution to this is to use rsyslog’s omfile in zip mode. In journald, overhead is tried to avoid via a proprietary binary format, its event log, which provides compression features (but for syslog transmission the journald event log obviously needs to be decompressed as well). Some restrictions exist with trusted properties, some obvious, some less obvious (see the trusted property doc for details; it also has the list of currently supported properties).

The current implementation is in experimental status. Based on feedback, some specifics may be changed in future versions. Also, the current implementation does not try to be standards-compliant. This will probably also change in the future. I hope that the new capability is useful to the logging community. As a side-note, the new feature, implemented in one morning, also shows that it often is easy to extend existing technology instead of writing everything new from scratch ;)

The actual release announcement will go out either today or tomorrow. The code is available via the v5-devel git branch right now.

What I don’t like about journald / Linux Journal

I heard of journald only a couple of hours ago (Tuesday?) and since then some intense discussion is going on. I had a chance to look at the journald material in more depth. I also had a quick look at journald’s source, but (as I know) Lennart and I are on the strictly opposite sides in regard to the amount of comment lines in source files (I put half the spec in, Lennart nothing at all ;-)). So I did not try too hard to make sense of the code and my impression is still primarily based on the initial paper (though I have to admit his code is probably simpler as he does not need to carry any legacy nor consider platforms other than recent Linux). 

The contra-syslog arguments can be classified in two classes: vaporware and correct fact. In the vaporware camp are things like the hash chaining “security urban legend”, the timezone argument (though he is right in regard to current practice inside distros), syslog network transport and compression (this list is not conclusive). Technically correct is the current store format, different log sources, and free-formedness of messages (I prefer the term “semi-structuredness”). This list is also not conclusive.

I think Lennart makes some good points, but discredits the paper somewhat by going overboard at times. It looks like he really needed some hard selling points (I also got this impression by his usage of the kernel.org breakage to promote this effort…). I think his paper would have been more useful if he had argued only those problems that actually exist. I am in full agreement that there are some spots that really deserve to be changed and addressed. Unfortunately, the paper is phrased in such terms that people not at least at the medium expert level on logging tend to believe everything that is stated.

The question is how the actual problems can best be fixed. Is it necessary to create a totally new infrastructure and throw away everything that exists? Maybe. I still prefer the alternative approach: why not extend existing technology? I modeled rsyslog specifically for this reason to be a highly modular system where extensions can easily be added. As far as I understand, syslog-ng has also moved to such a design in the recent v3 version. In rsyslog, I have accepted even experimental technology inside the source tree quickly. Getting a new log store was on my agenda for quite some month (the syslog-ng commercial fork already has it). I unfortunately had not enough time to implement it – and nobody else helped out with it. Wouldn’t it have been a good idea to contribute something to rsyslog instead of crafting something totally new?

Another thing that I strongly doubt is if the Linux journal idea will actually manage to solve the logging format dilemma. Microsoft’s event log is in place for 15+ years, and app developer still don’t use it correctly (as I initially wrote, the Linux Journal looks quite similar to the Windows Event Log). While I think the UUID idea is actually not a bad one, I seriously doubt all developers will understand and use it (correctly). This is a problem with the Windows Event log as well. One needs to know that a lot of high-profile folks are working for several years (10+) on solving this dilemma. Lennart may be a genius, but I have concerns that he over-promises (but I really wish he has success, that would be a very, very big advantage for the community).

One thing, I have to admit, that disappoints me is that Lennart never approached me with his proposal. He knows me (even personally) and we have worked together on systemd/rsyslog integration. I heard about journald first on a Google Alert and quickly after from some folks who asked me what went on. Then  I found out that the systemd development mailing list also never mentioned any work on journald. So, to me, it looks the idea was well hidden for a surprise at Kernel Summit. Well done, but not my style ;-) This missing openness concerns me. My decisions in regard to rsyslog were controversial at times and dictatorial at others (and for sure sometimes wrong). And we currently have some big and controversial discussion on rsyslog going on (partly fueled by the arrival of journald). But I have always played very open, communicated what I had on my mind (in advance), discussed and did never try to hide something. This, to be honest, is how I expect work to be carried out on an important system component. I also never met Lennart at any of the standard bodies work on logging. Not everyone runs Linux and probably not even everyone on Linux will run journald. So standards matter.