Would creating a simple Linux log file shipper make sense?

I am currently thinking about creating a very basic shipper for log files, but wonder if it really makes sense. I am especially concerned that good tools may already exist. Being lazy, I thought I’d ask for some wisdom from those in the know before investing more time in searching for solutions and weighing their quality.

I’ve more than once read that logstash is far too heavy for a simple shipper, and I’ve also heard that rsyslog is sometimes considered a bit heavy (albeit much lighter) for the purpose. I think with reasonable effort we could create a tool that

  • monitors text files (much like imfile does) and pulls new entries from them
  • does NOT further process or transform these logs
  • sends the resulting entries to a very limited number of destinations (for starters, I’d say syslog protocol only)
  • with the focus on being very lightweight, intentionally not implementing anything complex.
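Just to make the idea a bit more concrete, here is a very rough sketch of the kind of minimal shipper I have in mind: follow a file and forward each new line via plain UDP syslog. The file path, receiver address and PRI value are made up for illustration; a real shipper would also need state/rotation handling and a proper RFC 3164/5424 header, all of which is left out here.

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <sys/socket.h>

    int main(void)
    {
        const char *path = "/var/log/app.log";  /* file to follow (made up) */
        const char *dst  = "192.0.2.1";         /* syslog receiver (made up) */

        int sock = socket(AF_INET, SOCK_DGRAM, 0);
        struct sockaddr_in addr;
        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_port = htons(514);
        inet_pton(AF_INET, dst, &addr.sin_addr);

        FILE *fp = fopen(path, "r");
        if (sock < 0 || fp == NULL)
            return 1;
        fseek(fp, 0, SEEK_END);                 /* start at the end, like tail -f */

        char line[2048], msg[2200];
        for (;;) {
            if (fgets(line, sizeof(line), fp) == NULL) {
                clearerr(fp);                   /* EOF: wait for new data */
                usleep(500000);
                continue;
            }
            line[strcspn(line, "\n")] = '\0';
            /* PRI 134 = facility local0 (16*8) + severity info (6);
             * a real tool would add timestamp and hostname as well */
            snprintf(msg, sizeof(msg), "<134>myshipper: %s", line);
            sendto(sock, msg, strlen(msg), 0,
                   (struct sockaddr *)&addr, sizeof(addr));
        }
    }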
Would this be useful for you? What would be the minimal feature set you need in order to make it useful? Does something like this already exist? Is it really needed or is a stripped-down rsyslog config sufficient?
I’d be grateful for any thoughts in this direction.

liblognorm’s “rest” parser now more useful

The liblognorm “rest” parser was introduced some time ago to handle cases where someone just wants to parse a partial message and keep all the “rest of it” in another field. I was never a big fan of this type of parser, but I accepted it because so many people asked for it. Practice, however, showed that my concerns were right: the “rest” parser has a very broad match, and those who used it often got very surprising results.

A key cause of this issue was that the rest parser had the same priority as other parsers and, most importantly, a higher priority than a simple character match. So it was actually impossible to match some constant text that was at the same location as the “rest” parser.

I have now changed this so that the rest parser is always called last, if nothing else matches – neither any parser nor any constant text. This will make it work much more as you would expect. Still, I caution against using this parser, as it continues to provide a very broad match.

Note that the way I have implemented this is not totally clean from a software engineering point of view, but it is very solid. A cleaner solution will come with the scheduled rewrite of the algorithm (later in spring/summer).

Note that existing rulebases using “rest” may behave differently with the new algorithm. However, previously the result was more or less random, so any other change to the rulebase could also have caused different behaviour. So this is no compatibility break as there really is no compatibility to retain.

This will be released with 1.1.2, probably in early May. If you need it urgently, you can use a daily build.

LinuxTag Presentation now online

I realized that I had forgotten to upload my LinuxTag Berlin 2014 presentation on rsyslog enhancements and writing external plugins. I have now uploaded it, so you can view it here:

liblogging-stdlog – code reviewers sought

I am looking for some code reviewers.

I have worked hard on liblogging-stdlog, which aims at becoming the new enhanced syslog() API call. The library is thread- and signal-safe and offers support for multiple log drivers, just like log4j does.

A more elaborate description is available here: https://github.com/rsyslog/liblogging


As the lib is becoming ready for prime time, I would really appreciate it if some folks could have a look at the code, check for problems, and/or offer suggestions in regard to the API.

It is only the code inside ./stdlog (roughly 1400 lines of code, including header files, empty lines and comments): https://github.com/rsyslog/liblogging/tree/master/stdlog

The man page is available here: https://github.com/rsyslog/liblogging/blob/master/stdlog/stdlog.rst
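For reviewers who want a quick feel for the API before diving into the code, below is a minimal usage sketch written from my recollection of the man page linked above. The names and signatures shown (stdlog_init(), stdlog_open(), stdlog_log(), stdlog_close(), stdlog_deinit()) and the use of syslog(3) facility/severity values should be double-checked against stdlog.h; treat this as an assumption-laden sketch, not reference documentation.

    #include <syslog.h>     /* LOG_USER, LOG_INFO */
    #include <stdlog.h>     /* liblogging-stdlog; link with -lstdlog */

    int main(void)
    {
        stdlog_channel_t ch;

        /* global init; the options value 0 is an assumption, see stdlog.h */
        if (stdlog_init(0) != 0)
            return 1;

        /* NULL channel spec should select the default log driver */
        ch = stdlog_open("myapp", 0, LOG_USER, NULL);
        if (ch == NULL)
            return 1;

        stdlog_log(ch, LOG_INFO, "hello from %s, pid %d", "myapp", 4711);

        stdlog_close(ch);
        stdlog_deinit();
        return 0;
    }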

All feedback is very welcome!

Thanks,

Rainer

CEE-enhanced syslog defined

CEE-enhanced syslog is an upcoming standard for expressing structured data inside syslog messages. It is a cross-platform effort that aims at making log analysis (and log processing in general) much easier both for log producers and consumers. The idea was originally born as part of MITRE’s CEE effort. It has been adopted by a larger set of logging stakeholders in an initiative that was named “project lumberjack”. Under this project, cee-enhanced syslog, and a framework to make full use of it, are being openly advanced. It is hoped (and planned) that the outcome will flow back to the CEE standard.

In a nutshell, cee-enhanced syslog is very simple and powerful: inside the syslog message, a special cookie (“@cee:”) is followed by a JSON representation of the data. The cookie tells processors that the format is actually cee-enhanced. If you are interested in more technical coverage, have a look at my cee-enhanced syslog howto presentation.
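To illustrate, a cee-enhanced message emitted via the plain syslog() API could look like the sketch below; the field names are, of course, just made up for the example.

    #include <syslog.h>

    int main(void)
    {
        openlog("myapp", LOG_PID, LOG_USER);
        /* a regular syslog message: the @cee: cookie followed by JSON */
        syslog(LOG_INFO,
               "@cee: {\"event\":\"login\",\"user\":\"jdoe\",\"result\":\"success\"}");
        closelog();
        return 0;
    }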

Adiscon is one of the main supporters of project lumberjack and CEE enhanced syslog. Since February 2012, Adiscon products offer basic support for cee-enhanced syslog, being among the first tools to do so.

Announcing Project Lumberjack

Two weeks ago, alongside the Fedora Developer’s Conference in Brno, Czech Republic, a couple of logging and auditing folks from Red Hat, Balabit (syslog-ng), the MITRE Corporation, and Adiscon (me) put their heads together to talk about the future of structured logging. It quickly became clear that extending syslog in the CEE spirit is the right thing to do.

We observed that almost all the technology needed to provide a rich framework for structured logging is already present. Actually, both syslog-ng and rsyslog have provided the necessary plumbing for a long time (for example, as part of the RFC5424 effort), but that functionality is relatively seldom explored actively by other developers. A core problem in that regard is that most applications rely on the good old syslog() API, which does not provide structured logging by itself. Also, there is no common log storage database available on which tools could be based.

In order to evolve syslog, we defined a three-layer architecture, with applications and logging libraries/APIs being the top layer, the syslogd the middle layer, and the datastore the bottom layer. Multiple APIs must be supported, as no one can expect projects to change their existing logging infrastructure. Also, existing frameworks like log4j and even glibc’s syslog() will stay around for a while longer. New libraries (like ELAPI) will probably become more dominant for new applications. So how to tie these different libraries to the syslogd subsystem (the second layer)?

The solution is rather simple: we use what we already achieved in CEE and support cee-enhanced syslog on the system log socket. The core idea is very simple: we use the regular syslog message part, but include JSON-encoded structured data with it. To signify to the syslog system that this is actually cee-enhanced, a cookie string (“@cee:”) is placed in front of the JSON data. It is then easy for the syslogd to decide which message format it deals with: if the cookie is present and the rest of the message is a valid JSON representation, the message is cee-enhanced. If either of the two conditions fails, it is traditional syslog. As both conditions are checked together, it is highly unlikely that a legacy syslog message will ever match those criteria (and if it really does, nothing is lost: after all, the syslogd has correctly understood that format). It must be noted that the necessary parsing and internal plumbing is available both in syslog-ng and in rsyslog (I committed the missing JSON parser, held back awaiting a more final CEE standard, yesterday).
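As a rough illustration of that decision logic (not the actual rsyslog or syslog-ng code), a check could look like the following sketch, here using json-c for the JSON validity test:

    #include <string.h>
    #include <json-c/json.h>   /* json-c; header path may differ on older installs */

    /* A message is treated as cee-enhanced only if BOTH conditions hold:
     * the @cee: cookie is present AND the remainder is valid JSON.
     * Otherwise it is handled as traditional syslog. */
    static int is_cee_enhanced(const char *msg)
    {
        static const char cookie[] = "@cee:";
        struct json_object *obj;

        if (strncmp(msg, cookie, sizeof(cookie) - 1) != 0)
            return 0;
        obj = json_tokener_parse(msg + sizeof(cookie) - 1);
        if (obj == NULL)
            return 0;
        json_object_put(obj);   /* we only wanted to validate, so free it */
        return 1;
    }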

The interface to the log database layer is currently not as well defined and needs to be worked on. Note that both syslog-ng and rsyslog support multiple datastores, so solutions already exist. The group as a whole was of the opinion that some unified API for a log data store would be useful and something that should be looked at as a longer-term target.

After reaching this rough consensus, we were delighted to see that most of the base technology is already in place and just needs to be tied correctly together. It is more an effort of doing detail implementations and documenting the various pieces (and how exactly they work together) than creating a totally new system (aka “can be quickly done”). We agreed that it probably is best to reach for the low-hanging fruit first: get structured logging integrated first, then do the other steps. So an initial milestone will be making sure cee-enhanced syslog is supported by all of the subsystems, and only after this is done reach for the other things.

One of these next things definitely is a dictionary of field names (and exact structure) to be used to describe events in a standard way (for example, a logon event). While the whole effort is highly inspired by CEE, it probably is best to try out initial efforts outside of the formal CEE framework. That will enable rapid development, discussion and the ability to check what works in practice. The experience gained in such a PoC can then be fed back into the formal CEE process (along the old IETF mantra “running code and rough consensus first”).

We agreed that such an effort is best done in a transparent and flexible open source process. With that, project lumberjack was born: an effort to provide better structured logging for Linux, supported by many major players in that arena. We agreed that it would be a good idea if Red Hat provided some of the project infrastructure. This is why you find project lumberjack now at fedorahosted.org (note that the project will probably contain mostly specs and less code, which is kept in the individual projects’ repositories).

Serious syslog problems?

In the paper introducing journald/the Linux Journal, a number of shortcomings in current syslog practice are mentioned. The authors say:

Syslog has been around for ~30 years, due to its simplicity and ubiquitousness it is an invaluable tool for administrators. However, the number of limitations are substantial, and over time they have started to be serious problems:

I have now taken some time to look at each of these claims in depth. But before I start, I need to say that I have been working in the IT logging field for nearly 15 years, have participated in a number of standards efforts and have written a lot of syslog-related software, with rsyslog being a prime example (some commercial tools I have been involved with can be found here). So probably I have a bias and my words need to be taken with a grain of salt. On the other hand, the journald authors also have a bias, so I guess that’s a fair exchange of arguments ;).

In my analysis, I compare the journald effort with what rsyslog currently provides and leave closed source software out. It is also important to note that there is a difference between syslog the protocol, a specific syslog application (like rsyslog), and a system log message store. Due to tradition, these terms are often used for different things and one must deduce from context what is meant. The paper applies the same sloppiness in regard to terms. I have done my best to extract the proper meaning. I quote the arguments as they originally appeared inside the paper. However, I rearrange them a bit in order to put related things closer together. I retain the original numbering so that you can compare to the original paper. I also tried to be similarly brief with my arguments. Now, proofreading the post, I see that I failed at that. Sorry, but that’s as brief as I can be while still providing serious counterarguments. I broadly try to classify arguments on various levels of “True” vs. “Wrong”, so you may take this as an ultra-short reply.

So let’s start with the arguments related to the log storage system. In general, the paper is right that there is no real log storage system (like, for example, the Windows Event Log). Keeping logs only in sequential text files definitely has disadvantages. Syslog implementations like rsyslog or syslog-ng have somewhat addressed this by providing the ability to use databases as storage backends (the commercial syslog-ng fork also has a proprietary log store). This has some drawbacks as well. The paper proposes a new proprietary indexed syslog message store. I kind of like this idea, and have even considered writing something like this as an optional component for rsyslog (but have had no time yet to actually work on it). I am not convinced, though, that all systems necessarily need such a syslog storage subsystem.

With that said, now let’s look at the individual arguments:

5. Reading log files is simple but very inefficient. Many key log operations have a complexity of O(n). Indexing is generally not available.

True. It just needs to be said that many tools inside the tool chain only need sequential access. But those that need random access have to pay a big price. Please note, however, that it is often only necessary to “tail” log files, that is, act on the latest log entries. This can be done rather quickly even with text files. I know both the problems and the capabilities, because Adiscon LogAnalyzer, in which I am involved, is a web-based analysis and reporting tool capable of working on log files. Paging is simple, but searching is slow with large files (we recommend databases if that is often required). Now that I write this, a funny fact is that one of the more important reasons for creating rsyslog was that we were unhappy with flat text files (see the rsyslog history doc). And so I created a syslogd capable of writing to databases. Things seem to be a bit cyclic, though with a different spin ;)

8. Access control is non-existent. Unless manually scripted by the administrator a user either gets full access to the log files, or no access at all.

Mostly True, and it is hard to make any argument against this (except, of course, if you consider database backends as log stores, but that’s not the typical case).

10. Automatic rotation of log files is available, but less than ideal in most implementations: instead of watching disk usage continuously to enforce disk usage limits rotation is only attempted in fixed time intervals, thus leaving the door open to many DoS attacks.

Partly True, at least in regard to current practice. Rsyslog, for example, can limit file sizes as they are written (the “outchannel action”), but this feature is seldom used and is due to be replaced by a better one. The better one is partly implemented but received no priority because nobody in the community flagged this as an urgent requirement. As a side-note: consider that journald intends to shrink the log and/or place stricter restrictions on rate-limiting when disk space begins to run low. If I were an attacker, I would simply begin to fill the disk then, and make journald wipe out the log store for me.

11. Rate limiting is available in some implementations, however, generally does not take the disk usage or service assignment into account, which is highly advisable.

It needs to be said what “rate limiting” means. I guess it means preventing an application from spamming the logs with frequently repeated messages. This feature is available in rsyslog. It is right that disk usage is not taken into account (see the comment above on the implications). I don’t know what “service assignment” means in this context, so I won’t comment on that one. Rate limiting is about more than runaway or spamming processes. It is a very complex issue. Rsyslog has output rate limiting as well, and much more is thinkable. But correct: current rate limiting looks at a number of factors, but not at disk assignment. On the other hand, does that make sense if, e.g., a message is not even destined to go to disk?

12. Compression in the log structure on disk is generally available but usually only as effect of rotation and has a negative effect on the already bad complexity behaviour of many key log operations.

Partly True. Rsyslog has supported writing in zip format for at least one and a half years (I am too lazy to check the ChangeLog). This provides huge savings for those who turn on the feature. Without doubt, logs compressed in this way are much harder to process in real time.

7. Log files are easily manipulable by attackers, providing easy ways to hide attack information from the administrator

Misleadingly True. If one thinks of a local machine only, this is true. However, all security best practices tell us that it is far from a good idea to keep logs on a machine that is publicly accessible. This is the reason that log messages are usually immediately sent to some backend system. It is right that this cannot happen in some setups, especially very small ones.

My conclusion on the log store: there definitely is room for improvement. But why not improve it within the existing frameworks? Among others, this would have the advantage that existing methods could be used to decide what needs to be stored inside the log store. Usually, logs contain noise events that administrators do not want to log at all, because of the overhead associated with them. There exist best practices for the existing tool chain on how to handle that.

Now on to the other detail topics:

1. The message data is generally not authenticated, every local process can claim to be Apache under PID 4711, and syslog will believe that and store it on disk.

9. The meta data stored for log entries is limited, and lacking key bits of information, such as service name, audit session or monotonic timestamps.

Mostly wrong. IMHO, both make up a single argument. At the suggestion of Lennart Poettering, rsyslog has been able, for quite a while now, to force the pid inside the TAG to match the pid of the log message emitter. It is also easy to add additional “trusted properties”. I made an experimental implementation in rsyslog yesterday. It took a couple of hours and the code is available as part of rsyslog 5.9.4. As a side-note, the level of “trust” one wants to have in such properties needs to be defined – for truly trusted properties some serious cryptography is needed (this is neither specified in the journald proposal nor currently implemented in rsyslog).

2. The data logged is very free-form. Automated log-analyzers need to parse human language strings to a) identify message types, and b) parse parameters from them. This results in regex horrors, and a steady need to play catch-up with upstream developers who might tweak the human language log strings in new versions of their software. Effectively, in a way, in order not to break user-applied regular expressions all log messages become ABI of the software generating them, which is usually not intended by the developer.

Trivial (I can’t commit myself to a “True” or “Wrong” on such a trivial finding). Finally, the authors have managed to describe the log analysis problem as we currently face it. This is not at all a syslog problem; it is a problem of development discipline. For one, syslog has “solved” this issue with RFC5424 structured data. Use it and be happy (but, granted, the syslog() API currently is a bit problematic). The real problem is the missing discipline. Take, for example, the Windows Event Log. The journald proposal borrows heavily from its concepts. In the Windows Event Log, there is a developer-assigned unique ID within the application’s reserved namespace available. The combination of app namespace (also automatically created) and ID does exactly the same thing as the proposed UUID. In the Windows Event Log, there are also “structured fields” available, but in the form of an array (this is a bit different from name-value pairs but far from totally different). This system has been in place since the earliest versions of Windows NT, more than 15 years ago. So it would be a decent assumption that the problem described as a syslog problem does not exist in the Windows world, right (especially given the fact that Windows purposefully does not support syslog)? Well, have a look at the problems related to Windows log analysis: they are exactly the same! I could also offer a myriad of other examples, like WELF, the Apache Log Format, … The bottom line is that developer discipline is not easy to achieve. And, among other things, a taxonomy is actually needed to extract semantic meaning from the logged event. It probably is educating to read the FAQ for CEE, a standard currently in development that tries to somewhat solve the logging mess (wait a moment: before saying that CEE is a bunch of clueless morons, please have a look at the CEE Board Members first).

3. The timestamps generally do not carry timezone information, even though some newer specifications define support for it.

Partly Wrong. High-precision timestamps have been available for many years and are the default in rsyslog. Unfortunately, many distros have turned them off because they break existing tools. So in current practice this is a problem, but it could be solved by deleting one line in rsyslog.conf. And remember that if that causes trouble for some “vital” tool, journald will break that tool even more. Note that some distros, like Gentoo, have already enabled high-precision timestamps.

4. Syslog is only one of many log systems on local machines. Separate logs are kept for utmp/wtmp, lastlog, audit, kernel logs, firmware logs, and a multitude of application-specific log formats. This is not only unnecessarily complex, but also hides the relation between the log entries in the various subsystems.

Rhetorically True – but why is that a failure of syslog? In fact, this problem would not exist if developers had consistently used syslog. So the problem is not rooted in syslog but rather in the fact that syslog is not being used. Lesson learned: even if standards exist, many developers simply ignore them (this is also an interesting argument in regard to problem #2, think about it…).

13. Classic Syslog traditionally is not useful to handle early boot or late shutdown logging, even though recent improvements (for example in systemd) made this work.

True – including the fact that systemd has already solved that problem.

14. Binary data cannot be logged, which in some cases is essential (Examples: ATA SMART blobs or SCSI sense data, firmware dumps).

Wrong; the short answer is that it can be logged, but it must be properly encoded. In the IETF syslog working group we even increased the maximum message sizes for this reason (actually, there is no hard limit anymore).

The longer, and more correct, answer is that this is a long-standing discussion inside the logging world. From that view, it is hard to say if the claim is true or false; it is often even argued almost like a religion. Fact is that the current logging toolset does not work well for binary data (even encoded). This is even the case for the Windows Event Log, which supports binary data. In my view, most logging experts lean towards the side that binary data should be avoided and, if unavoidable, must be encoded in a text-friendly way. A core problem with the usefulness of binary data is that it is often hard to decode, and even harder to understand, on a non-native platform (remember that the system used during analysis is often not the system where the event was initially recorded).
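To give an idea of what “properly encoded” can mean in practice, here is a trivial sketch that hex-encodes a small blob before handing it to syslog() (base64 would be more compact; the function and field name are made up for the example):

    #include <stdio.h>
    #include <stddef.h>
    #include <syslog.h>

    /* Log a (small) binary blob in text-friendly form by hex-encoding it.
     * Real code would chunk or cap large blobs and pick a proper encoding. */
    static void log_blob_hex(const char *what, const unsigned char *blob, size_t len)
    {
        char hex[2 * 128 + 1];
        size_t i;

        if (len > 128)
            len = 128;              /* keep the sketch simple */
        for (i = 0; i < len; i++)
            sprintf(hex + 2 * i, "%02x", blob[i]);
        hex[2 * len] = '\0';
        syslog(LOG_WARNING, "%s(hex)=%s", what, hex);
    }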

6. The syslog network protocol is very simple, but also very limited. Since it generally supports only a push transfer model, and does not employ store-and-forward, problems such as Thundering Herd or packet loss severely hamper its use.

Wrong, and missing all the improvements made in the past ten years. There is a new RFC series which supports TLS-secured reliable transmission of syslog messages and which permits placing fine-grained access control on who can talk with whom inside a relay chain. UDP syslog is still available, and is so for good reason. I cannot dig into the details here; part of that reasoning is on the same grounds why we use audio more often over UDP than TCP. Using UDP syslog is strictly optional and there are few scenarios where it is actually needed. And, if so, the “problem” mentioned is actually a “solution” to a much more serious problem not even mentioned in the journald paper. For a glimpse at these problems, have a look at my blog post on the “reliability problem”. Also, store-and-forward is generally available in rsyslog via action queues, failover handling and a lot of other things. I admit that setting up a complex logging system sometimes requires an expert. On the “loss issue”, one may claim that I myself say that plain TCP syslog is not totally lossless. That claim is right, but the potential loss window is relatively small. Also, one can use different protocols that solve the issue. In rsyslog, I introduced the proprietary RELP for that very reason. There is also the completely lossless RFC3195, which is a great protocol (but without a future). It is supported by rsyslog, but only extremely few other projects implement it. The IETF (including me) assumes that RFC3195 is a failure – not on technical grounds, but in the sense that it was too far away from usual logging practice to be picked up by enough folks. [Just to avoid misinterpretation: contrary to RFC3195, RELP is well alive, well-accepted and in widespread use. It is only RFC3195 that is a failure.]

Concluding my remarks, I do not see anything so broken in syslog that it can only be fixed by a total replacement of the technology. Quite the contrary: there is a rich tool set and plenty of expertise available. Existing solutions, especially in the open source world, are quite modular and can easily be extended. It is not even necessary to extend existing projects. A new log store, for example, could also be implemented by a new tool that imports a decent log format from stdin into a backend data store. That would be easily usable not only from rsyslog but from any other tool that is part of the current log tool chain. For example, it may immediately consume Apache or other application logs (of course, such a tool would require proper cryptography to be used for cryptographic tasks…). There is also need for a new logging API – the catch-all syslog() call is clearly insufficient (an interesting detail is that journald promises to retain syslog() as a first-class logging interface – that means journald can solve none of the issues associated with that API, especially in regard to claim #2).

So extending existing applications, or writing new ones that tightly integrate into the existing toolset, is the right thing to do. One can view journald as such an extension. However, this extension is somewhat problematic, as its design document states that it intends to replace the whole logging system. Especially disturbing is that the reasoning, as outlined above, essentially boils down to a new log store and various well-known, mostly political problems (with development discipline for structured formats right at the top of them). Finally, the proposal claims to provide more security, but fails to achieve even the level that RFC5848 syslog is able to provide. Granted, rsyslog, for example, does not (yet) implement RFC5848. But why does journald intend to implement some home-grown pseudo-security system when a standards-based method designed by real crypto experts is available? I guess the same question can be applied to the reasoning for the journald project at large.

Let me conclude this posting with the same quote I started with:

Syslog has been around for ~30 years, due to its simplicity and ubiquitousness it is an invaluable tool for administrators. However, the number of limitations are substantial, and over time they have started to be serious problems:

Mostly Wrong. But it is true that syslog is an invaluable tool, especially in heterogeneous environments.

Trusted Properties in rsyslog

Today, I implemented “trusted (syslog) properties” inside rsyslog’s imuxsock module. The term “trusted” refers to the fact that these properties cannot be faked by the logging application, creating an additional layer of log integrity protection. The idea is rooted in the journald proposal, where they are called “metadata” and “trusted fields”. Actually, I liked the idea implied by “trusted”, but thought “property” would be a better name than “field”.

The concept is not totally new. Actually, for some months now rsyslog has been able to patch the PID field of the syslog TAG with the correct pid, so that this cannot be tampered with. This was based on an idea from Lennart Poettering, which I found nice and implemented quickly (I met him at LinuxTag 2010 in Nürnberg, Germany, where we discussed this and other things). The core idea is to use SCM_CREDENTIALS so that the OS itself records pid, gid and uid. With the new feature, this is taken one step further. Now, we also query the /proc virtual file system for additional information, like the location of the logging application’s binary. Undoubtedly, this provides some extra protection against faked messages. On the downside, it has some obvious overhead. A simple and immediate solution to this is to use rsyslog’s omfile in zip mode. Journald tries to avoid that overhead via a proprietary binary format, its event log, which provides compression features (but for syslog transmission the journald event log obviously needs to be decompressed as well). Some restrictions exist with trusted properties, some obvious, some less obvious (see the trusted properties doc for details; it also has the list of currently supported properties).
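For those curious how the OS-provided part works at the API level, the sketch below shows the general pattern: obtain the sender’s pid/uid/gid via SCM_CREDENTIALS on a unix socket and then resolve the binary path from /proc. This is only an illustration of the mechanism, not the actual imuxsock code.

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <sys/uio.h>

    /* Receive one datagram from an already-bound AF_UNIX socket 'fd' and
     * print the kernel-supplied credentials plus the sender's binary path. */
    static void recv_with_creds(int fd)
    {
        char buf[8192];
        char cbuf[CMSG_SPACE(sizeof(struct ucred))];
        struct iovec iov = { .iov_base = buf, .iov_len = sizeof(buf) - 1 };
        struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
                              .msg_control = cbuf, .msg_controllen = sizeof(cbuf) };
        int on = 1;

        setsockopt(fd, SOL_SOCKET, SO_PASSCRED, &on, sizeof(on));
        ssize_t n = recvmsg(fd, &msg, 0);
        if (n < 0)
            return;
        buf[n] = '\0';

        for (struct cmsghdr *c = CMSG_FIRSTHDR(&msg); c; c = CMSG_NXTHDR(&msg, c)) {
            if (c->cmsg_level == SOL_SOCKET && c->cmsg_type == SCM_CREDENTIALS) {
                struct ucred *uc = (struct ucred *)CMSG_DATA(c);
                char path[64], exe[1024];
                snprintf(path, sizeof(path), "/proc/%d/exe", (int)uc->pid);
                ssize_t l = readlink(path, exe, sizeof(exe) - 1);
                exe[l > 0 ? l : 0] = '\0';
                printf("msg=\"%s\" pid=%d uid=%d gid=%d exe=%s\n",
                       buf, (int)uc->pid, (int)uc->uid, (int)uc->gid, exe);
            }
        }
    }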

The current implementation is in experimental status. Based on feedback, some specifics may be changed in future versions. Also, the current implementation does not try to be standards-compliant. This will probably also change in the future. I hope that the new capability is useful to the logging community. As a side-note, the new feature, implemented in one morning, also shows that it often is easy to extend existing technology instead of writing everything new from scratch ;)

The actual release announcement will go out either today or tomorrow. The code is available via the v5-devel git branch right now.

funding rsyslog development

To be honest, funding the rsyslog project is not easy these days. It never was, but it has taken an extra hit from the current economic crisis. Rsyslog, in its initial phase, was sponsored exclusively by Adiscon as part of its open source involvement. In 2007, we added rsyslog professional services with things like support contracts or custom development. While some customers used these services, Adiscon was still required to sponsor the project, and is so until now. Unfortunately, professional services are not doing extremely well (to phrase it politely) and the global crisis is having an effect on Adiscon’s customers. As a consequence, I have been more involved with paid work during the past weeks and could not work as much on rsyslog as I would have liked. The shift in Linux logging that will probably be brought by journald (read the blog posting) doesn’t strengthen my position inside Adiscon either, and works as an accelerator for change…

We have been discussing for quite a while how to improve this situation. While I don’t like the idea, we probably need to think about a dual-licensing approach for rsyslog. Please keep reading; you can be upset once I have made the rest of my argument ;-). First of all, I really don’t like dual licensing. In fact, syslog-ng’s dual-licensing approach was one reason that made me start working on rsyslog (blog post). I also know that rsyslog’s simple GPL license was one of the major “buying points” that made rsyslog become the default syslogd on Fedora and later many other distributions. In order to permit reuse of rsyslog technology in some other tools, in 2008 we created a licensing model that puts the so-called runtime – a large part of rsyslog – under the LGPL (see “licensing rsyslog” and a previous blog post outlining the change). Syslog-ng later cloned this licensing model, but it seems they put a couple more things under the LGPL than we did (so there seems to be a rather thin “product driver” with most of the “real meat” being under LGPL – in rsyslog, larger parts are GPL only). There is an interesting article on lwn.net that tells about this development, and does so from a syslog-ng point of view. The most interesting fact I got from this article was that syslog-ng faced quite the same problems we have with rsyslog – and could not solve them without a commercial fork. Barring other options, it looks like this is a path that rsyslog needs to go down, too. If so, of course, this needs to be done as carefully as possible.

After dual licensing finally surfaced yesterday evening as something hard to avoid, I did a git log review today. I have to admit it was a bit scary: we had some excellent and larger code contributions by Fedora folks in rsyslog’s infancy (and continuous support since then), we have had some larger chunks of code contributed in the form of modules, and there is Michael Biebl, who not only creates great Debian packages but always helps with autotools and smoothing some edges. Finally, we have a couple of folks who sent in very specific patches. But I have to admit that the vast majority of code was written by myself ;) As of today, we have 2819 git commits. Out of them, 2676 were made by me (and another 50 or so by other Adiscon folks). These numbers need to be taken with a grain of salt: rsyslog was initially kept in a CVS archive, and all contributions at that time were logged under my user account. The early Fedora patches were in that timeframe; those were around 20 or so. Also, my commit count is a bit higher due to automatic merges. On the other hand, the difference in code lines is probably even a bit higher than the difference in commit count. I have not done any in-depth analysis, but an educated guess is that more than 98% of code lines were written by me (after all, I have worked a couple of years on this project…).

I am now tasked with actually looking at the code. I will try to differentiate add-on user contributions (like omoracle) from core files. This is useful anyway, because it makes clearer to users what is directly supported by the project and what is not. Then, I will probably look into contributions and see which code remains at which locations. After that is known, I need to have another set of talks with my peers at Adiscon (and probably the top contributors) and see where we can head from here.


This, honestly, is the current state of affairs in regard to the rsyslog project. Most probably we need to move to some commercial licensing model. I know this is not ideal. I know many of you will not really like it. On the other hand, it is a plain fact that many for-profit organizations greatly benefit from rsyslog without ever contributing anything. While they can continue to do so, it is probably a good idea to help them find an offering that funds the project. As a final remark for today, let me introduce you to a blog post that IMHO very nicely describes the problems, and needs, around dual licensing. I am not affiliated with the author and do not even know him.

I hope that the ideas described here will enable us to keep pushing forward with rsyslog technology, something I would really like to do!

a new rsyslog v5 beta – and focus on v6!

I have just generated a new v5-beta. It is being processed for release right now and will be out soon. From the announcement:

This release both offers a set of new features and, at the same time, turns 5.7.3 into beta state. At first, this sounds a bit contradictory, but we do this for two reasons:

a) the new features introduced are non-intrusive in regard to the existing feature set, so no “bad surprises” are expected
b) other than that, primarily bugfixing went into this release, with only a few remaining issues being open

In order to move towards a new v5-stable, we consider it useful to begin with a new v5-beta stage. Note that this time the beta phase may be rather quick, because of argument b) above. Note that we have not had any serious bug reports (except for one open issue) since December. Once the remaining issue is solved, we plan to do a short “proof in practice” and then move on to a new stable version.

In regard to new features, this release offers imfile multi-line capability, realtime UDP reception capability and better configurability for ommysql as recently announced for 6.1.3.

With this release, I also plan to conclude moving new features into v5 and to focus on v6. This may not be practical in all cases, but I will try to stick to this plan as much as possible. During the past few weeks I have had considerable work to do just to integrate the various new features introduced in v4 and v5 into v6. While this was not really hard to do, it required more than a little effort and very careful handling of the changes. The primary reason is that the code bases have diverged quite a bit, and merging isn’t so much “fun” with that. More than once I even screwed up on some minor details. I hope that with a focus on v6 (for new features), I can save a lot of time, which can then go into new features.

And please do not misunderstand me: I focus on v6 for new features. This means I can focus even more on v5 in regard to correctness (bug-freeness). As usual, I prefer to fix issues in the oldest affected (and supported) release, and traditionally the version branch before the most current one has been very attractive to users because it has a near-complete feature set and a very strong focus on correctness.