supported rsyslog versions

Every now and then I am asked which versions the rsyslog project officially supports. I track the current versions on the rsyslog status page. To my surprise, I was recently asked what that really means. And I have to admit that the page actually does not spell this out in detail. So I thought it might be a good idea to write a few comments on this.


First, what does “officially supported” mean? I think the best definition is that this is the software we expect users to run, and we will most probably not look into problems reported against older versions. At least I will usually not craft patches for older versions. Note that things are a bit different for customers with rsyslog support contracts: here, Adiscon, as rsyslog’s prime sponsor, provides support for all stable versions ever created (as long as this is technically possible; there are some things which simply cannot be fixed in very old versions).


So which versions are officially supported? It is first important to note that for each major version (v4, v5, …) there can exist three branches: stable, beta and development. In practice, a beta and a development version usually exist only for the most recent major version, and maybe the one before (but we try to avoid that). All other active versions are stable. An older but still useful plot of this concept can be seen in my blog post on the rsyslog family tree. The idea is that new functionality goes into the devel branch, is then slowly migrated to beta, and is moved to stable state once it gets sufficiently mature (read my blog post on “How Software gets Stable” for more on this philosophy).


It is important to note, however, that only the most recent version of each branch is officially supported (but note the extension for support contracts I stated above). Today, far more than 100 versions of rsyslog exist. It is absolutely impossible to maintain all of them. So we focus on what we expect a careful user to run. For the ultra-conservative folks, we keep older versions, like v4, inside the supported range, but we cannot (and will not) support every interim version. For example, some early 4.6.x versions were plagued by really nasty problems in the TLS arena. These were fixed in later versions, so why should we still support the known-to-be-faulty versions? That sounds like a big waste of time. Note that the difference between, e.g., 4.6.0 and 4.6.5 is just bug fixes (!). So if you follow the “How Software gets Stable” spirit, there is absolutely no point in running software with more bugs than necessary, and it sounds like a really bad idea to run those versions. We understand that corporate users may still have valid reasons to use outdated versions, but that is covered by the support contracts (think about it this way: this generates a lot of technically unnecessary work, so someone needs to pay for it…).

The story is of course slightly different for version shifts, e.g. from 4.4.x to 4.6.x. Note that even numbers denote stable versions (odd ones are either devel or beta, depending on how they are declared). Here, some new functionality is introduced, and thus there is additional risk of regressions. We try to mitigate this risk with our beta phase, which usually lasts three months. In my opinion, a beta is very close to a stable version. So for this time period, we support something like two stables in order to make sure users can migrate to the new version without risk. Of course, this doesn’t rule out the chance that some regression is left uncaught. But some risk is always involved in life…

Note that we do not generally provide patches for older stable versions once the next one is out. Even when we patch older versions for support contract customers, we usually patch exactly the problem they see, and not more (which usually fits exactly their minimal-change strategy). So to re-use our 4.x TLS example: the nasty TLS bug was fixed in 4.6.5, which was declared the current stable. So the patch is not available in 4.6.0 to 4.6.4. This also means that no 4.2.x or 4.4.x version will ever get it, because these versions were already quite outdated (and unsupported) when the problem was fixed.

If you are still not convinced, just think of this: what would happen if I applied all patches to 4.6.0 that went into 4.6.5? Yes, exactly: it would be the exact same code. This is because 4.6.5 is version 4.6.0 plus all patches ever created for that version ;)

To sum it up in one sentence: the project officially supports the head versions of all supported branches (as listed on the rsyslog status page).

rsyslog config reload – random thoughts

This blog post is more or less a think tank, maybe even a utility to clear my mind. Please note that I am not talking about anything that is present in rsyslog right now. I am not even saying that it will be in the near future. But I’d like to think a bit about the alternatives on the route to getting there.

Let’s assume rsyslog shall have two abilities:

  • use different config languages
  • dynamically reload a config without a full restart (thus applying a delta between new and old config)

In any case, the usual approach is to have an object representing a full configuration. This is an in-memory object. Usually, it is created while parsing the configuration file(s). During that parsing, nothing of the new config is actually carried out; just the in-memory representation is built. In that model, it would also be possible to have several fully populated config objects in core at the same time. The important thing to note is that none of them actually affects the current system – they are just loaded and ready to use.

Usually, in such a design, there is one thing that is called the “running conf”. This configuration is the one the system actually uses for processing.

So how is a new configuration activated? In a first step, the config parsers create an in-memory object. Once this is done, that object is a candidate config, one that could be activated. To actually activate it, the candidate config is loaded as the running config. During that process, all the settings are applied and services are started. Please note that a dynamic config reload can be done by first computing a delta between the candidate config and the current running config. This delta can then be used to keep the currently existing config running, but modify it so that it becomes equivalent to the candidate config. This process can be far less intrusive than shutting down the running config and restarting it based on the candidate config. As an example, an rsyslog system may have several hundred incoming TCP connections open. If the delta is just the addition of a new output file, there is no need to shut down these TCP connections as part of the (delta-driven) candidate config activation. Whereas, if no delta were used, all connections would need to be shut down and re-established after the restart. There is obviously benefit in delta-based config activation. However, it should be noted that there are many subtle issues associated with creating and applying the delta.
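To make the delta idea a bit more concrete, here is a tiny sketch in today’s config syntax (the listener port and file names are made up for illustration). Config B is the candidate; its delta against the running config A is a single additional output action, so a delta-driven activation could leave the TCP listener, and all of its open connections, completely untouched:

# running config A
$ModLoad imtcp
$InputTCPServerRun 10514
*.* /var/log/central.log

# candidate config B: identical, except for one additional output file
$ModLoad imtcp
$InputTCPServerRun 10514
*.* /var/log/central.log
*.* /var/log/central-copy.log

A full restart would tear down the listener started by $InputTCPServerRun and drop all connections, while a delta apply would only add the new file action.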

Thinking about such a loaded/candidate/running config system for rsyslog, there is one overall issue that complicates things: loadable modules! Each module not only provides extra functionality, it also provides a set of configuration settings to modify that functionality. As such, we have a problem with config parsing: in order to fully parse a config file, we actually need to load the module as part of config processing. Or, more precisely, we need to load its configuration file processor. However, it does not make sense to split each module into a config file processor and the actual module, at least I think so. Splitting them up would make things over-complex IMHO. However, we must demand that a module, in its config processing code, does not do anything other than creating configuration objects. Most importantly, it must not start any service or carry out any non-config related activity. If it did, it could affect the currently running configuration. Also, config processing does not necessarily mean that the config will actually be activated, so the module must not assume that its activation code will ever be called. As a side note, this is one of the issues with the current legacy configuration system (as seen in all versions prior to v6 and in early v6 versions).

Rsyslog traditionally keeps a list of loaded modules. Modules are added to that list when they are loaded inside the configuration system. During system startup, that very same list is also used to activate services inside the module. So that single list serves both as

  • a registry of loaded modules (e.g. to know what is already available and what needs to be unloaded)
  • a registry of modules required for the configuration

Both functions are tied together into a single list because rsyslog currently has only the concept of a single configuration, and not a candidate/running (multi-)config system. For the latter, it is necessary to differentiate between the two cases:

As we need to load modules during config parsing, we still need a single global list that keeps track of the modules already present inside the system. Please note that with a multi-config system, a module that is first “loaded” by a configuration file currently being parsed may actually already be loaded inside the system. In that case, a duplicate load must be avoided and the already loaded module must be used. The global, config-independent list is required to support this functionality.

On the other hand, such a global list can no longer be used to activate services for a specific config. This is easy to see when we have a config A which uses e.g. a TCP listener and we have a config B which does not. If B is activated, the TCP listener shall obviously not be activated. As such we need a dedicated, config-specific list of modules that are part of the current configuration. Let’s call this one the “config module list” and the other one the “loaded module list”.

The loaded module list is then just used to keep track of which modules are loaded. It will also be used to locate a module, so that global functions (like the config parser) can be found and invoked. Note that I call the config parser global, not config-specific. The reason is that the config parser does not take a config instance as input; it has no input other than the config language text, but produces the config instance as output. As such, it emits config-specific data, but does not require it for processing. So it is global.

The config module list, in contrast, must hold all config-specific data elements for the module (most importantly, the module-specific config instance itself). The config module list is to be used for all config-specific actions. For example, it will be used to activate a module’s services when the candidate config becomes the running config (maybe via a delta-apply process).

Note that on-demand module unloading can be done via reference counting, which is already implemented in rsyslog. When a module is put onto a config module list, the count is incremented. If it is removed from such a list (usually because the in-memory config is destroyed), the count is decremented. An unload happens when the reference count reaches zero. If the module is required again later, while another config is being processed, that triggers a reload just as if the module had never been loaded before.

Note that a clearly defined and implemented split between global and configuration-specific functionality is of vital importance for a multi-config system. This probably has some subtle issues as well. Right off the top of my head, I can think of the problem of some potentially global configuration settings (a term that seems to contradict itself in this context – just think a bit about it…). For example, we have the module search path, which tells us from where to load modules. With different configs, we can potentially have different module search paths. That, in turn, can lead to modules with the same name being loaded from different locations. That means we could potentially have different functionality, including different sets of config parameters (!), in the system at the same time. This could lead to some hard-to-diagnose issues. So it looks necessary to have the ability to load the same module via different paths concurrently, and apply only the “right” module to the config in question. Looking at the current code base, implementing this would be even harder than just splitting out the global and config-specific lists. Maybe this is something that shall be added at a later stage, *if* we take that route at all. Also, there may be other issues along the way that I do not currently envision…
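To illustrate the module search path concern in config terms, consider two configs that could be loaded into the same instance. The directories and the module name “omexample” are made up; $ModDir and $ModLoad are the existing legacy directives:

# config A
$ModDir /usr/lib/rsyslog/
$ModLoad omexample

# config B
$ModDir /opt/custom/rsyslog/
$ModLoad omexample

Both configs refer to “omexample”, yet they may end up loading two different builds of it, each with its own set of config parameters, behind the very same name.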

What is rsyslog auto-backgrounding?

Rsyslog, by default, auto-backgrounds itself after startup. That simply means that the instance that is started by the user (or script) more or less does nothing but fork a new instance detached from the current terminal session and execute it. The originally started instance exits after a short timeout. This behavior was carried over from sysklogd.

Note that auto-backgrounding is problematic (aka “makes things more complicated than they need to be”) in debug sessions, lab environments and so on. So the command line switch “-n” can be used to turn off auto-backgrounding. In that case, the first instance started will actually carry out the work to be done (as most people would expect in the first place).

It is strongly recommended to use the “-n” option for lab testing.

why does the rsyslog testbench sometimes fail?

Rsyslog contains a set of automated tests, the so-called “testbench”. It is invoked via the standard methods of “make check” and “make distcheck”. Since its introduction in version 3, the testbench has been continuously enhanced and extended. It now contains around 150 individual tests, which sum up to around 80 tests from the autoconf point of view (some autoconf tests run a couple of subtests, hence the difference in numbers). The testbench has proven to be very useful and has caught numerous problems before new code was released.

But the testbench is not perfect, and it may sometimes fail without any actual problem. There are two reasons for this. One is that the tests require a very specific environment. For example, some parser-based tests assume that the system the test is run on is configured to be named “localhost.localdomain” (the default for many test deployments). This needs to be the case because there currently is no way in rsyslog to override the local hostname. Some parser tests use malformed messages, in which case (as per the RFC) the local system name must be used. As such, we need a specific system name to be set in order to verify the results. In the long term, I’ll add the capability to override the system name inside rsyslog, but it does not make sense to create a dirty trick just for testbench use. So this needs to wait until we get to it as part of regular development. Note that similar issues may exist in other places. An obvious one is the database tests, where we need pre-created users, databases, tables etc. in order to run the tests.

The other issue is a bit more subtle. The syslog protocol is simple and has no app-layer acknowledgments. This makes it hard to know when rsyslog has received a whole bunch of test data, which in turn makes it hard to say definitely when all test data has arrived and an instance can be shut down. So the whole process is a bit racy. To “solve” this, I use some wait periods in tests affected by this problem. However, longer wait periods mean longer testbench runtime, and this reduces my development productivity. So I use wait times that usually do the job, but may fail under some circumstances (most notably when --enable-debug is set). This can affect a couple of TCP-based tests (like imtcp_conndrop.sh and similar ones). I do not yet have a good idea of what a clean solution to this problem would be, where “clean” means that it a) always works and b) does not introduce unnecessary code complexity in non-testbench runs.

Given these problems, some care must be taken when interpreting testbench results. Most importantly, a failure does not necessarily mean that things are actually broken. It merely means that one needs to look at the actual test and check a) why it fails and b) whether it fails repeatedly. Especially the “racy” tests tend to occasionally fail without any real problem. I have also seen them fail consistently on some platforms, simply because my timing assumptions were not valid there (Solaris was one example where I needed to adjust my overall wait periods).

So testbench results need to be taken with a grain of salt, and require interpretation. I know this is inconvenient for occasional users, but it is the best compromise I currently can offer.

using failover and asynchronous actions in rsyslog

I wanted to point out that failover actions and asynchronous processing do not work well together in rsyslog, at least if a simple approach is used. The reason is a conceptual problem in how the two interact: async actions will, by design, always return an “everything went OK” status (because we don’t know otherwise, as things are async!). This means failover processing will never see an error.

Now look at the following config:

$modload imuxsock
$ActionQueueType LinkedList
$RepeatedMsgReduction on
*.* @@10.48.20.19:10514
$ActionExecOnlyWhenPreviousIsSuspended on
& @@10.48.20.18:10514
& -/home/logfile2
$ActionExecOnlyWhenPreviousIsSuspended off

If the action queue type is set to linked list (and thus the action is executed asynchronously), the other two actions will never be executed, because the async action always “succeeds”.

There are two ways to solve this situation:
1) do run the action synchronously — depending on your needs, this may be a solution or not
2) if you need to run it async, you need to define a new ruleset which includes the config WITHOUT the async processing. Then, use omruleset to execute the newly defined ruleset as a whole asynchronously. This is obviously a bit more complex, but will do what you need (see the sketch below).
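To make option 2 a bit more concrete, here is a minimal sketch of that approach in current (v5) config syntax, based on the example above. The ruleset name is made up and I have not run this exact snippet, so treat it as an illustration of the idea rather than a verified drop-in solution:

$ModLoad omruleset

# ruleset with the failover logic; its own main queue makes the whole ruleset async
$RuleSet fwdWithFailover
$RulesetCreateMainQueue on
*.* @@10.48.20.19:10514
$ActionExecOnlyWhenPreviousIsSuspended on
& @@10.48.20.18:10514
& -/home/logfile2
$ActionExecOnlyWhenPreviousIsSuspended off

# back in the default ruleset: hand all messages over to the failover ruleset
$RuleSet RSYSLOG_DefaultRuleset
$ActionOmrulesetRulesetName fwdWithFailover
*.* :omruleset:

Note that inside the ruleset no action queue is defined, so the actions there run synchronously and the failover chain actually sees error states; the asynchronous decoupling comes from the ruleset’s own queue instead.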

I hope this resolves some confusion about the failover functionality (and, yes, a better config language would make this less painful — hopefully we will finally be able to write one ;)).

new rsyslog/systemd work going on

This is just a quick note that systemd is doing really well in providing logging right from system startup, even when no syslogd is running yet. The magic is that messages are put into the kernel log, from where the (later started) syslogd can pull them. Unfortunately, there are some downsides to this mode, and we are currently working to solve them. For more details, please follow this thread:

http://lists.freedesktop.org/archives/systemd-devel/2011-March/001558.html

I guess the remaining issues will be settled soon, as almost everything is in place and I just need to add some additional parsing logic to rsyslog. Due to the conference next week, I may need one extra week to complete that. All of this work will be part of the newly opened v5-devel (and above).

we are nearing a new rsyslog v5-stable

I have just released rsyslog v5.7.9. It will possibly be the last v5-beta version of the 5.7 branch. I’ve ironed out a lot of bugs during the past two to three weeks. Right now, some patches are in 5.7.9 but not in the current stable, because I am waiting for some more feedback on them. There are still some bugs open in bugzilla, but all of these bugs concern rather exotic environments AND are present in the current v5-stable as well. So there is little argument to hold back the new v5-stable branch just for that reason.

The plan is to release a last 5.6.6 version, ending that branch. Shortly after that, I’ll release 5.8.0. That way, conservative operators receive the latest round of bug fixes and can probably wait quite relaxed until 5.8.1 arrives ;)

It should be noted that 5.8.0 will be the first stable version with full support for systemd.

a new rsyslog v5 beta – and focus on v6!

I have just generated a new v5-beta. It is being processed for release right now and will be out soon. From the announcement:

This release both offers a set of new features and, at the same time, turns 5.7.3 into beta state. At first, this sounds a bit contradictory, but we do this for two reasons:

a) the new features introduced are non-intrusive in regard to the existing feature set, so no “bad surprises” are expected
b) other than that, primarily bugfixing went into this release, with only a few remaining issues being open

In order to move towards a new v5-stable, we consider it useful to begin a new v5-beta stage. Note that this time the beta phase may be rather quick, because of argument b) above. Note that we have not had any serious bug reports (except for one open issue) since December. Once the remaining issue is solved, we plan to do a short “proof in practice” and then move on to a new stable version.

In regard to new features, this release offers imfile multi-line capability, realtime UDP reception capability and better configurability for ommysql as recently announced for 6.1.3.

With this release, I also plan to conclude moving new features into v5 and to focus on v6. This may not be practical in all cases, but I will try to stick to this plan as much as possible. During the past few weeks I have had considerable work to do just to integrate the various new features introduced in v4 and v5 into v6. While this was not really hard to do, it required more than a little effort and very careful handling of the changes. The primary reason is that the code bases have diverged quite a bit, and merging isn’t much “fun” in that situation. More than once I even screwed up on some minor details. I hope that with a focus on v6 (for new features), I can save a lot of time, which can then go into new features.

And please do not misunderstand me: I focus on v6 for new features. This means I can focus even more on v5 in regard to correctness (bug-freeness). As usual, I prefer to fix issues in the oldest affected (and supported) release, and traditionally the version before the most current version branch has been very attractive to users, because it has a near-complete feature set and a very strong focus on correctness.

log normalization with rsyslog

I just wanted to give you a quick heads-up on my current development efforts: I have begun to work heavily on a message modification module for rsyslog which will support liblognorm-style normalization inside rsyslog. In git there already is a branch “lognorm”, which I will hopefully complete and merge into master soon. It provides some very interesting shortcuts for pulling specific information out of syslog messages. I’ll probably promote it some more when it is available. IMHO it’s the coolest and potentially most valuable feature I have added in the past three years. Once I have enabled tags in liblognorm/libee, you can even very easily classify log messages based on their content.

calling for log samples!

Now I join the mass of people who are asking for log samples. But I do so for a good reason :) Also, I do not need a lot; a single log message works well for my needs. I need them to improve rsyslog so that the parser can handle exotic message formats even better. So the short story is: if you have a syslog message, please provide it to me.

And here is the long story:

One of the strengths of rsyslog is that it is very much focused on standards. That also means it tries to parse syslog messages according to the relevant RFCs. Unfortunately, syslog has been standardized only recently, and so there is no real standard for what to expect inside the header. So rsyslog’s strength is also its weakness: if messages are ill-formed, results are often suboptimal.

I am working around this by doing smart guesswork inside the legacy syslog parser. However, every now and then some folks pop up with problems. And, more importantly, some others do not even ask. On my Twitter account, I recently saw one such frustration. In that case, timestamps were duplicated. I guess that was caused by something unexpected inside the timestamp. However, I was not able to get down to the real problem, because I did not have access to the raw message. That’s an important point: I need the raw message content, not what happens to end up in the logfile. The latter is already parsed, processed and recombined, so it does not tell me what the actual message was. But I need the actual message to improve the parser.

What I would like to do is create a very broad test suite with a vast amount of real-life syslog formats. The message text itself is actually not so important to me at this stage; it is the header format that matters. If I get this, I’d like to analyze the different ways in which the format is malformed and then try to find ways to handle them inside the parser. If I find out that I cannot detect the right format in all cases automatically, I may find ways to make the different formats configurable. The end result, I hope, will be far more plug-and-play message detection, something that should be of great benefit for all users.

Please contribute your logs! I need logs from many different devices, with many different versions. But I need only a few lines from each one. For each individual contributor, there is not a lot of effort required. Even a single log line would be great (ten or so would be even greater). Just please don’t mangle the logs; provide me with raw log messages. That’s probably the hardest part. One way to do it is to sniff them off the wire, for example with Wireshark. Another way is to use rsyslog itself. All you need is a special template and an output file using it:

$template rawmsg,"%rawmsg%\n"
*.* /path/to/raw-file.log;rawmsg

Add this to your rsyslog.conf, restart rsyslog, make the device emit a few lines and mail me the result at rgerhards@gmail.com. You may also simply post the log sample to the sample log thread on the rsyslog forum – whatever you prefer. After you have done that, you can remove the lines from rsyslog.conf again. Before you mail me, it is a good idea to check whether there is any sensitive information inside the log file. Feel free to delete any lines you need to, but I would appreciate it if you did not modify the content of the lines you keep. Also, it would be useful for me if you let me know which device, vendor and version produced the log.

I hope that you can help me improve the rsyslog parser even more. Besides, it will probably be a very interesting experiment to see how different syslog messages really are.

Thanks in advance for all contributions. Please let them flow!

Rainer