a new rsyslog v5 beta – and focus on v6!

I have just generated a new v5-beta. It is being processed for release right now and will be soon. From the announcement:

This release both offers a set of new features and, at the same time, turns 5.7.3 into beta state. At first, this sounds a bit contradictory, but we do this for two reasons:

a) the new features introduced are non-intrusive in regard to the existing feature set, so no “bad surprises” are expected
b) other than that, primarily bugfixing went into this release, with
only few remaining issues being open

In order to move towards a new v5-stable, we consider it useful to begin with a new v5-beta stage. Note that this time the beta phase may be rather quick, because of the argument b) above. Note that we did not have any serious bug reports (except for one open issue) since December. Once the remaining issue is solved, we plan to do a short “proof in practice” and then move on to a new stable version.

In regard to new features, this release offers imfile multi-line capability, realtime UDP reception capability and better configurability for ommysql as recently announced for 6.1.3.

With this release, I also plan to conclude moving new features into v5 and try to focus on v6. This may not be practical in all cases, but I will try to stick to this plan as much as possible. During the past few weeks I have had considerable work to do just to integrate the various new features introduced in v4 and v5 into v6. While this was not really hard to do, it requires more than a little effort and very careful handling of the changes. The primary reason is that the code base diverged quite a bit and merging isn’t so much “fun” with that. More than once I even screwed up on some minor details. I hope that with a focus on v6 (for new features), I can spare a lot of time which than can go into new features.

And please do not misunderstand me: I focus on v6 for new features. This means I can focus even more on v5 in regard to correctness (bug-freeness). As usual, I prefer to fix issue in the oldest affected (and supported) release, and traditionally the version before the most current version branch has been very attractive to users because it has a near-complete feature set and a very strong focus on correctness.

log normalization with rsyslog

I just wanted to give you a quick heads-up on my current development efforts:  I have begun to work heavily on a message modfication module for rsyslog which will support liblognorm-style normalization inside rsyslog. In git
there already is a branch “lognorm”, which I will hopefully complete and merge into master soon. It provides some very interesting shortcuts of pulling specific information out of syslog messages. I’ll probably promote it
some more when it is available. IMHO it’s the coolest and potentially most valuable feature I have added in the past three years. Once I have enabled tags in liblognorm/libee, you can even very easily classify log messages
based on their content.

 

calling for log samples!

Now I join those mass of people who are asking for log samples. But I do for a good reason :) Also, I do not need a lot, a single log message works well for my needs. I need them to improve rsyslog so that the parser can even better handle exotic message formats. So the short story is if you have a syslog message, please provide it to me.

And here is the long story:

One of the strength of rsyslog is that it is very much focused on standards. That also means it tries to parse syslog messages according to the relevant RFCs. Unfortunately, syslog has been standardized only recently and so there is no real standard for what to expect inside the header. So rsyslog strength is also its weakness: if messages are ill-formed, results are often suboptimal.

I am working around this by doing smart guesswork inside the legacy syslog parser. However, every now and then some folks pop up with problems. And, more importantly, some others do not even ask. On my twitter account, I recently saw one such frustration. In that case, timestamps were duplicated. I guess that was caused by something unexpected inside the timestamp. However, I was not able to get down to the real problem, because I did not have access to the raw message. That’s an important point: I need the raw message content, not what happens to usually be in the logfile. The later is already parsed, processed and recombined, so it does not tell me what the actual message is. But I need the actual message to improve the parser.

What I would like to do is create a very broad test suite with a vast amount of real-life syslog formats. The message text itself is actually not so important to me at this stage. It is the header format. If I get this, I’d like to analyze the different ways in which the format is malformed and then try to find ways to implement it inside the parser. If I find out that I can not detect the right format in all cases automatically, I may find ways to configure the different formats. The end result, I hope, will be far more plug-and-play message detection, something that should be of great benefit for all users.

Please contribute your logs! I need logs from many different devices, with many different versions. But I need only a few lines from each one. For each individual contributor, there is not a lot of effort required. Even a single log line would be great (ten or so be even greater). Just please don’t mangle the logs and provide me with raw log messages. That’s probably the hardest part. One way to do it is to sniff them off the wire, for example with WireShark. Another way is to use rsyslog itself. All you need is a special template and an output file using it:

$template rawmsg,”%rawmsg%n”
*.* /path/to/raw-file.log

Add this to your rsyslog.conf, restart rsyslog, make the device emit a few lines and mail me the result to rgerhards@gmail.com. You may also simply post the log sample to the sample log thread on the rsyslog forum – whatever you prefer. After you have done that, you can remove the lines from rsyslog.conf again. Before you mail me, it is a good idea to check if there is any sensitive information inside the log file. Feel free to delete any lines you have, but I would appreciate if you do not modify line contents. Also, it would be useful for me if you let me know which device, vendor and version produced the log.

I hope that you can help me improve the rsyslog parser even more. Besides, it will probably be a very interesting experiment to see how different syslog messages really are.

Thanks in advance for all contributions. Please let them flow!

Rainer

rsyslog now supports Hadoop’s HDFS

I will be releasing rsyslog 5.7.1 today, a member of the v5-devel branch. With this version, omhdfs debuts. This is a specially-crafted output module to support Hadoop’s HDFS file system. The new module was a sponsored project and is useful for folks expecting enormous amounts of data or having high processing time needs for analysis of the logs.

The module has undergone basic testing and is considered (almost) production-ready. However, I myself have limited test equipment and limited needs for and know-how of Hadoop, so it is probably be interesting to see how real-world users will perceive this module. I am looking forward to any experiences, be it good or bad!

One thing that is a bit bad at the moment is the way omhdfs is build: Hadoop is Java-based, and so is HDFS. There is a C library, libhdfs, available to handle the details, but it uses JNI. That seems to result in a lot of dependencies on environment variables. Not knowing better, I request the user to set these before ./configure is called. If someone has a better way, I would really appreciate to hear about that.

Please also note that it was almost impossible to check omhdfs under valgrind: the Java VM created an enormous amount of memory violations under the debugger and made the output unusable. So far I have not bothered to create a suppression file, but may try this in the future.

All in all, I am very happy we now have native output capability for HDFS as well. Adding the module also proved how useful the idea of a rsyslog core paired up with relatively light-weight output/action modules is.

Some results from LinuxKongress

I had a great time both attending the talks on Linux Kongress 2010 in Nuremberg as well as giving my presentation on rsyslog performance enhancements. Even more, the social events helped driving things forward. Among others, I met with Lennart Poettering, who currently works on systemd. He had a couple of good suggestions for rsyslog and I hope to implement at least some of them. On the list for the next v5-devel release are definitely improvements for imuxsock, like the ability to obtain the process id of that process that actually emitted the log message (via SCM_Credentials). Once this is known, we can do a couple of “interesting things” with it.

rsyslog queues, reliability and forwarding

I would like to explain a little bit how rsyslog queues and forwarding interact. First of all, let me state that the queue engine and the action part of rsyslog, where the forwarding happens, are strictly separated. There is no interface between the two that permits an action to affect queue behaviour. For example, I was asked if a forwarding action could initiate disk queue mode when the forwarding fails. The root reason for this was that messages should not be endagered while a remote server fails.

This is not possible with the current design and involves a far-from-trivial design change. However, I do not think that the functionality is actually needed. When talking about reliablity, rsyslog works on the importance of messages and not on the importance of actions.

So in rsyslog we can configure the level of message loss that is acceptable for a given use case. This can be done on an action-by-action basis (or once at the ruleset/main queue level for all actions — usually not a good idea, but from time to time it may be). The extreme ends are a) no message loss at all permited and b) message loss of any magnitude is acceptable. For a), this means we need to configure a disk-only queue with full sync of queue files and management information after each message processed (with message input batches of one). For b), this means we do not need to place any restrictions. Obviously, a) is rather slow while b) is extremely fast. As extremes are not usually what is needed, there are many configuration possibilities between these two extremes. For example, one may define a queue the goes to disk if more than 1,000 messages are kept in main memory, but only then, and that fsyncs queue files every 10 messages (a big performance saver). That means that at any instant, at most 1,010 messages are at risk and can potentially be lost. The queue than monitors these predicates and switches to disk mode only when required. This is a very big performance saver.

Now let’s switch a bit the perception of things: Let’s go with our starting example and say you want to go to disk only when the remote system is down. But what if the remote system is up, but can not accept messages quickly enough. Let’s assume a backlog of 10,000 messages builds up. Is it then acceptable to keep these in main memory, just because the remote system is accepting data? If this risk is acceptable, why would it be inacceptable if the remote system is not yet accessible. If we say one case is acceptable but the other not, we somewhat contradict ourselves: it is almost random if the remote system is accepting messages, so why does it make a difference in the risk we permit?

This contradiction is the core reason why rsyslog does not look at external events or action failure causes but rather works on the basis of “acceptable risk”. Let’s say it is acceptable to lose 1,000 messages. Then, it is irrelevant if we lose these while the remote system is accepting or not. Consequently, rsyslog enforces disk mode if the remote system is down and there are more than 1,000 messages inside the queue. But it does not enforce this if there are only 500 messages waiting to be sent? Why should it? After all, the user has specified that a loss of 1,000 messages is acceptable, and so we do not try to guard these messages more than required by this policy. Note, of course, that if rsyslog is terminated in such a situation, of course a disk queue with 500 messages is created. We do not voluntarily lose messages, and if we terminate, we can no longer hold them in main memory. Consequently, they must be written out (of course, again depending on configuration). So the in-memory queue is retained across rsyslog restarts. But it is important to point out that unexected aborts – like sudden loss of power – can cause message loss in such scenarios. This is no different from sudden loss of power with an accepting remote system and a queue of 500. If such a risk is unaccetable, we have what I initially described in scenario a).

As a side note, rsyslog queues provide very high reliability. Every message is removed from the queue only after the action acknowledges that it has been processed. This kind of reliablity is used in some very demanding audit-grade environments (which I, as usally, not permitted to name…).

To sum up, rsyslog protects message integrity not be external events but by user-configurable “acceptable risk” policies.

We consider this a superior approach, as external events are somewhat unreliable when it comes to protecting traffic bursts. Relying on external events has a number of anomalies, as hopefully explained above.

getting serious with rsyslog v6

I have just created a new v5-devel branch based on the current master AND have then merged the newconf branch into master. This “officially” means the beginning of rsyslog v6. So I thought that justifies a short blog post.

As already elaborated, v6 will focus on a better configuration language. The current version already has scoping for actions, but no doc yet for it. I will try to add the doc, so that I can hopefully officially release the devel version this week. I’d also like to work a bit more on imptcp, so that I have some common base functionalty in all versions that support it.

Starting new v5-beta branch

I will start a new v5-beta branch soon. I have seen that a number of things have been changed since the last stable release. Most importantly, some of these changes fix bugs. Bug, that are hard to fix in the current stable version. This is because v5 got not so much exposed to reality, but adoption rate seems to increase and so we have been able to iron out some issues while doing refactoring on the latest development release.

I have thought about backporting the bug fixes. However, this doesn’t seem to make too much sense: it is a lot of work, and, when done the current v5-stable will have code very close to the development branch. So I decided to accept the bugs for the time being and instead see the we can get a new 4.6.0 stable release as soon as possible. The first step on that route is to create a new beta. Based on past experience, I think we can promote that the stable in September (as we already got some good feedback on it).

Anybody with problems in the current v5-stable should simply update to the beta as soon as it is available.

coding new config format begun

After a long discussion about potential new config formats for rsyslog, we came to the conclusion that the current format is not as bad as I thought and just needs scoping (OK, the whole story is longer, but I can’t repeat that lengthy discussion in a single blog post, see mailing list archive for details).

After some thinking, I finally started coding today.

I have begun to implement action scoping. A snapshot with partial functionality is available at

http://git.adiscon.com/?p=rsyslog.git;a=shortlog;h=refs/heads/newconf

It does NOT yet include the necessary output plugin interface change (or updated plugins), but it implements

$Begin action
$End action
$StrictScoping [on/off] — off default

So if you want to play a bit with it, feel free to do so. Note that it disallows global statements within action scope and in overall has somewhat better detection of user errors (these are easier to detect when scoping is used).

Note that scoping is purely optional: if not enabled, it will not be used. So any current rsyslog.conf remains valid.

I will see that I change the projects’s output plugins next week, and will possibly then make an experimental release.