rsyslog internal messages

I had an interesting conversation with someone who runs multiple instances of rsyslog on a machine for remote reception only and, for some reason, another syslogd for local messages. The question arose where rsyslog's error messages are emitted to.

It was expected that they showed up in the other syslogd. However, that is not the case, and for good reason. So I thought it would be good to provide some general advice on how internal messages are emitted.

First of all, internal messages are messages generated by rsyslogd itself. The vast majority of them are error messages (like config errors, resource errors, unauthorized connects etc…), but there are also some status-like messages (like rsyslogd startup and shutdown, unexpectedly dropped tcp connections, …). Traditionally, rsyslog does not make a distinction between status and error messages (we could change that over time, but so far nobody has asked, which means it is not worth the hassle).

Rsyslogd is a syslogd, so all messages it emits internally are syslog messages. For obvious reasons, they use the "syslog" facility. And as all of them are flagged as error messages, the resulting priority is "syslog.err". The internal message source is implicitly bound to the default ruleset.

It now depends on how that ruleset is defined where these messages show up. I strongly encourage everyone to include a rule that logs these messages. If there are, e.g., config issues, they can easily be solved by looking at the emitted error message. But if you do not log them, it can take you ages to sort out what is wrong.

So you should always make sure that “syslog.err” (or probably better “syslog.*”) is logged somewhere.
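
A minimal sketch of such a rule (the file name here is just an assumption, pick whatever fits your layout):

syslog.* /var/log/rsyslog-internal.log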

If you would now like to use another syslogd to log these messages, but not rsyslog itself, you do what you usually do in this situation: first of all, make sure that no local rule logs syslog.* messages. Then, include a rule that forwards syslog.* to the recipient that you want to receive them. You have the full flexibility of the rule engine at hand to limit or reformat those messages. Note that an elegant solution to do both is to include the following two lines at the top of rsyslog.conf (I assume you use UDP forwarding to another syslogd running on the same host machine):

syslog.* @127.0.0.1
& ~

Note that the tilde character is the discard action.

disk assisted mode performance in rsyslog

I would just like to clarify one thing: rsyslog supports disk-assisted queue mode, in which case messages are written to disk if the in-memory queue becomes full.

However, it is generally bad if the system needs to go to disk during normal operations. Disk assistance is primarily meant for situations like output targets going offline. If it is needed during normal operations, one is probably lost. In the v3 and v4 engines, when disk mode is enabled, the in-memory worker threads are shut down. So all processing then takes place via the disk. That means processing will be slower than before. So if the system was incapable of handling the workload when running on a pure in-memory queue, it will definitely be incapable of handling it in disk mode.

Note that things are different in recent v5 engines: starting with 5.3.5, the disk worker runs concurrently with the in-memory workers, and as such performance is similar to what it was in non-disk mode. Still, overall processing is probably slower, so going to disk is not a cure for a system that cannot handle the overall workload. In v5, however, it may be a way to handle excess bursts.
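
For reference, a disk-assisted main message queue can be set up roughly like this (a sketch; the queue file name and watermark values are assumptions that need tuning to the actual setup):

$MainMsgQueueType LinkedList
$MainMsgQueueFileName mainq
$MainMsgQueueHighWatermark 8000
$MainMsgQueueLowWatermark 2000
$MainMsgQueueMaxDiskSpace 1g

Once a queue file name is set, the queue becomes disk-assisted: it stays in memory until the high watermark is reached and only then spills over to disk.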

Priorities for rsyslog Work

I receive requests for support and code additions to rsyslog every day and I am grateful so many people express their interest and see rsyslog as a useful tool.

The bottom line, unfortunately, is that I cannot do everything, and I also cannot do many things as quickly as I would like to. Also, I have to admit, there are some things that I do not like to do, at least not as a cost-free activity. The typical example is work that benefits only a single commercial organization or a small subset of them.

I suggest that you read a bit about my philosophy on how open source projects are "paid". Note that "payment" includes far more things than money, for example good suggestions and bug reports.

I tend to follow this priority scheme, with some variations:

  1. security-related issues
  2. serious problems in the current release
  3. serious problems in previous releases
  4. paid work
  5. things useful to the community at large
  6. things useful to smaller parts of the community (with descending priority)
  7. support for non-commercial folks
  8. bugs in older releases already fixed in newer ones
  9. activities aiding only commercial organizations

The term "things useful" is deliberately vague. Among others, it involves fixing bugs, adding new features and handling support requests. However, support requests usually fall into that category only if either a bug is involved or I can gain some more insight into things that need to be changed (like better doc, general user needs, etc…).

Note that, as part of my philosophy, I try to avoid doing work for free that only benefits a commercial party, but neither me personally nor the project. If you find this harsh, read my in-depth explanation of that philosophy.

Paying for Open Source Projects…

I selected the word "paying" in this post's title deliberately. Of course, open source software usually is (and should be) cost-free to all interested parties, but that does not mean there is no price tag whatsoever attached to it.

As an open source author I need to admit that it is virtually impossible to give away everything without any price. “Price”, in my perception, does not necessarily mean “money”. There are many benefits you may gain from working on software, and money is only one of them.

But first of all, let me re-iterate the FSF’s “freedom vs. free beer” argument, in which I fully believe:

“Free software” is a matter of liberty, not price. To understand the concept, you should think of “free” as in “free speech,” not as in “free beer.”

This is very true. In my personal mind, I would really love to give away any work I create to those who need it. But that thought involves some subtle issues. One prominent problem is that other people may think differently. For example, my landlord doesn't like this idea. Nor does my bakery. Not even the computer manufacturer on whose systems I develop my software! What a shame! So if I gave away everything for free, I would, thanks to the social security system, probably not die, but I would definitely not have a machine to create those things I would like to provide for free.

So it looks like I need to make a compromise, give away certain things and charge for others. One approach would be to quit computing as a profession and become a gardener instead. In my spare time I could then program and give away everything for free. The bottom line is that I could program much less than I can currently. Also, I prefer programming over gardening. So this does not look like a good approach – neither for me personally (the then-unhappy gardener) nor for society at large (which could no longer gain the full benefit of my work: believe me, I am far more productive as a programmer than as a gardener…).

So this seems to be the wrong approach. It naturally follows that I need to charge for some of the computing work I do.

Then, look at my motivation as an open source developer. I'd like to make the world a little bit better by providing useful tools. And, if I am honest, I may even like to gain a little bit of fame as a recognized open source developer. I guess that motivates many, but few admit to it ;) This hits a sweet spot of "payment": being recognized feels good and thus keeps me motivated. Seeing the project grow and spread also motivates me. Projects where there is no feedback and which do not grow are usually quickly abandoned. Why? Because not even the most basic "payment" is provided in exchange for the work done.

So a very important form of "payment" to open source authors, at least in my point of view, are contributions to the project, help in spreading news about it, and (very, very valuable) good bug reports. Everything that helps push a project and make it evolve. Of course, contributions in any form are also happily accepted (be it software, hardware, books, …, and of course money). Money is not evil. It pays, among other things, the electricity to run my machine.

Taking the arguments together, there is no ideal world where I can give away everything and receive in exchange whatever I need (and, as far as I remember, experiments in trying this failed miserably…).

With that in mind, I begin to divide the world into "friends" and "foes". Friends are those who provide me with some form of "payment", that is anything that is useful to me. Good examples are the folks who write the open source software *I* use (aha, this is cyclic!), folks who provide good bug reports and try out development versions, etc. Any activity that I can also use to my benefit makes you my friend.

Then, there are "foes". That word probably is too harsh and maybe should be re-phrased as "non-friends". But the term and the idea are well known.

If you are not my friend, you do not contribute anything that I can use for my benefit. This doesn't mean you are a bad guy. If you and I do not have anything in common, why should you do something that benefits me? There are far more people to whom I never provided any benefit than there are people to whom I did. I guess that is true for almost all of us, except a few outstanding people (who then usually receive admiration as a sort of "payment").

But if you are not my friend, you should not expect me to do anything for free for you. Imagine a stranger comes to your home and asks if you would like to help him build his house. I guess you will be astonished and probably ask "Why should I do that?". Now assume the sole answer is "Because that is good for me, the stranger, but you need to bring your own clothes and tools and need to pay for the gas to come to my home". Would you be willing to help that guy out? I guess the answer would be "no" in almost all cases.

So why should I as an open source developer create software for or otherwise help a non-friend? Why am I supposed to say “yes, of course” if a stranger asks me “Can you implement this and that, but you need to pay for your own hardware and the electricity used and also for…”? The answer is: I am not! So don’t silently expect me to do that.

Of course, the question itself may have made you my friend. How come? Simple: the idea you propose may be a very useful idea for my project. If it gets implemented, it will help many of my currently existing friends and it will eventually help spread the project. So by providing the idea itself, you did me a big favor, which one may consider a form of "payment". Consequently, I often implement things asked for by complete strangers. And I often try to help out complete strangers on the mailing list and on other support channels. Here, I often learn a real lot about what is good and bad about my projects. This is a very valuable form of "payment" for me.

HOWEVER, and this is my personal limit, whenever I am asked to do something for free, I evaluate *my* benefit in doing so. Of course, this includes the benefit to the project and the benefit to the community at large, but this all goes into the picture of “my” benefit as the sum of all that.

So if a complete stranger asks me to do something, I check for immediate benefits in doing that. Sometimes, there are cases where I can see a benefit, but only to that stranger. Usually, these are things corporate guys need, and they are very special and non-generic. If there is no benefit at all, I simply do not look any further. Of course, the proper solution here is that those folks can actually pay money to make me implement whatever they need. The logic behind this is that when they pay money, they help fund activities that also benefit the project at large. But if they are corporate guys and they do not get any money approved for what they (think they) need, they don't really need it at all! Because if it were really useful for their corporation, they would have received the money grant (corporations are very good at making these trade-offs, though they occasionally fail ;)). So in short, the money even acts as a filter that prevents me from spending time on things that nobody really needs!

If a friend comes along and asks me to do something, I still need to evaluate the request. But I am much more likely to implement the functionality requested (it's a game of "give and take"). Of course, I need to evaluate the overall priority for my project here, too. But friends definitely receive a priority boost if at all possible. And I think this is only fair.

In general, I favor requests that are useful to the community at large over those that are only useful to a small subset of it. I tend not to implement, without any form of "hard" payment (hardware, money, a nice vacation on Hawaii… ;)), anything that is only useful to a single commercial organization. For example, why should I provide free services to a company that expects me to pay, e.g., the utility bill? If you do not give your services to me for free, don't expect me to give my time for free just for your benefit (think about the "stranger asking for my help on building his house" analogy).

My thoughts may sound very materialistic, but in fact they just describe what I think is fair in the non-perfect world we live in. Remember that most non-profit organizations are my friends, because they offer useful services to "me" (as part of the community). And think about my thoughts in the intro of this blog post about my inability to do any useful work at all if I did NOT have a somewhat material point of view. So, honestly, I think my philosophy here is not actually "materialistic" but rather a result of how life is…

Edit: it may also be useful to have a look at my blog post "work, friends and personality", which looks at a very similar issue from a slightly different angle.

This philosophy also influences priority decisions in my open source projects, as outlined for example in "rsyslog work priorities".

ACLs, imudp and accepting messages

I am working again on moving DNS name resolution outside of the input thread for those sources where this is potentially time-consuming and affects message acceptance rates. As it turned out, imudp currently seems to be the only such case.

While this is potentially easy to do, a problem is ACLs ($AllowedSender) which use host names rather than IP addresses. In order to check these ACLs, we need to do a DNS lookup. Especially in the case of UDP, such a lookup may actually cause message loss and thus may be abused by an attacker to cause a certain degree of denial of service (which also points out that these types of ACLs are not really a good idea, even though practice demands them).
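
For illustration, such a name-based ACL looks roughly like this (the network and host name are just examples):

$AllowedSender UDP, 192.168.0.0/24, *.example.net

It is the wildcarded host name that forces a DNS lookup for each sender; a purely IP-based list would not.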

In light of this, I will now do something that sounds strange at first: I will always accept messages that require DNS lookups, enqueue them into the main queue, and do the name resolution AND the final name-based ACL check only on the queue consumer side. Please note that this will be done BEFORE message content is parsed, so there is no chance that buffer overflow attacks can be carried out from non-authenticated hosts. The core idea is to move the lengthy, potentially message-loss-causing code away from the input thread. The only questionable effect I can currently see is that queue space is potentially taken up by messages which will immediately be discarded and should not be there in the first place. At the extreme end, that could lead to loss of valid messages. But on the other hand, valid messages are more likely to be lost to the DNS query overhead if I do the ACL check directly in the input thread.

If anyone has an argument against this approach please let me know.

A solution for invalid syslog message formats…

In syslog, we traditionally have a myriad of message formats, causing lots of trouble in real-world deployments. There are a number of industry efforts underway trying to find a common format. To me, it currently does not look like any of them has received the necessary momentum to become "the" dominating standard, so it looks like we need to live with various presentations of the same information for some more time.

In the past two weeks, I have begun to make additions to rsyslog that will hopefully help solve this unfortunate situation. I know that I have no real cure to offer, but at least baby steps toward one. I have introduced so-called message parsers, which can be utilized to convert malformed messages into rsyslog's well-formed internal structure.

Why is it not a solution? Because what I really introduced is actually an interface, which permits writing different parsers for the myriad of devices. I have not provided a generic solution for that, so the individual parsers still need to be written. And secondly, I have not yet defined any more standard properties than those specified in the recent IETF syslog RFC series, most importantly RFC5424.

So why do I hope this will lead to a long-term solution?
First of all, there is some hope that the IETF effort will bring more standard items. Also, we could embed other specifications within the RFC5424 framework, so this could become the lingua franca of syslog message content over time. And secondly, I hope that rsyslog's popularity will help in getting parsers at least for the core RFC5424 information objects, which would be the basis for everything else. Now we have the capability to add custom parsers, and we have an interface that third parties can develop to (and do so with relative ease).
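
To give an idea of what this looks like in the configuration, parsers are bound to a ruleset roughly like this (a sketch; the two parser names shown are the standard ones shipped with rsyslog, and a custom parser module would be listed the same way):

$Ruleset customdevice
$RulesetParser rsyslog.rfc5424
$RulesetParser rsyslog.rfc3164

The parsers are tried in the order given; the first one that accepts the message wins.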

All in all, I think this development is a huge step in the right direction. The rest will be seen by history ;) To me, probably the most interesting question is whether we will actually attract third-party developers. If there are any, I'll definitely help get them going with the rsyslog API.

next round of performance enhancement in rsyslog

Today, I made a very important change to rsyslog: rulesets now can have their own “main” queue. This doesn’t sound too exciting, but can offer dramatic performance improvements.

When rsyslog was initially created, it followed the idea that messages must be processed in the order they were received. To facilitate that, all inputs submitted messages to a single main message queue, off which the processing took place. So messages stayed in reception order. … Well, actually they stayed in "enqueued order", because it depended on the OS scheduler whether input modules could really enqueue in the order in which they received messages. If, for example, input A received two messages but was preempted by module B's message reception, B's data could hit the queue earlier than A's. As rsyslog supported more and more concurrency, the order of messages became ever less important. The real cure for ordered delivery is to look at high-precision timestamps and build the sort order based on them (in the external log analyzer/viewer).
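
As a side note, if you want these high-precision timestamps in your log files, something like the following should do (RSYSLOG_FileFormat is the built-in template that uses RFC 3339 high-resolution timestamps):

$ActionFileDefaultTemplate RSYSLOG_FileFormat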

So, in essence, reception order never worked well, and the requirement to try to keep it has long been dropped. That also removed one important reason for the single main message queue. Still, it is convenient to have a single queue, as its parameters can be set once and for all.

But a single queue limits concurrency. In the parallel processing world, we try to partition the input data as much as possible so that the processing elements can work independently on the data partitions. All data received by a single input is a natural data partition. But the single main queue merged all these partitions again, and caused performance bottlenecks via lock contention. "Lock contention", in simple words, means that threads need to wait for exclusive access to the queue.

This has now been solved. Today, I created the ability to create ruleset-specific queues. In rsyslog, the user can decide which ruleset is bound to which inputs. For a highly parallel setup, each input should have its own ruleset, and each ruleset should define its own "main" queue. In that setting, inputs no longer block each other during queue access. On a busy system with many inputs, the results can be dramatic. And, more as a side effect, each ruleset is now processed by its own dedicated rule processing thread, totally independent of the others.
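
A minimal sketch of such a setup (the ruleset names, ports and file names are just examples):

$ModLoad imtcp

$Ruleset tcp10514
$RulesetCreateMainQueue on
*.* /var/log/tcp10514.log
$InputTCPServerBindRuleset tcp10514
$InputTCPServerRun 10514

$Ruleset tcp10515
$RulesetCreateMainQueue on
*.* /var/log/tcp10515.log
$InputTCPServerBindRuleset tcp10515
$InputTCPServerRun 10515

Each listener is bound to its own ruleset, and $RulesetCreateMainQueue gives that ruleset its own queue, so the two inputs no longer contend for a single main message queue.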

This design offers a lot of flexibility. But that is not enough. The next step I plan to do is to create the ability to submit a message to a different ruleset during processing. That way, hierarchies of rulesets can be created, and these rulesets can even be executed via separate thread pools, with different queue parameters and in full concurrency. And the best thing is that I currently think it will not be very hard to create the missing glue.

The only really bad thing is that the current configuration language is really not well suited to handle that complexity ("really not" is not a typo for "not really"…). But I have no alternative other than to take this route again, until I finally find time to create a new config language. The only good thing is that I am getting a better and better understanding of what this new language must be able to do, and it looks like my initial thoughts were not up to what is now required…

Canonical Paper on RSyslog

I just found out that Canonical (the company behind Ubuntu) did a nice paper on rsyslog, which also explains why Ubuntu chose rsyslog as its default syslogd.

It is interesting to see that the paper is well-written and well-researched. However, rsyslog has also evolved while the paper was being written, so in fact it now offers even more features than described there.

And, obviously, I am glad to see Ubuntu move to rsyslog as well.

Will Microsoft remove the Windows Software RAID?

These days, hardware RAIDs are quite inexpensive, so everybody is moving towards them. However, all mainstream operating systems still support software RAID, maybe even for a good reason: an OS-controlled software RAID may be a bit easier to optimize under some circumstances. Anyhow, Microsoft seems to be moving away from that feature set:

As you probably know, Adiscon provides premier Windows event log processing solutions. Some of our customers use the products, for example, to monitor whether their RAIDs break. And some of them use software RAIDs. So we wrote a nice article on how to monitor RAID health using the Windows Event Log.

Since the days of Windows NT 3.1 (or was it 3.5?), Windows has logged an error message if the RAID failed. Actually, I'd consider this necessary functionality for any working RAID solution. Why? Well, if the RAID solution works, you will not notice that a disk has died. So if nobody tells you, you'll continue to use the system as usual, not suspecting anything bad. So guess what – at some point the next disk fails and then (assuming the usual setup) you'll be "notified" by the disk system, with those nice unrecoverable I/O errors. So without any health alerts, a RAID system is virtually useless.

We learned that Windows Server 2008's RAID system no longer issues these alerts! (aka "is useless" ;)). So a long while ago, we reported this to Microsoft. The bug went through several stages of escalation. A few minutes ago, my co-worker got a call from the frontline Microsoft tech. He told him that, regrettably, Microsoft won't fix this issue. According to his words, Microsoft has confirmed this to be a bug, and the group responsible for ftdisk has confirmed that it should be fixed, but someone more powerful up in the hierarchy has opted not to do that. Boom. The tech tried to persuade us to switch to a hardware RAID, but actually that was not the point of the support call ;)

What does that mean? To me, it looks like Microsoft is actually moving away from providing software RAID. How else can one explain that there is no interest in providing any error message at all if something goes wrong with the RAID? Given the wide availability of hardware RAIDs (which, btw, provide proper alerting), this step does not look illogical. But do they really want to leave Linux as the only widely deployed mainstream operating system that provides software RAID? Or do they intend to keep it on the feature sheet, but provide a dysfunctional solution like in Windows Server 2008?

Let's stay tuned and see what the future brings…

Another note on hard-to-find-bugs…

Before I began to write this blog post, I realized how long it has been since I wrote anything! I promise to write in a more timely manner again, but the past weeks were merely a consolidation phase, ironing out bugs in the new releases.

I'd like to elaborate on one of these bugs, one that really drove me crazy over the past days. The problem was that omfile's directory creation mode was sometimes set to zero (well, almost always in some configurations). What began as a minor nit turned into a real nightmare ;)

The issue was that the variable fDirCreateMode was always set to zero, except if it was written to at the start of module initialization or simply displayed at the start of module initialization. That sounded strange, but even stranger, moving the variable definition around in the source code (and thus presumably changing its memory location) changed nothing. So I came to a point where I used this code as a patch:

omfile.c from rsyslog git

Look at line 769. With that seemingly unrelated write, the variable stayed as expected. However, if I moved the write into a function, nothing worked again. Strange… After committing the patch, testing showed that the directory permissions now worked well, BUT the file creation mode now misbehaved in the same way.

I was stunned – and speechless. What followed, were intense debugging sessions. I ended up finding the commit that introduced the problem, but still could not see why that commit would affect anything. After hours of debugging, I ended up with a stripped-down and almost codeless omfile, which still had the same problem. And it appeared and disappeared almost at random as code lines were moved in and out.

I once again checked the git history and then noticed that a few commits down the line, I had introduced another config variable for the I/O buffer size. Now I finally had the idea. The size-type config directives were introduced for file size restrictions, so a regular 32-bit integer is not sufficiently large for them. Consequently, their config handler expects a pointer to a 64-bit integer. But, of course, I had provided only a pointer to a 32-bit int, so the config handler overwrote another 32 bits that happened to be adjacent to the address I provided.
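
For illustration, here is a minimal, hypothetical C sketch of this bug class (not actual rsyslog code; the variable and function names are made up):

/* illustrates writing 64 bits through a pointer that actually
 * refers to a 32-bit variable */
#include <stdint.h>
#include <stdio.h>

/* a size-type config handler: expects a pointer to a 64-bit integer */
static void setSizeValue(int64_t *pVal, int64_t newVal)
{
    *pVal = newVal; /* writes 8 bytes */
}

int main(void)
{
    int32_t iIOBufferSize = 4096;  /* the variable that was registered */
    int32_t fDirCreateMode = 0700; /* may end up right next to it in memory */

    /* wrong: a pointer to a 32-bit int is passed where a 64-bit int is expected */
    setSizeValue((int64_t *)&iIOBufferSize, 65536);

    /* depending on how the compiler laid out (and possibly reordered) the
     * variables, the extra 4 bytes may have clobbered fDirCreateMode */
    printf("fDirCreateMode is now 0%o\n", fDirCreateMode);
    return 0;
}

Whether the neighboring variable actually gets clobbered depends on memory layout and optimization level, which is exactly why the symptom moved around as the code changed.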

This was clearly an error. But could it explain the behavior I saw? Not really… But the problem went away once I had corrected the issue. So I began to suspect that the compiler had re-ordered variable memory assignments in order to optimize access to them (maybe to get a better cache hit rate or whatever else). But hadn't I compiled with -O0, so that no optimization should take place? I checked and realized that, due to a glitch in my lab setup, optimization was actually on, not turned off! So now I think I can explain the behavior, and theory as well as practice go hand in hand.

Really? What about the write and the debug print that made everything work? I guess these changes triggered some compiler optimization, and thus the memory assignment was changed and the "extra 32 bits" pointed to some other variable. That also explains why the file creation mode was affected by my change, as well as why the bug reacted quite randomly to my test code changes.

So it looks like I finally resolved this issue.

Lessons learned? Re-check your lab environment, even if it has always worked well before. Be careful with assumptions about memory layout, as the optimizer seems to heavily reorder variables, and even single statements and statement sequences seem to make a big difference. I knew the compiler reorders things, but I did not have this clearly enough in mind to become skeptical about my lab setup.

And, as always, some assumptions limit your ability to really diagnose what is going on… May this be a reminder not only for me (I wonder how long it will last) but for others as well (thus I thought a blog post makes sense ;)).