rsyslog now default on stable Debian

Hi all,

good news today. Actually, the good news already happened last Saturday. The Debian project announced the new stable Debian 5.0 release.

Finally having a new stable Debian is very good news in itself – congrats, Debian team. You work is much appreciated!

But this time, this was even better news for me. Have a look at the detail release notes and you know why: Debian now comes with a new syslogd, finally replacing sysklogd. And, guess what – rsyslog is the deamon of choice! So it is time to celebrate for the rsyslog community, too.

There were a couple of good reasons for Debian to switch to rsyslog. Among others, an “active upstream” was part of the sucess, thanks for that, folks (though I tend to think that after the more or less unmaintained sysklogd package it took not much to be considered “active and responsive” ;)).

Special thanks go to Michael Biebl, who worked really hard to make rsyslog available on Debian. It is one thing to write a great syslogd, it is a totally different one to integrate it into an distro’s infrastructure. Michael has done a tremendous job, and I think this is his success at least as much as it mine. He is very eager to do all the details right and has provided excellent advise to me very often. Michael, thanks for all of this and I hope you’ll share a virtual bottle of Champagne with me ;)

Also, the rsyslog community needs sincere thanks. Without folks that spread word and help others get rsyslog going this project wouldn’t see the success it experiences today.

I am very happy to have rsyslog now running by default on Fedora and Debian, as well as a myriad of derivates. Thanks to everyone who helped made this happen. So on to a nice, little celebration!

Thanks again,
Rainer

PS: promise: we’ll keep rsyslog in excellent shape and continue in our quest for a world-class syslog and event processing subsystem!

screwed up on LinkedIn ;)

A couple of days ago, I created a rsyslog group on LinkedIn. Then I was curios what happened. Well, nothing. Nothing at all. So I thought it was probably not the right time for such a thing.

And, surprise, surprise, I today browsed through LinkedIn and saw there were 16 join requests. Oops… there seem to be no email notifications for them. Bad… Well, I approved all folks. If you were one of them and now read this blog post: please accept my apologies! Obviously, this was just another time I screwed up on the Internet…

To prevent any further such incidents, I have now set the group to automatically approve everyone who is interested in joining. That’s great for this type of group, actually I am happy for everyone who comes along ;)

When does rsyslog close output files?

I had an interesting question on the rsyslog mailing list that boils down to when rsyslog closes output files. So I thought I talk a bit about it in my blog, too.

What we need to look at is when a file is closed.
It is closed when there is need to. So, when is there need? There are currently three cases where need arises

a) HUP or restart
b) output channel max size logic
c) change in filename (for dynafiles, only)

I think a) needs no further explanation. Case b) should also be self-explanatory: if an output channel is set to a maximum size, and that size is reached, the file is closed and a new one re-opened. So for the time being let’s focus on case c):

I simplified a bit. Actually, the file is not closed immediately when the file name changes. The file is kept open, in a kind of cache. So when the very same file name is used again, the file descriptor is taken from the cache and there is no need to call open and close APIs (very time consuming). The usual case is that something like HOSTNAME or TAG is used in dynamic filename generation. In these cases, it is quite common that a small set of different filenames is written to. So with the cache logic, we can ensure that we have good performance no matter in what order messages come in (generally, they appear random and thus there is a large probability that the next message will go to a different file on a sufficiently busy system). A file is actually closed only if the cache runs out of space (or cases a) or b) above happen).

Let’s look at how this works. We have the following message sequence:


Host Msg
A M1
A M2
B Ma
A M3
B Mb

and we have a filename template, for simplicity, that consists of only %HOSTNAME%. What now happens is that with the first message the file “A” is opened. Obviously, messages M1 and M2 are written to file “A”. Now, Ma comes in from host B. If the name is newly evaluated, Ma is written to file B. Then, M3 again to file A and Mb to file B.

As you can see, the messages are put into the right files, and these files are only opened once. So far, they have not been closed (and will not until either a) happens), because we have just two file descriptors and those can easily be kept in cache (the current default for the cache size, I think, 100).

I hope this is useful information.

On the reliable plain tcp syslog issue … again

Today, I thought hard about the reliable plain TCP syslog issue. Remeber? I have ranted numerous times on why “plain tcp syslog is not reliable” (this link points to the initial entry), and I have shown that by design it is not possible to build a 100% reliable logging system without application level acks.

However, it hit me during my morning shower (when else?) that we can at least reduce the issue we have with the plain TCP syslog protocol. At the core of the issue is the local TCP stack’s send buffer. It enhances performance but also causes our app to not know exactly what has been transmitted and what not. The larger the send buffer, the larger our “window of uncertainty” (WoU) about which messages made it to the remote end. So if we are prepared to sacrifice some performance, we can shrink this WoU. And we can simply do that by shrinking the send buffer. It’s so simple that I wonder a shower was required…

In any case, I’ll follow that route in rsyslog in the next days. But please don’t get me wrong: plain TCP syslog will not be reliable if the idea works. It will just be less unreliable – but much less ;)

Low-End Windows Event Log Tool Released

Adiscon, my company, has just released EventConsolidator, an easy-to-use tool for Windows event log consolidation targeted to the small business market. Unlike our full-blown EventReporter and MonitorWare Agent products, this is a purely agentless solution to monitor the standard Windows Event Log files. Also, it does not (yet) store any events in a central repository but rather works on the native event logs of the Windows machines it monitors.

The tool comes with basic display and searching abilities and some preconfigured reports. This is Adiscon’s first move into that market segment. I am very interested to see which feedback we get from that tool. We are very open to all customer suggestions. I have to admit that I argued it may be better to do the essentials first and then look for what people really need rather than to build a complex one-size-fits all approach. So it will be interesting, at least for me, to see if that thought works out. Just for the records, EventConsolidator is commercial software, but a full featured free trial is available form the EventConsolidator site.

US Citizen? Your credit is in doubt…

I was introduced to a very subtle effect of the Heartland breach. Remember, card processor Heartland has screwed up and, as some sources say, 100 million credit card numbers were stolen from them via a Trojan. That fact spread big news and, among others, started a discussion if PCI has been proven to be useless. But there seem to be additional effects: US customers seem to have lost a lot of credibility in international shopping.

In Adiscon’s office, I heard today that we got a call from one of our card processors. Keep in mind that we are based in Germany. The card processor inquired about a recent transaction and asked us to check whether this could be credit card fraud. It was not, but he left us his phone number so that we could check with him in the future when we suspected fraud on transactions.

This is quite unusual and immediately drew my attention. I gave that guy a call. He explained that they are routinely checking US credit card transactions because some problems have been seen recently with US cards. He explained to me that the processor would like to protect merchants, because “if you ship the goods and the cardholder protests the charge … weeks later … you will be charged back but unable to recover the goods” (good point, btw). So I came up and asked if they were calling because of the Heartland breach. Not only, he said, but that would be an example (I deciphered this as a “yes”). So then I asked if they had not blacklisted the affected card numbers. Some statements followed, which I deciphered to mean “no”. So the cards are still active and seem to cause issues (why else would a card processor begin to call its merchants?).

I know that heartland does not know exactly which card numbers have been stolen. But it is known that most probably any card processed within the past 10 month is highly suspect. So wouldn’t it have been fair security practice to put these cards on the blacklist and issue new ones to the cardholders? Sure, that would be inconvenient (read: costly) and, probably more important, would have shown to everyone that someone has screwed up, but would that not be much better than putting both consumer and vendors at risk? Without an automatic blacklisting, consumers need to pay much more attention to their credit card bill.

An interesting side-effect is that US customers seem to have lost credit outside of the US. For example, it was suggested to me that we check each US-order in depth before delivering anything. If everyone else gets this advise, US customer’s will probably find shopping overseas quite inconvenient…

If you loose your credit card, you are legally required to call your card issuer and report that loss. As long as you do not notify them, you are liable. If, on the other hand, someone in the card industry looses your card (number), nobody seems to be liable: Customers must check their statements and vendors must do in-depth checks (sigh) on their customers. Is this really good practice?

And what if a card is used to commit credit card fraud? No problem at all (for the card industry): either the cardholder will not notice it (and pay the fraud) or the cardholder protests the charge, in which case the merchant needs to pay. The later case involves some manual processing by the card industry: again, no problem! The merchant is charged a hefty protest fee. Looking at how hefty the fee is, it seems to be even profitable for the card industry if it takes that route.

Bottom line: who is responsible? Card industry (Heartland in this case). Who pays? Everyone else! Isn’t that a nice business model? Where is the motivation to keep such a system really secure?

I think that really questions if the card industry is interested in security. PCI may not have failed (I tend to agree to Anton Chuvakin here). But it smells a bit like PCI and whetever else efforts can not succeed, because they are not deployed in a honestly security-aware environment but rather in one that needs good execuses for sloppy security. As long as the card industry does not do the right thing as soon as it costs the card industrie’s money, real security can not be achieved.

Begun to roll out race patches…

I have now begun to roll out the rsyslog race patches. Before the weekend, I rolled out the patch for the debian_lenny and development branches and today the beta branch followed. I am now looking forward for feedback from the field. The patch for v3-stable is ready (and available via git), but I’d like to get at least a bit more feedback before I do another stable release.

Wanna play? No, says the DRM!

Do you like DRM? Isn’t that a perfect thing to make sure you are properly licensed with all your music, movies and, of course, software? Well, folks like the EFF have strongly opposed DRM right from the beginning. One of their arguments always has been that, if thought to the end, would revoke the user the ability to work do with his machine what he wants do.

Now we see a perfect sample. Grave Rose just posted a nice link on twitter: “Gears of War DRM screwup makes PC version unplayable“. It’s all about a DRM cert that seems to have expired with the end result that the game no longer works. Thankfully, as we do not (yet) have the full trusted computing platform in place. So, you still change you PC. This enabled users to set back their system clocks and so the game worked again. rofl…

Granted, this is not a real DRM issue. Such an expiration date can be encoded in software ever since. With a good debugger, it is not too hard to remove it (of course, that’s not legal and with DRM it is considerably more work to do…). But if we are forced to use more and more DRM and if we are forced to use hardware platforms that deny true admin access to its owner and we have legislation that outlaws helping yourself – won’t those issues become the norm.

For most of the time, you could rest assured that once you had installed something, and did not change it, it was likely to run for eternity (well… somewhat). This seems no longer to hold true. The only true solution is to use as much open source as possible and say no to any DRM-enabled products.

As an interesting side-note, I am not sure if the poor gamers who set back their system clocks are in legal troubles: didn’t they try to circumvent a technical copy protection? Not sure about the DMCA, but in Germany you could argue that this is an illegal attack… Happy gaming!

rsyslog: optimizing exception handling

The recent analysis of rsyslog’s race condition has fueled some related and some not-so-related discussions. Among them is an old-time favorite, that is performance enhancement. I have finally taken the time to write about rsyslog’s “exception handling” and what I do not like about it.

I am reproducing a forum post here, in the hopes that it will be easier to find – and attract more attention – if it is available via the blog. Comments I would appreciate via the forum, so that I can keep track of them in a single location. With that said, here we go:

In rsyslog, a kind of exception handling is done by the “iRet” mechanism. In short, there exists an integer data type that conveys a universal return code. This code ranges from “all OK” over “all OK, but this and that information”, “we had a warning” to “something went wrong”. States are encoded as integer numbers. By calling convention, almost all functions return such an iRet value (named after its variable name). More importantly, every caller checks the outcome and employs a kind of exception handling when something unexpected happened (like doing resource cleanup). As an aid to the developer, most of the inner workings are encapsulated in easy to use macros.

For example, the return code checking is done via the CHKiRet(f(x)) macro, which expands to something like

if((iRet = f(x)) != RS_RET_OK)
goto abort_finalize;

As such, the innocent-looking (and frequently found )sequence

CHKiRet(f(x));
CHKiRet(g(x));
CHKiRet(h(x));

results in lots of conditional branches. Such code places a big burden on a CPU’s speculative execution resources. For example, it may need a lot of space in the branch pattern table, ejecting other, potentially useful entries from the cache. Given the fact that the quality of speculative execution affects execution speed considerably on modern CPUs, pressing the speculative system to its max is probably not a wise idea.

One performance enhancement approach is to find ways that enable the code to be executed in larger linear blocks. The most important observation is that in almost all cases, the if() condition is never true, because typically the outcome of the function called is an OK state.

I thought about using longjmp to provide the necessary functionality, but the setup effort for longjmp, on *quick* lock, seems to be too high, especially in the case of the number of small functions that are present in rsyslog (and inlinening does not help with this issue). The answer is probably too look at how the C++ exception mechanism is implemented and build a solution similar to that (just like many of the object callbacks are inspired by the C++ method call tables).

I have not yet begun to dig seriously into this optimization, as there are plenty of other things that can be improved and that promise to have much more effect (like the reduction of the overall number of system calls needed on a per message basis).

However, I would appreciate feedback on this issue. Please post to the forum thread, so that I have the information at hand when I finally can turn to optimizing that code area.

Some more on the rsyslog data race…

After I had written my writeup on the rsyslog data race, I had another very useful discussion with Lorenzo Catucci. We touched some points that are not yet included in the write-up. I also explained some of the rsyslog internals that you probably find no other source yet. In short: I think this discussion is useful for anyone interested in more details. I am not sure when (if) I will be able to include it into the original post. So rather than lose it, I asked Lorenzo if he agreed to making it public and thankfully he did.

So here it is – very authentic and totally unedited, with all the typos and sometimes non-sequence of a chat discussion as we all like it. Have fun! Comments are appreciated.

Rainer: writeup finally finished: http://blog.gerhards.net/2009/01/rsyslog-data-race-analysis.html ;)

 

catucci: very nice wrap-up.

Rainer: the really good thing is that it actually sound we found the bug :)

now lets hope for the best…

catucci: I think you should emphasise a little bit more the need for coherent handling of condition variables or the like: now I understand in what sense you talked about an existing memory barrier this morning; the problem with that “barrier” was that it wasn’t enabled on both sides… something like a water valve…

Rainer: good point, will see that I edit it in

but with the barrier, I actually meant the real thing: http://en.wikipedia.org/wiki/Memory_barrier
it forces the cpu to in-order memory operations
I thought it is sufficient.. but obviously not
I didn’t have the exact failure scenario on my mind this morning
that evolved during the analysis
and was clarified by the write-up
in fact I started a new lab in the midst of the writeup
which again proves the point that it pays to write things down

catucci: Yes, but what I was thinking was the constructor side missing the ordering.

Rainer: it’s even more complicated

I didn’t explain that all
rsyslog shall run great on uniprocessor

catucci: will wait for the next version of the write-up!!! .-)

Rainer: under some constraints, it is possible to run without any sync at all

catucci: :-)

Rainer: (I wont add this, it is too much in depth)

so the message mutex is enabled depending on whether it is needed or not
similarly, when the message is constructed, it is done in a single thread
the mutex is disabled while doing so
only after it is constructed, the object is finalized, which then (and only then) enables the mutex
so it is ok that part of the objhects lifetime it is not synchronized
maybe never, depending on the config
that gains quite some performance
even on MP, because of the construction phase
of course, it doesn’t ease troubleshooting…

catucci: Yes, as usual, I’ve been too fast in both thinking and writing… what I was talking about was the enqueueing and not the real building.

Rainer: yeah, that’s right

the message is first finaliuzed, then it is enqueued
from then on, the sequence is always the same (there is only a single call to do so)
well… is it.. good point
will check reality vs. theory tomorrow ;)
that’s why I love discussion :)

catucci: good, see you.

Rainer: cu – have a great evening!

Sent at 6:49 PM on Wednesday

rgerhards: good morning – what’s your lab doing? ;)

Sent at 9:47 AM on Thursday

catucci: still running well.

remember: with -d option active.

rgerhards: sounds good. mine was running well, too

yeah… verification will be a long road
at some point we should switch back to production code

catucci: Just another small note on your weblog entry:

rgerhards: be happy to hear it

catucci: you some to have missed a very important ingredient of my mix: a very high load on the system; sar -q gives me these figures:

rgerhards: oh yeah

I missed to write that down
should add it

catucci: 07:15:01 runq-sz plist-sz ldavg-1 ldavg-5 ldavg-15
09:45:01 0 216 0.70 0.59 0.49

rgerhards: all in all, now that I (think) I know the failure scenario

I know that it is only probable under high load

catucci: this is importanto, since with big plists, it’s much more probable for the threads to be preempted by the scheduler, and to jump to other packages

rgerhards: and even then it is so unlikely to happen that you need an awful lot of iterations, what also requires high load

indeed

catucci: you know, I’m a bit old school, and I’m used to talk about packages for multi core…

rgerhards: so the ingredient is to have a fast machine, many cores, run them utilized, but in a way that as few as possible preemption of the rsyslog threads happen

do you concur to this POV?
Sent at 9:54 AM on Thursday

catucci: Almost completely: an high contect switch rate is needed even between the syslog threads IMHO: if the threads do just sleep undisturbed on waits/pools/selects, the probability of the race happening just go infinitesimal. to get to the race you (almost) need a forced context switch (I think)

rgerhards: now I agree mostly… I think the context switch itself is not necessary. But it is necessary that access to the reference counter happens concurrently.

catucci: I no loger own the elder two Pentium-III system that has been my workstation for the last ten years, but I’d bet you could just as well see the races there; maybe in a shorter time than on my mx-01

rgerhards: from my POV, this can happen either due to context switches OR due to two threads being executed on two different execution units right at the same time. Actually, I think the later easier leads to the failure scenario.

catucci: yes, that’s very true; the high context switch rate does more to probabilistically spread the syslog run between different packages

rgerhards: I think the key is to understand the scenario. I completely agree that now that we have details, we can craft lab environments that will reproduce the issue on not-so-powerful hardware. One needs to know the ingredients.

catucci: remember that multi-core packages usually have a shared cache between the cores in the same package

rgerhards: For example, I did one lab where I increased the concurrency of rsyslog actions. That lead to more context switches, more mutex calles and lead to no repro at all.

This was overdone. If I reduced the concurrency, I was again able to reproduce. For every machine, there seems to be a mix that probably triggerst the failure scenario. I think this was one of the reasons why it was so hard to find. Not knowing the root cause, the bug appeared very random and was extremely hard to track down to a config/environment.
BTW: do you mind if I post a copy of this conversation on the blog? I will probably not be able to update the analysis today, but our conversation should be very helpful to everyone interested.

catucci: In that lab the queuer and the dequeuer did share the same mutex opinion? The key ingredient seems to be using a two different syncronization primitives on the two sides

rgerhards: yes and now – in action execution there are two mutex types involved, each with multiple instances

catucci: No problem at all for publishing

rgerhards: publishing: thx

there is one mutex that guards each individual action. No action (output module) may be called concurrently – that’s part of the module spec (for good reason with many modules)
so there is one mutex for each action and it is aquired before an action is entered and released when it is done.
The other mutex class guards each individual message. that is the one described in the analysis.
if two different action invocations process a message, they may concurrently try to modify the message (that depends on the actual config, most importantly on the templates used and order of action execution)
So any such modifcation – AFTER the message has been finalized (think yesterday evening) is guarded by the message mutex.

catucci: right; I was thinking about the “unlocked” case in MsgAddRef’s# ifdef xHAVE_ATOMIC_BUILTINS case

rgerhards: if you have n actions, a single message may therotically in use n times

(side-note: if n is 1, it can not be used multiple times, and this is – I think – one of the cases where the message mutex is “turned off”)
I needed to set stage ;)
with that design, all actions use exactly the same sequence to increment and decrement the reference counter.
so this will always be used in a consistent manner. That part of the logic was absolutely right. BUT…. as I wrote, the msgAddRef used atomics, while msgDestruct used the message mutex.

catucci: Right: this is the problem (IMHO) the mutex and the atomics where not coherent.

rgerhards: so while the sequence of calls to msgAddRef and msgDestruct was right and consistent, the operations carried out by these calls were inconsistent

yes, exactly

catucci: there should be only one kind of sync primitive in use, be it atomics or mutexes

rgerhards: thus there are two fixes: either use the mutex exlusively (that’s running in your lab) or use the atomics exlusively (I’ve run that in my lab since around noon yesterday)

exactly!
and this is what the final fix will look like: it’ll check if atomics are avaiable and, if so, use them. If they are unavailable, we will use the mutex (which is much slower)

catucci: yes, this is what I think too. and I’m happy we have an experimental confirmation!

rgerhards: indeed

The problem is that we for now have just a indication, I wouldn’t say its a confirmation yet.

catucci: what branch should I download to test the atimics only?

rgerhards: But it is a pretty good indication.

catucci: you know… scince is made of falsifications…

rgerhards: I have no branch yet to test with. What runs in my lab is a heavily hacked version, quite ugly

indeed
my point is we do not have enough proof yet
but… we have a sound theory
plus we know for sure (as we can show a failure case) that this was a bug

catucci: right.

rgerhards: we do not know for sure if it was the only one. But we now it was one

so code has improved

catucci: right too.

rgerhards: and both of our lab results show stability has increased, at worst only in some scenarios

in that sense, however, theory and experiment go together
and this indeed makes me feel good :)
I’ll see that I get a new master branch together today that includes everything needed to address the failure scenario AND enabling atomics again.
I need to be careful as the atomics version is ugly and I need to verify I do everything needed in a clean version, too.
It would be excellent if you could then run that new master branch. And then we see what happens. I think a debug build would be good at first (as this was the repro environment) and once this works out well (a day or two without aborts should be sufficent), we can switch the production build.
In parall, I hope that others who experienced the problem will also try the new code.
so over the next couple of days, maybe weeks, we should get a good feeling if there still is an issue.
I have to admit it is quite complex code, so I wouldn’t outrule there is something else…

catucci: Your are right especially on the other’s systems. Maybe ew could try to get in touch with the other debian users experiencing the crash

rgerhards: definitely. There is also an entry in debian’s bug tracking system. I already did a patch for the version used by debian lenny and michael Biebl, the package maintainer, did an updated package this night.

so I think we should get some feedback from there, too. But the problem is that the problem occured very sporadic in that case. I’d say anything less than two weeks of undisturbed operation is not even an indication
I will also ask the folks on the rsyslog mailing list to apply patches .. as soon as new versions are out.

catucci: good. see you.