After I had written my writeup on the rsyslog data race, I had another very useful discussion with Lorenzo Catucci. We touched on some points that are not yet included in the write-up. I also explained some rsyslog internals that you will probably not yet find in any other source. In short: I think this discussion is useful for anyone interested in more details. I am not sure when (or if) I will be able to include it in the original post. So rather than lose it, I asked Lorenzo whether he agreed to making it public, and thankfully he did.
So here it is, very authentic and only lightly edited for readability, with all the spontaneity (and occasional non sequiturs) of a chat discussion, as we all like it. Have fun! Comments are appreciated.
Rainer: writeup finally finished: http://blog.gerhards.net/2009/01/rsyslog-data-race-analysis.html ;)
catucci: very nice wrap-up.
Rainer: the really good thing is that it actually sounds like we found the bug :)
catucci: I think you should emphasise a little bit more the need for coherent handling of condition variables or the like: now I understand in what sense you talked about an existing memory barrier this morning; the problem with that “barrier” was that it wasn’t enabled on both sides… something like a water valve…
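(To make the "water valve" picture concrete, here is a minimal sketch, not actual rsyslog code, of synchronization that is honored on only one side: the locking side gets the mutex's implied memory barrier, while the other side bypasses it entirely.)

#include <pthread.h>

static pthread_mutex_t mut = PTHREAD_MUTEX_INITIALIZER;
static int refcount = 1;    /* shared message reference counter */

/* One side closes the valve: the mutex serializes this update and
 * its implied memory barrier makes it visible to other lockers. */
void add_ref_locked(void)
{
    pthread_mutex_lock(&mut);
    refcount++;
    pthread_mutex_unlock(&mut);
}

/* The other side leaves the valve open: it updates the same counter
 * without taking the mutex, so nothing orders it against the locked
 * side and an update can be lost. */
void release_unlocked(void)
{
    refcount--;    /* unsynchronized read-modify-write: racy */
}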
Rainer: good point, will see that I edit it in
catucci: Yes, but what I was thinking was the constructor side missing the ordering.
Rainer: it’s even more complicated
catucci: will wait for the next version of the write-up!!! :-)
Rainer: under some constraints, it is possible to run without any sync at all
catucci: :-)
Rainer: (I won't add this, it is too in-depth)
catucci: Yes, as usual, I’ve been too fast in both thinking and writing… what I was talking about was the enqueueing and not the real building.
Rainer: yeah, that’s right
catucci: good, see you.
Rainer: cu – have a great evening!
rgerhards: good morning – what’s your lab doing? ;)
catucci: still running well.
rgerhards: sounds good. mine was running well, too
catucci: Just another small note on your weblog entry:
rgerhards: be happy to hear it
catucci: you seem to have missed a very important ingredient of my mix: a very high load on the system; sar -q gives me these figures:
rgerhards: oh yeah
catucci: 07:15:01 runq-sz plist-sz ldavg-1 ldavg-5 ldavg-15
09:45:01 0 216 0.70 0.59 0.49
rgerhards: all in all, now that I (think) I know the failure scenario
catucci: this is important, since with big plists, it's much more probable for the threads to be preempted by the scheduler and to jump to other packages
rgerhards: and even then it is so unlikely to happen that you need an awful lot of iterations, which also requires high load
catucci: you know, I'm a bit old school, and I'm used to talking about packages for multi-core…
rgerhards: so the ingredients are a fast machine with many cores, kept utilized, but in a way that preempts the rsyslog threads as rarely as possible
catucci: Almost completely: a high context switch rate is needed even between the syslog threads IMHO: if the threads just sleep undisturbed on waits/polls/selects, the probability of the race happening becomes infinitesimal. To get to the race you (almost) need a forced context switch (I think)
rgerhards: now I agree mostly… I think the context switch itself is not necessary. But it is necessary that access to the reference counter happens concurrently.
catucci: I no longer own the old dual Pentium-III system that was my workstation for the last ten years, but I'd bet you could see the races there just as well; maybe in a shorter time than on my mx-01
rgerhards: from my POV, this can happen either due to context switches OR due to two threads executing on two different execution units at exactly the same time. Actually, I think the latter leads to the failure scenario more easily.
catucci: yes, that’s very true; the high context switch rate does more to probabilistically spread the syslog run between different packages
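(As a quick illustration of that point: this small test program, hypothetical and unrelated to the rsyslog sources, hammers an unsynchronized counter from two threads. On a multi-core machine it typically loses updates within a second, with no forced context switches needed; compile with gcc -pthread.)

#include <pthread.h>
#include <stdio.h>

#define ITERS 10000000

static int counter;    /* deliberately unsynchronized */

static void *worker(void *arg)
{
    for (long i = 0; i < ITERS; i++)
        counter++;     /* racy read-modify-write */
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    /* Expected 2*ITERS; on a multi-core machine the result is almost
     * always lower, because increments are lost whenever both cores
     * execute the load-increment-store sequence at the same time. */
    printf("counter = %d (expected %d)\n", counter, 2 * ITERS);
    return 0;
}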
rgerhards: I think the key is to understand the scenario. I completely agree that now that we have details, we can craft lab environments that will reproduce the issue on not-so-powerful hardware. One needs to know the ingredients.
catucci: remember that multi-core packages usually have a shared cache between the cores in the same package
rgerhards: For example, I did one lab where I increased the concurrency of rsyslog actions. That led to more context switches and more mutex calls, and resulted in no repro at all.
catucci: In that lab, did the queuer and the dequeuer share the same mutex? The key ingredient seems to be the use of two different synchronization primitives on the two sides
rgerhards: yes and no – in action execution there are two mutex types involved, each with multiple instances
catucci: No problem at all for publishing
rgerhards: publishing: thx
catucci: right; I was thinking about the “unlocked” case in MsgAddRef's #ifdef HAVE_ATOMIC_BUILTINS case
rgerhards: if you have n actions, a single message may theoretically be in use n times
catucci: Right: this is the problem (IMHO): the mutex and the atomics were not coherent.
rgerhards: so while the sequence of calls to msgAddRef and msgDestruct was right and consistent, the operations carried out by these calls were inconsistent
catucci: there should be only one kind of sync primitive in use, be it atomics or mutexes
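(In code, the incoherent pattern looks roughly like this; a simplified sketch of the failure mode, not the literal rsyslog source:)

#include <pthread.h>

static pthread_mutex_t mut = PTHREAD_MUTEX_INITIALIZER;
static int refcount = 1;

/* One side uses the GCC atomic builtin... */
void msg_add_ref_sketch(void)
{
    __sync_add_and_fetch(&refcount, 1);
}

/* ...while the other side uses the mutex. The mutex serializes
 * decrementers against each other, but NOT against the atomic
 * incrementer: the atomic increment can land between the
 * decrementer's load and store and be overwritten. The count then
 * reaches zero while references still exist, and the message is
 * freed too early. */
void msg_destruct_sketch(void)
{
    pthread_mutex_lock(&mut);
    refcount--;
    pthread_mutex_unlock(&mut);
}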
rgerhards: thus there are two fixes: either use the mutex exclusively (that's what's running in your lab) or use the atomics exclusively (I've run that in my lab since around noon yesterday)
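(Both consistent variants, again sketched, reusing the declarations from the sketch above; either way every thread goes through the same primitive, so no update can be lost:)

/* Fix 1: the mutex exclusively, on both sides */
void msg_add_ref_mutex(void)
{
    pthread_mutex_lock(&mut);
    refcount++;
    pthread_mutex_unlock(&mut);
}

void msg_destruct_mutex(void)
{
    pthread_mutex_lock(&mut);
    refcount--;
    pthread_mutex_unlock(&mut);
}

/* Fix 2: the atomics exclusively -- no lock needed at all */
void msg_add_ref_atomic(void)
{
    __sync_add_and_fetch(&refcount, 1);
}

void msg_destruct_atomic(void)
{
    __sync_sub_and_fetch(&refcount, 1);
}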
catucci: yes, this is what I think too. and I’m happy we have an experimental confirmation!
rgerhards: indeed
catucci: what branch should I download to test the atomics only?
rgerhards: But it is a pretty good indication.
catucci: you know… science is made of falsifications…
rgerhards: I have no branch yet to test with. What runs in my lab is a heavily hacked version, quite ugly
catucci: right.
rgerhards: we do not know for sure if it was the only one. But we know it was one
catucci: right too.
rgerhards: and both of our lab results show that stability has increased, at worst only in some scenarios
catucci: You are right, especially about the others' systems. Maybe we could try to get in touch with the other Debian users experiencing the crash
rgerhards: definitely. There is also an entry in Debian's bug tracking system. I already did a patch for the version used by Debian lenny, and Michael Biebl, the package maintainer, built an updated package last night.
catucci: good. see you.