A design problem…

Folks, I am facing a design problem – and it looks so simple that I am pulling out all my hair ;)

I am currently preparing the next steps in modular rsyslog redesign. I am not sure yet, there are a couple of candidates what to do first. One is to add a real expression capability, another one is to add threaded inputs (which would be quite useful). In support of these enhancements, a number of things need to be changed in the current code. Remember, we are still running on large parts of the original sysklogd code, which was never meant to do all these advanced things (plus, it is quite old and shows its age). A cleanup of the core, however, requires some knowledge of what shall be done with it in the future.

My trouble is about a small detail. A detail I thought that should be easy to solve by a little bit of web search or doing a forum post or two. But… not only did I find the relevant information, I did not even find an appropriate place to post. May be I am too dumb (well possible).

OK, enough said. Now what is the problem? I don’t know how to terminate a long-running “socket call” in a proper way under *nix. Remember, I have done most of my multithreading programming in the past ten years or so under Windows.

What I want to do: Rsyslog will support loadable input modules in the future. In essence, an input module is something that gets data from a data source (e.g. syslog/udp, syslog/tcp, kernel log, text file, whatever …), parses it and constructs a message object out of it and injects that message object into the processing queue. Each input module will run on its own thread. Fairly easy and well-understood. The problem happens when it comes to termination (or config reload). At that instant, I need to stop all of these input module threads in a graceful way. The problem is that they are probably still in a long-lasting read call. So how to handle this right?

Under Windows, I have the WSACancelBlockingCall() API. Whenever I call that method, all threads magically wake up and their read and write calls return an error state. Sweet. I know that I can use signal() under Linux to do much of the same. However, from what I read on the web I have the impression that this is not the right thing to do. First of all it seems to interfere with the pthreads library in a somewhat unexpected way and secondly there is only a very limited set of signals available … and none left for me?

The next approach would be to have each blocking call timeout after a relatively short period, e.g. 10 seconds. But that feels even worse to me. Performance wise, it looks bad. Design-wise it looks just plain ugly, much like a work-around. It looks like I needed to do something not knowing what the right thing is (which, as it turns out, is the right description at the time being ;)).

To make matters worse, I have a similar problem not only with the read and write calls but with other constructs as well. For example, I’d like to have a couple (well, one to get started) of background threads that handle periodic activity (mark messages immediately come to my mind). Again, I would need a way to awake them when it comes to termination time – immediately.

And, of course, I would prefer to have one mechanism to awake any sleeping thread. Granted, can’t do that under Windows either, so I may need to use different constructs, here, too.

This is the current state of affairs. There is still enough work to do before the question MUST be answered in order to proceed. But that point in time approaches quickly. I would deeply appreciate any help on this issue. Be it either advise on how to actually design that part of the code – or be it advised where to ask for a solution! Really – a big problem is that I did not find an appropriate place to ask. Either the forum is not deeply technical enough, or there are some mailing lists where the topic is on something really different. If you know where to ask- please tell me!

[update] In the mean time, I have found a place to ask. Blieve it or not, I had forgotten to check for a dedicated newsgroup. And, of course, there is ;) The discussion there is quite fruitful.

update on rsyslog changes

I have been a bit lazy in posting my recent rsyslog changes. This is because there was a lot of fine-tuning as well as design work. I also fear that I probably have forgotten to log a few less important changes. Nevertheless, the log should still provide you with a good idea of what’s going on. If I do not get any more last minute nits, the 1.20.0 release can hopefully be released either tomorrow or on Friday.

Here are the changes now:

2007-11-27
– got new dev environment on Fedora 8 x64 up and running
– reviewed calls to MsgDestruct() – re-confirmed that there are no
double frees.
– moved syslog PRI code tables to msg.c as this is the only place where
they are still needed

2007-12-03
– adding sur5r’s postgres module – many thanks for providing it!
– added Bartosz Kuzma patches for better compatibility with NetBSD – many
thanks for them!
– disabled GSS-API in configure.ac, as support for it is not yet complete
(Peter Vrabec told me to hold it until a further patch)
2007-12-04
– released 1.20.0 preview (via mailing list only, waiting on feedback)
2007-12-05
– applied patch from Bartosz Kuzma and Michael Biebl to fix compilation
problem on NetBSD (it is missing the -ldl lib, even though it provides
dlopen()).

rsyslog now does postgres SQL!

I have just finished the integration of a new postgres sql output module for rsyslog. The code was provided by sur5r. I am more than glad that rsyslog now supports the popular postgres sql engine. I am sure this is a very nice addition for the community.

Sur5r’s able-coded code also provided some feedback about the output module interface in general. The good news is that the amount of code needed in the actual output module was as low as initially thought.

But it also proved that module design still is less than perfect: there was a new date format needed to support postgres sql. This could not be done in the output module. Instead, sur5r had to modify the core files. He did this in an excellent way, but the point is that it is not a good thing this was needed at all. It is a result of the old, non-modular syslogd design which we inherited from sysklogd and have not yet changed. In fact, we are at the very beginning of modularization. The next thing on the list is a loadable function interface. With that, such modifications will no longer be necessary, a custom (loadable) function can than be provided.

This enhancement requires a number of changes in the core, most importantly in the template subsystem. This is most probably the next area of work I am heading to.

But for the time being, let’s celebrate that rsyslog now has postgres sql support! It will be released in 1.20.0, hopefully around the end of the week. A preview will be announced on the mailing list somewhat earlier. And if you can’t wait, simply pull it out of anonymous CVS – it’s in there as of this writing.

rsyslog now part of Fedora 8

Oops… I think I forgot to mention that rsyslog was recently included in Fedora 8. This is an important milestone for the rsyslog project, because it is the first time ever that it appears as the default syslogd in a wide-spread distribution.

Thankfully, a lot of other package maintainers have been found for other distros (I really appreciate your hard work, guys!), but everywhere else rsyslog is “just” an extra. In Fedora 8, it is the default logging daemon. I’d also like to add that Red Hat folks contributed quite a lot to rsyslog, including great testing.

Thanks to everyone who helped make rsyslog this capable. I think its appearance in Fedora 8 is rewarding for all of us!

rsyslog 1.19.12 released

I just released rsyslog 1.19.12. For the minor change I made in 1.19.11, I created a branch based on 1.19.10. I thought .11 would be the last release in that branch. Obviously, I was wrong. Working on two branches is definitely a lot of not really useful extra work. I’ll try to get thing together this week, then again only working with one branch. That will be the feature-enhanced 1.20.0 release.

Did I find the rsyslog bug?

I have a new leading theory on the rsyslog segfault bug. Before I restructured everything to get rid of the alarm() calls, I did some more research in respect to the best threading model. More or less by accident, I found a nice note on glibc, rentrancy and the _REENTRANT preprocessor macro. That lead me to the “-pthread” compiler option…

May it be that we “just” have a compiler option problem? So far, we only call it with “-lpthreads”, which only effects the linker. -pthread, if I understood it right, defines _REENTRANT, which in turn causes some reentrant versions of standard library functions to be used. If that isn’t outdated information, it could pretty much be our problem.

I am not sure which functions of the run time library are affected by the _REENTRANT macro. So I do not know if I call them. However, non-reentrant run time library functions would be a good explanation for very seldom segfaults that only occur under heavy load and when running in multithreading mode. It would also explain why so many in-depth code reviews did not find anything…

This compiler option finding looks very promising. I’ll probably do a special 1.19.11 release with just that patch and then we’ll see what happens. In the mean time, I wait if somebody comes up with some more diagnostic information. But I finally have found a good explanation for what I see – at least in theory…

Is alarm() the culprit?

I have once again reviewed the threading. Now I have a weak hope. To handle mark messages, there is an alarm() call inside rsyslog. While alarm() typically does not play well with pthreads, rsyslog uses it in a very limited scope. Also, the alarm handler is activated only in one specific thread. But… “alarm() does not play well with pthreads”… In absence of any better solution, may this be the actual cause for this hard to hunt bug?

I will now see what is required to get rid of the alarm(). Actually, this is harder than it initially looks. For alarm() to go away, I need to set up a real background thread that does the timer ticks and that causes the mark messages. That, in turn, means that I have two concurrent messages sources, which involves quite a bit of synchronization. All of that is currently not needed as the alarm signal simply interrupts the select, which in turn leads to execution of nice sequential code. Oh, and yes: the alarm signal handler of course does (contrary to sysklogd) nothing then to set a global variable flag. So I’ll have a look at all that…

found another part of the code … that is not the problem…

I’d like to reproduce part of my conversation with Peter Vrabec over here. Together with others, Peter is very actively helping track down the nasty rsyslog segfault bug. He had quite a good idea of what could have caused it, resulting in me doing another focussed code inspection. I just explained to him why we have not yet found the problem. And there is a lot of detail in that description. Detail, I think, that others can benefit from, too. So here it comes:

> *** glibc detected *** rsyslogd: double free or corruption (!prev):
> 0x09cbd588
> ***
> (gdb) print (uchar *) 0x09cbd588
> $49 = (unsigned char *) 0x9cbd588 “”
>
> is it possible to call free on pszRcvFrom when it points at “” ?
> because I can see:
> msg.c:
> if(pM->pszRcvFrom == NULL)
> return “”;
> else
> return (char*) pM->pszRcvFrom;
>
> same with getRawMsg/getUxTradMsg/…, places where mudflap
> screams in MsgDestruct().

Sorry – if I have not totally screwed up, it’s impossible that this is the problem. I’ve done (yet another) thorough review. All of these functions are either called to supply another Set…() function (which simply copies over the “”, but does not free it) or from MsgGetProp(). MsgGetProp() uses a variable (pbMustBeFreed) to track whether or not a buffer must be freed. In general, unmodified properties are never freed – this happens only at message destruction (MsgDestruct()). MsgDestruct() however, just checks the pointers and frees if they are non-NULL. So MsgDestruct() is not interested in getRawMsg() or another get() function returning an empty string except a NULL pointer.

I have also checked if MsgGetProp() and its helpers correctly handle pbMustBeFreed – and they do it right. So a buffer is only freed when it was dynamically allocated. Message properties are only created during message construction and are free()ed when the message is destroyed.

So I guess we have found another part of the code that does not cause the problem.

What I would like to re-iterate, though, is that the segfault problem seems to disappear under all circumstances if compiled with –disable-pthreads. IF SO (and this is my current assumption), it can not be a general logic error like a double free but must be a synchronization problem. Let me re-iterate: I have yet to find a single installation that has a segfault when running on a single thread! All folks with segfaults that compiled in single threading mode did no longer experience any problems. This is strong evidence to me.

But while I almost think I know it is threading related, I can not find anything wrong in that area, too. The threading model is very simple and construction/destruction of elements is spread in an easy way over the two threads. Basically, the listener thread creates the message and its properties, while the action thread destructs them. There are a few exceptions where properties are derived, but nothing really complex. Yet, I still think it is in the threading area – why does it work in single thread mode otherwise?

Hunting for the segfault…

Do you remember? We are still hunting for a segfault in rsyslog that is very hard to find. The biggest problem is that most people will never experience it. I do not experience it in lab, nor does anybody else who is currently working on the project. And without the ability to reproduce it, there is a lot of guesswork involved.

This is why we are asking for the help of our users (that means: you!). If you run rsyslog and experience a segfault, we would very much appreciate if you could run a specially instrumented version. Peter Vrabec created it and it contains debugging support as well mudflap support, which is a tool to track down nasty memory management errors.

The RPM can be found at:

http://people.redhat.com/pvrabec/rpms/rsyslog/rsyslog-1.19.10-2.mudflap.src.rpm

Please install it. You will notice that CPU usage is higher than normal, but in most cases this is harmless. If you are concerned, drop me a line with specifics and I’ll happily address them.

If a segfault happens when you run this version, please send us

  • binary,
  • coredump and
  • /var/log/rsyslog.mudflap
  • hardware and OS information (which distro? multiprocessor? …?)

Our sincere hope is that we receive enough reports to find something in common between them. So, please contribute your segfault info if you happen to have it. This is a great way to contribute to the project!

recent rsyslog changes

I am back to my routine of posting rsyslog changes. You may also imply that this means I am actually developing some things (and not just writing about it ;)). After I had a somewhat slow start today, things evolved quite nicely this afternoon. If I did not overlook anything important, I even managed to complete the “clean unload process” for loadable modules. That also brought me back to good working knowledge of the code. Actually, I am at least a day ahead of my schedule. But, of course, I’ll check if I overlooked something – but that’ll be tomorrow.

So on to the promised change log (it also covers some past days where I had not reported):

2007-11-19
– applied gssapi patch from varmojfekoj – gss-api is now supported
– added some debug message to ommysql
2007-11-20
– added user doc for gssapi patch from varmojfekoj – thanks!
– bumped version number to 1.20.0, because of new gss api functionality
2007-11-21
– begun to look at dynamic module unloading – this is currently a hack
and works with the mysql module only (which is the only one, so there
is no problem in practice. But it would be good to begin to do it right ;)
– added new modExit() entry point to loadable module interface
– added an identifier to command handler table – need to identify which
command handler entries need to be removed when module is unloaded
– added support so that linkedlist key can be used for owner handle
– enhanced llExecFunc to support deletion of list elements (on behalf of
user function being called, slight interface change)
– enhanced linkedlist class so that list elements can now be deleted based
on the key value they have
– created entry point so that CfSysLine handlers are removed on modExit()
– some cleanup
– modules are now correctly unloaded and de-initialized