Could I really reproduce the bug…?

Today, I was able to actually test and debug rsyslog Not just looking at code and how it may work. No, real interaction and real crashes.

Things went well, but then I got stuck. Somehow, the segfault didn’t make much sense. I found something that is related to the segfault user’s are seeing. But is it really the actual segfault or just a side-effect of instrumentation?

With mudflap active, rsyslog crashes when freeing the message structure in the worker thread. The structure was allocated (malloc) in another thread, actually the “main” thread, that is the one rsyslog starts up in. Of course, I’ve first assumed I have messed up with the structure. But further analysis showed that I have not. So a bad feeling creeped in … that there may be some thread safety issues with malloc/free. On the other hand, rsyslog is far from being my first multi-threaded program (but on a modern flavor of linux, I have to admit). I’ve used dynamic memory alloc in multithreaded apps for years now and without any problems. After all, dynamic memory is often a trouble-safer with multithreading.

Then, I have written a minimalistic program to check out threading functionality. Here it is:


#include <stdlib.h>
#include <pthread.h>

static char *pthrd;
static char *pmain;

static void *singleWorker1()
{
pthrd = malloc(32);
printf("alloc p in thread: %8.8xn", pthrd);
pthread_exit(0);
}

static void *singleWorker2()
{
printf("freeing p from thread: %8.8xn", pthrd);
free(pthrd);
printf("free done!n");
printf("freeing p from main thread: %8.8xn", pmain);
free(pmain);
printf("free done!n");
pthread_exit(0);
}

void main()
{
int i;
pthread_t thrdWorker;

i = pthread_create(&thrdWorker, NULL, singleWorker1, NULL);
pthread_join(thrdWorker, NULL);
pmain = malloc(32);
printf("alloc p in main thread: %8.8xn", pmain);
i = pthread_create(&thrdWorker, NULL, singleWorker2, NULL);
pthread_join(thrdWorker, NULL);
printf("main thread exitingn");
}

Note: the code did originally contain sleep(1) in stead of the pthread_join()s now found in it. I was initially too lazy to do it right in this tester. I’ve been told this is bad, so I fixed it. The result, however, is unchanged.

… and now look at the output:

cc -O1 -fmudflapth threadtest.c -lpthread -lmudflapth
threadtest.c: In function ‘main’:
threadtest.c:27: warning: return type of ‘main’ is not ‘int’
[root@localhost rsyslog]# ./a.out
malloc: using debugging hooks
alloc p in thread: 095586d0
alloc p in main thread: 095587f8
freeing p from thread: 095586d0
free done!
freeing p from main thread: 095587f8
*** glibc detected *** ./a.out: free(): invalid pointer: 0x095587f8 ***

free done!
main thread exiting
*******
mudflap stats:
calls to __mf_check: 0
__mf_register: 5179 [524294B, 32B, 20981024B, 0B, 2365B]
__mf_unregister: 0 [0B]
__mf_violation: [0, 0, 0, 0, 0]
calls with reentrancy: 5132
lock contention: 0
lookup cache slots used: 0 unused: 1024 peak-reuse: 0
number of live objects: 5179
zombie objects: 0

As you can see, the free that is done on the memory malloc’ed in the thread I created manually works fine. But the freeing the memory malloc’ed in the main thread fails miserably (I’ve set MALLOC_CHECK_=1, for the records).

I am both stunned and puzzled. If that is really a problem, it is clear why rsyslog aborts.

… but can that really be? I have to admit I now suspect a problem with mudflap — when it is compiled without it, everything works. But this applies only to the test program. Rsyslog doesn’t as quickly abort compiled without mudflap, but it aborts in any case. So can there really be a problem in the way dynamic memory management is done and in which threads?

If you can contribute to the solution, please do. I really need any helping hand, this is probably one of the most strange situations I’ve ever seen [and, of course, all will clear up once I see where I have failed – as always ;)].

Feedback appreciated!

finally … I can produce a segfault!

Thanks to mildew, I can now produce a rsyslog segfault. It even happens immediately at startup. Actually, it’s a bit quick, I’d expected it to occur much later. But it looks similar enough to the bug reports I received, so I can hunt that one down (yes, it seems to occur persistently)! Chances are pretty good that, if it is fixed, the real culprit is found.

Ah … and what was the magic? Mildew suggested to add mudflap to the project. I have to admit I was unaware of this capability. I guess, I’ll now make it a standard part of the debug environment – but let’s hunt the bug first…

on ommysql packaging

I thought I share an interesting conversation that went over the rsyslog mailing list. Even though it buried in the mailing list archive, I think here is a better place to be displayed. Feedback is much appreciated.


> —–Original Message—–
> From: Michael Biebl
> Sent: Friday, September 28, 2007 10:23 AM
> To: rsyslog-users; Rainer Gerhards
> Subject: Re: [rsyslog] rsyslog 1.19.8 released
>
> 2007/9/27, Rainer Gerhards :
> > repeat message processing. The MySQL functionality is now taken out
> of
> > the core package, but its tarball is still contained in the main
> tarball
> > (so it is still a single download for everything). This is part of
> the
> > effort to fully support third-party plugins. Rsyslog 1.19.8 is a
>
>
> To be honest, I don’t particularly like this change. It increases the
> work for package maintainers (like me). You now have to maintain two
> source packages. Having a non-standard tarball inside a tarball
> doesn’t help there. It even worsens things, as stuff like “make dist”
> or “make distcheck” doesn’t work anymore for a cvs checkout.
> There’s also the problem of which version of the plugins will be
> compatible with the core version (upgrade scenarios, keeping them in
> sync, etc.)
> It also adds complexity for the developer (as he has to maintain an
> additonal set of build files).
> As you were talking about 3rd party plugins (with the emphasis on 3rd
> party) I don’t understand the benefit of splitting out the mysql
> plugin as it is you who develops the mysql plugin, not a third party.
> Do you actually intend to create a separate tarball for each plugin in
> the future?
>
> What was wrong with the –enable-mysql configure switch? I don’t see
> any benefits, only disadvantages. You know, if it ain’t broken, why
> fix it ;-)
>
> Cheers,
> Michael

Hi Michael,

as I have blogged, I am not yet sure about how to handle the situation. I am also consulting with Autotools experts, any advise is appreciated.

Two packages seem useful, especially when more plugins become available (I myself think about email and a couple of other databases). Many folks also do not like the idea of having to have libmysql present on the system just to install rsyslog – with a core package and the plugin, those can only install the core (and that will probably the majority of cases).

What I have not yet found – and I have very limited expertise in this area – is how to do that in the best possible way.

Oh, and some more background: ones the plugin interace has matured (I expect this in 3 to 6 month), I intend to actually use ommysql with its own version number. Versioning will be handled by the interface (that part is already present, but no code for it yet as there is only a release 1 of it). So, in the medium to long term, ommysql *will* be a separate project – maybe one with a different maintainer, as I am no mysql guy.

Does this make sense? As I said, comments are much appreciated…

Rainer

work log for 2007-09-26

work log for rsyslog:

2007-09-26
– applied patch provided by varmojfekoj to support building ommysql
in its own way (now also resides in a plugin subdirectory)
– fixed a bug in cvthname() that lead to message loss if part
of the source hostname would have been dropped

Did not manage to release 1.19.8 today…

Sorry folks, I didn’t manage to release 1.19.8 today. There was simply too much other activity that required attention. So I couldn’t finally decide on how to distribute from now on. Anyhow, it is my firm intention to release 1.19.8 tomorrow.

The good news, though, is that some folks are already testing that code. Maybe I get some feedback until tomorrow morning – and maybe that even helps me to get a better release…

needed to pull 1.19.7 release

Unfortunately, there is a serious bug in rsyslog 1.19.7 that can prevent UDP message reception. Totally. No message received at all. This somehow slipped through our own testing as well as test at some third parties.

The root cause of this problem is interesting: I changed an internal interface to make things more reliable. What I changed had some old code in them that did return’s right in the middle of the code. I overlooked those returns and so an invalid state was returned.

The interesting fact is that the function now returns an enum type (rsRetVal). Previously, it returned an integer. For some reason, the compiler issued no warning when the old (accidentally remaining) code returned an integer. If I’d receive a “wrong type” warning, I’d probably spotted it before even doing testing at all.

Bottom line: what’s wrong with my compiler settings?

Oh, and yes: I’ll release the fixed version soon. It then will be 1.19.8 to avoid confusion. I now fight with the distribution system: I received a patch that allows ommysql to be build as a separate module. So now it is separate – even from a distribution point of view. That causes some grief for distributing it and can possibly break some things in distribution packages. I need to think how to tackle that in the best possible way…

worklog for 2007-09-25

2007-09-25
– changed ttyname() to ttyname_r() – not a real fix, as this part of the
code was single threaded, but better to be prepared for the future.
– changed localtime() to localtime_r()
– released 1.19.7
– applied contributed patch to improve repeated message processing
(see https://bugzilla.redhat.com/show_bug.cgi?id=303341)

rsyslog changes until 2007-09-24

Hi all,

I’ve been a bit lazy reporting what I’ve done with rsyslog. The primary reason is that it was a lot of review, which is quite boring to report in a blog. Today, I think, I’ll do a new release, so at least here is my worklog for up until yesterday.

2007-09-17
– possibly found a bug in cvthname() that lead to a wrong size being specified
in a getnameinfo() API call – not sure, though, if it is “the” bug (actually,
it looks like it isn’t). – this is EXPERIMENTAL
– fixed a bug that caused signal handlers in cvthname() not to be restored when
a malicious pointer record was detected and processing of the message been
stopped for that reason (this should be really rare and can not be related
to the segfault bug we are hunting).
2007-09-18
– split the function cvthname() for clarity. Also changed to using the rsRetVal
status return system
– removed some compiler warnings in regard to signed / unsigned comparison
2007-09-20
– code cleanup
– fixed a minor memory leak that occured when the %APPNAME% property was
used (I think nobody used that in practice)
2007-09-21
– more review and cleanup
– simplified code in shouldProcessThisMessage() for debug output
– changed strerror() calls to thread-safe strerror_r() variant

What’s going on with rsyslog?

I’ve not posted much the past days. That doesn’t mean nothing has happened. I wanted to post a work log today, but I have to admit I have forgotten it on a machine I right now can not access. OK, first thing tomorrow morning…

In short words, we are still on the bug hunt. I am now again back to reviewing code, this time on a functional basis. I am checking everything based on the message flow, looking at functions as they are called. Today, I completed the review of the reception part (up to the point when it gets into the main message queue). Unfortunately, no serious problem found. I used the review, however, to clean out some nits, add a large number of new comments and even found a memory leak. The later one would currently most probably never occur in practice, but when syslog-protocol gets adopted, it would have hit.

Tomorrow, and probably then next two or three days I’ll review the code that is executed once the message leaves to main queue. There is probably more meat for a bug in that part (its by far more complex).

I also think I’ll release the cleaned-up version sometime soon – after all, it’s better then what is currently released.

I keep you posted. Comments are always welcome.

paid services for rsyslog

As with a lot of open source projects, rsyslog funding is problematic. Of course, rsyslog currently is funded by my interest in it. And I am glad that Adiscon, my company, permits me to work on. The actual funding for all that action, kind of funny, mostly stems from closed source projects in the Windows world.

However, I’d like to see that over the years rsyslog can fund itself. In my point of view, funding should be provided by those that benefit most from it. So obviously, I do not expect any funding from private folks, people like you and me. Of course, everyone is invented to contribute and new code, doc and bug hunting is definitely a pro. On the other hand, companies (and other organizations, namely the government) take financial advantage of using it. Some of them also contribute in terms of time and code. This is great and much appreciated :)

For the others, I have begun to offer donations. Not surprisingly, this is not a real source of funding ;) But now I have taken an offer from sourceforge.net, which I think is interesting. They begin to offer a marketplace for open source services. So those folks that actually need help can be brought in contact with those that have experience. Sounds like a good idea to me and a fair way to fund projects.

I have created two test service offerings for rsyslog. One is to write a nice custom-created rsyslog configuration file. Well, with rsyslog’s relative ease of use, I do not expect that many folks use that. But you never know and I am always curios.

The other one is especially targeted towards organizations who must prove they have “official” support for all software they use (I guess this includes at least a number of government agencies). For them, I have created a email support option for rsyslog. It guarantees responses to support questions and this is often needed for auditing and other purposes. Of course, we still answer all support emails and do not plan to stop that.

I have added one real goody, though: and that is that we will provide patches for past releases of rsyslog. As you may know, developers hate to fumble with old releases. And so there is very little motivation to look at an older release when there is a new one out. With the paid support option, however, there is some motivation to do that. So, again, I think this is fair: we are just offering a service that would otherwise never appear. It doesn’t hurt any of the other users.

I am very interested to see how this works out. I would also be interested in feedback from the field. How do you like this idea? Do you have any other/additional suggestions?