being back…

I am now back in Germany for a week. I’ve now managed to get through all these nice “gifts” that were waiting for me while I was in Florida. Today, I think, I’ll be able to resume work on rsyslog. First on my todo list will be solving a problem with ommysql (or, better said, providing some instrumentation to get hold of it) as well as integrating a very interesting patch sent over from the red hat folks. So … stay tuned and expect updates to this blog to happen once again.

Getting off for launch viewing

A note to all fellow rsyslog users and contributors. I got a chance to view space shuttle Discovery’s STS-120 launch next week. I will fly over to Florida tomorrow. I will be available via email, but my response time will be sluggish. My buddy Michael will handle things related to rsyslog in the mean time. Most importantly, he’ll release 1.19.10 tomorrow while I am on the plane. Please keep thoughts and contributions flowing, they are very welcome.

recent changes

A couple of changes the last days. They are all now in cvs. Tomorrow we will release 1.19.10.

2007-10-15
– bumped version number
– changed ommsyql to allow for “:ommysql:” module specific action call method
instead of “>”. This shall facilitate the creation of other plugins.
– added notes about new action module designator to module documentation
– updated user doc to new module action calling convention
2007-10-16
– updated rsyslogd doc set man page; now in html format
2007-10-17
– undid creation of a separate thread for the main loop — this did not
turn out to be needed or useful, so reduce complexity once again.
– added doc fixes provided by Michael Biebl – thanks

rsyslog 1.19.9 released

I have just released rsyslog 1.19.9. It is now two weeks since the last release. I have taken some extra time to make sure that the release system (source tarball) now fits the packagers’ and user’s needs. Also, some time went into hunting the segfault, though this was still quite fruitless. However, mildew has identified that the segfault seems to occur only on 64 bit machines, which is a very good hint.

I’ll now watch for comments on the 1.19.9 release and then see if I myself can do anything against the segfault early next week (I am setting up another x64 machine for testing). Later the week, I’ll be heading for my space shuttle launch viewing trip, which will unfortunately mean that my focus will not be on rsyslog (by my co-workers will keep it up).

worklog for the past days…

Work on rsyslog was focussed on the new packaging and bug fixing. Most importantly, people had problems with the new tarball introduced in 1.19.8 and we have made sure that 1.19.9 will again meet the expectations. We’ve done a round of pre-releases on the mailing list and managed to get down to something that seems to be quite acceptable. The official 1.19.9 release will most probably be released later today.

Here are the details:

2007-10-08
– changed the threading to include one extra thread that runs the
mainloop. This was done to work around a problem with malloc/free. Note
that we are still running on two threads – the startup thread just waits
for the new one to finish. For a description of the problem I try to
work-around, please see:
http://rgerhards.blogspot.com/2007/10/could-i-really-reproduce-bug.html
Note that this is an experimental change, which will only stay if it
proves to fix the segfault issue we are dealing with.
2007-10-10
– changed the build system to use a single source tarball again (but
different makefiles for the main project and ommysql)
thanks to varmojfekoj for the patch
2007-10-11
– applied fixes from Michael Biebl:
1.) fix failing compilation of ommysql plugin (s/ommysql-config.h/config.h/)
2.) fix mysql configure check (although the default is no, we did
check for the mysql devel files)
3.) Create a separate Makefile.am for the doc files. This cleans up
the toplevel Makefile.am considerably and makes it much more readable
and maintainable.
3b) Assign the html doc files to html_DATA. This means, they are
installed to $(hmtdir), which by autoconf standards is
$(prefix)/share/doc/$packagename/.
4.) Reformat the SOURCES line to make it better readable and maintainable.
– integrated patch by varmojfekoj to tweak build system even further
– Peter Vrabec requested doc not to be “stored” in html_DATA – changed that

malloc/free anomaly cleared

Peter Vrabec provided very helpful information on the anomaly I experienced with malloc/free under mudflap instrumentation. See his report:


$ gcc -lmudflapth -lpthread -fmudflapth mud.c
.........----------
mud.c: In function ‘main’:
mud.c:27: warning: return type of ‘main’ is not ‘int’
./a.out
alloc p in thread: 0ea72530
alloc p in main thread: 0ea72460
freeing p from thread: 0ea72530
free done!
freeing p from main thread: 0ea72460
free done!
main thread exiting

gcc -lpthread -fmudflapth mud.c -lmudflapth
................................----------
mud.c: In function ‘main’:
mud.c:27: warning: return type of ‘main’ is not ‘int’
$ ./a.out
alloc p in thread: 1bffe3f0
alloc p in main thread: 1bffe440
freeing p from thread: 1bffe3f0
free done!
freeing p from main thread: 1bffe440
*** glibc detected *** ./a.out: double free or corruption (out):
0x000000001bffe440 ***
======= Backtrace: =========
/lib64/libc.so.6[0x32bde6e8a0]
/lib64/libc.so.6(cfree+0x8c)[0x32bde71fbc]
./a.out(__wrap_main+0x174)[0x400924]
/lib64/libpthread.so.0[0x32bea061b5]
/lib64/libc.so.6(clone+0x6d)[0x32bdecd39d]
======= Memory map: ========
bla bla bla

Note the position of the -lmudflapth argument. So, as it looks, the problem was really one of the instrumentation and not of rsyslogd itself. So, bad as it is, we are still back to hunting a bug that is hiding well. But hopefully we’ll get somewhat closer when mudflap is now actually active… I’ll see and post news as soon as I have them.

work-around for malloc/free issue created

I have now created a work-around for the malloc/free threading issue which creates an extra thread for the mainloop. The startup thread then simply waits for it to be finished. As such, there is never memory allocated in the “main” thread. At least for me, it works now with mudflap. I am still in doubt if that was the segfault issue (or just a bug in mudflap), but at least we can give it a try.

I will now see that I get some feedback. The next thing is change the packaging back to a single source tarball (by popular request ;)).

Could I really reproduce the bug…?

Today, I was able to actually test and debug rsyslog Not just looking at code and how it may work. No, real interaction and real crashes.

Things went well, but then I got stuck. Somehow, the segfault didn’t make much sense. I found something that is related to the segfault user’s are seeing. But is it really the actual segfault or just a side-effect of instrumentation?

With mudflap active, rsyslog crashes when freeing the message structure in the worker thread. The structure was allocated (malloc) in another thread, actually the “main” thread, that is the one rsyslog starts up in. Of course, I’ve first assumed I have messed up with the structure. But further analysis showed that I have not. So a bad feeling creeped in … that there may be some thread safety issues with malloc/free. On the other hand, rsyslog is far from being my first multi-threaded program (but on a modern flavor of linux, I have to admit). I’ve used dynamic memory alloc in multithreaded apps for years now and without any problems. After all, dynamic memory is often a trouble-safer with multithreading.

Then, I have written a minimalistic program to check out threading functionality. Here it is:


#include <stdlib.h>
#include <pthread.h>

static char *pthrd;
static char *pmain;

static void *singleWorker1()
{
pthrd = malloc(32);
printf("alloc p in thread: %8.8xn", pthrd);
pthread_exit(0);
}

static void *singleWorker2()
{
printf("freeing p from thread: %8.8xn", pthrd);
free(pthrd);
printf("free done!n");
printf("freeing p from main thread: %8.8xn", pmain);
free(pmain);
printf("free done!n");
pthread_exit(0);
}

void main()
{
int i;
pthread_t thrdWorker;

i = pthread_create(&thrdWorker, NULL, singleWorker1, NULL);
pthread_join(thrdWorker, NULL);
pmain = malloc(32);
printf("alloc p in main thread: %8.8xn", pmain);
i = pthread_create(&thrdWorker, NULL, singleWorker2, NULL);
pthread_join(thrdWorker, NULL);
printf("main thread exitingn");
}

Note: the code did originally contain sleep(1) in stead of the pthread_join()s now found in it. I was initially too lazy to do it right in this tester. I’ve been told this is bad, so I fixed it. The result, however, is unchanged.

… and now look at the output:

cc -O1 -fmudflapth threadtest.c -lpthread -lmudflapth
threadtest.c: In function ‘main’:
threadtest.c:27: warning: return type of ‘main’ is not ‘int’
[root@localhost rsyslog]# ./a.out
malloc: using debugging hooks
alloc p in thread: 095586d0
alloc p in main thread: 095587f8
freeing p from thread: 095586d0
free done!
freeing p from main thread: 095587f8
*** glibc detected *** ./a.out: free(): invalid pointer: 0x095587f8 ***

free done!
main thread exiting
*******
mudflap stats:
calls to __mf_check: 0
__mf_register: 5179 [524294B, 32B, 20981024B, 0B, 2365B]
__mf_unregister: 0 [0B]
__mf_violation: [0, 0, 0, 0, 0]
calls with reentrancy: 5132
lock contention: 0
lookup cache slots used: 0 unused: 1024 peak-reuse: 0
number of live objects: 5179
zombie objects: 0

As you can see, the free that is done on the memory malloc’ed in the thread I created manually works fine. But the freeing the memory malloc’ed in the main thread fails miserably (I’ve set MALLOC_CHECK_=1, for the records).

I am both stunned and puzzled. If that is really a problem, it is clear why rsyslog aborts.

… but can that really be? I have to admit I now suspect a problem with mudflap — when it is compiled without it, everything works. But this applies only to the test program. Rsyslog doesn’t as quickly abort compiled without mudflap, but it aborts in any case. So can there really be a problem in the way dynamic memory management is done and in which threads?

If you can contribute to the solution, please do. I really need any helping hand, this is probably one of the most strange situations I’ve ever seen [and, of course, all will clear up once I see where I have failed – as always ;)].

Feedback appreciated!

finally … I can produce a segfault!

Thanks to mildew, I can now produce a rsyslog segfault. It even happens immediately at startup. Actually, it’s a bit quick, I’d expected it to occur much later. But it looks similar enough to the bug reports I received, so I can hunt that one down (yes, it seems to occur persistently)! Chances are pretty good that, if it is fixed, the real culprit is found.

Ah … and what was the magic? Mildew suggested to add mudflap to the project. I have to admit I was unaware of this capability. I guess, I’ll now make it a standard part of the debug environment – but let’s hunt the bug first…

on ommysql packaging

I thought I share an interesting conversation that went over the rsyslog mailing list. Even though it buried in the mailing list archive, I think here is a better place to be displayed. Feedback is much appreciated.


> —–Original Message—–
> From: Michael Biebl
> Sent: Friday, September 28, 2007 10:23 AM
> To: rsyslog-users; Rainer Gerhards
> Subject: Re: [rsyslog] rsyslog 1.19.8 released
>
> 2007/9/27, Rainer Gerhards :
> > repeat message processing. The MySQL functionality is now taken out
> of
> > the core package, but its tarball is still contained in the main
> tarball
> > (so it is still a single download for everything). This is part of
> the
> > effort to fully support third-party plugins. Rsyslog 1.19.8 is a
>
>
> To be honest, I don’t particularly like this change. It increases the
> work for package maintainers (like me). You now have to maintain two
> source packages. Having a non-standard tarball inside a tarball
> doesn’t help there. It even worsens things, as stuff like “make dist”
> or “make distcheck” doesn’t work anymore for a cvs checkout.
> There’s also the problem of which version of the plugins will be
> compatible with the core version (upgrade scenarios, keeping them in
> sync, etc.)
> It also adds complexity for the developer (as he has to maintain an
> additonal set of build files).
> As you were talking about 3rd party plugins (with the emphasis on 3rd
> party) I don’t understand the benefit of splitting out the mysql
> plugin as it is you who develops the mysql plugin, not a third party.
> Do you actually intend to create a separate tarball for each plugin in
> the future?
>
> What was wrong with the –enable-mysql configure switch? I don’t see
> any benefits, only disadvantages. You know, if it ain’t broken, why
> fix it ;-)
>
> Cheers,
> Michael

Hi Michael,

as I have blogged, I am not yet sure about how to handle the situation. I am also consulting with Autotools experts, any advise is appreciated.

Two packages seem useful, especially when more plugins become available (I myself think about email and a couple of other databases). Many folks also do not like the idea of having to have libmysql present on the system just to install rsyslog – with a core package and the plugin, those can only install the core (and that will probably the majority of cases).

What I have not yet found – and I have very limited expertise in this area – is how to do that in the best possible way.

Oh, and some more background: ones the plugin interace has matured (I expect this in 3 to 6 month), I intend to actually use ommysql with its own version number. Versioning will be handled by the interface (that part is already present, but no code for it yet as there is only a release 1 of it). So, in the medium to long term, ommysql *will* be a separate project – maybe one with a different maintainer, as I am no mysql guy.

Does this make sense? As I said, comments are much appreciated…

Rainer