rsyslog changes for 2007-08-09

Yesterday I made a couple of changes:

– fixed a bug in outchannel code that caused templates to be incorrectly
parsed
– fixed a bug in ommysql that caused a wrong “;template” missing message
in some cases
– some text and code cleanup in cflineParseTemplate() to match the changed
design of selector handling
– added new doc files to autotools for make dist
– fixed a bug that would have manifested when module configuration commands
would have been freed. The current code base, however, never does this, so
this is not a real bug fix – it’s in preparation of future work
– fixed a potential memory leak in modules.c, again, this could not happen in
current code
– added interface API for unloading module (a dummy)
– added module unload functionality; rsyslogd now unloads modules on exit
(of course, with only statically linked modules, there is little current
value in this – but it is made towards an upcoming dynaload plugin interface)

rsyslog changes for 2007-08-07 & 08

This time, I blog the changes for both days. Today was again quite busy, with a number of smaller changes. But all in all you can see that the pace of changes is reducing. I hope that people will test it, but I also hope that we will see no more big bugs the next few weeks. Then, I consider it to be mature for a new stable release.

You will probably see no or very few code changes during that period. You can expect, however, some doc upgrades. All in all, my focus has now moved to the next major release, which I am currently designing. And of course, some other things are also asking for a bit of time now that I have worked very intensely the past two month on rsyslog.

Stay tuned. Any news will appear here in the blog. In the mean time, try 1.18.1 and tell us what needs to be fixed (of course, you may also just praise our work, that’s fine, too ;)).

Here is the work log:

2007-08-07

  • applied patch from Michel Samia to fix compilation when the pthreads feature is disabled
  • moved code to create/delete message queue to init(), after reading the conf file. This is a prequisite to allow specifying the queue size in the conf file.
  • add config directive $MainMsgQueueSize, which now allows to configure the queue size dynamically

2007-08-08

  • moved the “after select polling code (for fds)” to its own function
  • optimized select handling, after select polling cycle is now finished when all active selectors are processed
  • all compile-time settings are now shown in rsyslogd -v, not just the active ones
  • changed function name dprintf() to dbgprintf() as it conflicts with the clib
  • added config file directive $ActionResumeInterval
  • solved an issue with debian sid. LARGEFILE preprocessor defines are changed in rsyslog.h, which causes grief for zlib under sid. As a temporary solution, I have moved rsyslog.h right at the beginnng of the include order. It’s somewhat dirty, but it works. I think the real solution will be inside the autoconf files.
  • added a database cleanup script based on a suggestion of Michael Mansour. Michael Meckelein adapted it slightly for use with rsyslog.
  • released 1.18.1
  • fixed bug: default for $DropMsgsWithMaliciousDnsPTRRecords was not reset on up and on $ResetConfigVariables

next generation rsyslog design, worker pools, parellelism and future hardware

Based on my last post on rsyslog multithreading, an email conversation arose. When I noticed I elaborated about a lot of things possibly of general interest, I decided to turn this into a blog post. Here it is.

The question that all this started was whether or not a worker thread pool is good or evil for a syslogd.

Yes, the worker-thread pool has a lot of pros and cons with syslog. I know this quite well, because I am also the main design lead behind WinSyslog, which was the first syslogd available natively on Windows (a commercial app). It is heavily multi-threaded. I opposed a worker pool for a long time but have accepted it lately. In some high-volume scenarios (unusual cases) it is quite valuable. But of course you lose order of messages. For WinSyslog, we made the compromise to have the worker pool configurable and set it to 1 worker if order of events is very important. I designed WinSyslog in 1995 and released the first version in 1996 – so I know quite well what I am talking about (but to be honest the initial multi threading engine got in somewhat later;)).

HOWEVER, especially in high-volume environments and with network senders, you are somewhat playing Russian roulette if you strongly believe that order in which events come in is exactly the same order in which they were sent. Just think routers, switches, congestions, etc… For small volumes, that’s fair enough. But once the volume goes up, you do not get it 100% right. This is one of the reasons I am working quite hard it the IETF to get a better timestamp in into syslog (rsyslog already has it, of course as an option). The right thing to do message sequencing is by looking at a high-precision timestamp, not by looking at time of reception.

For rsyslog, I am not yet sure if it will ever receive a worker pool. From today’s view, it does not look necessary. But if I think about future developments in hardware, the only key in getting more performance is by using the enhanced parallelism the hardware will provide. The time of fast single cores is over. We will have relatively slow (as fast as today ;-]) massively parallel hardware. This is a challenge for all of software engineering. Many folks have not yet realized it. I think it has more problem potential than the last “big software crisis”. As you can see in the sequencing discussion, parallelism doesn’t only mean mutexes and threads – it means you need to re-think on how you do things (e.g. use timestamps for correlation instead of reception/processing order, because the later is no stable concept in a massively parallel program). With the design I am now doing, I will definitely have separate threads for each output action (like file or database writer). I need this, because rsyslog will provide a way to queue messages on disk when a destination is not available. You could also call this “store-and-forward syslog”, just like SMTP is queued while in transit. You can not do this concept with a single thread (at least not in a reasonable complex way). I also do not think that multiple processes would be the right solution. First off, they are too slow. Secondly, multiple processes for this use are much more complicated than threads (I know, I’ve written similar multi-process based things back in the 80s and 90s).

Rsyslog will also most probably have more than one thread for the input. Input will become modular, too. If I use one thread per input, each input can use whatever threading model that it likes. That makes writing input plugins easy (one of my goals). Also, in high volume environments, having the ability to run inputs on multiple CPUs *is* *very* beneficial. One of the first plugins will be RFC 3195 syslog, which is currently run as an external process (communicating via unix local domain socket, which is quite a hack). At the core of 3195 is liblogging, a slim and fast select-server I have written. However, integrating it into a single-threaded application is still a challenge (you need to merge the select() calls and provide an API for that). With multiple threads, you can run that select server on its own thread, which is quite reasonable (after all, why should both UDP and 3195 reception run on the same thread?). The same is true for tcp based processing. Then think about the native ssl support that is coming. Without hardware crypto acceleration, doesn’t it make much sense to run the receiver on its own thread? Doesn’t it even make sense to run the sender on its own thread?

So the threading model of the next major release of rsyslog (called 3.x for reasons not yet elaborated about) will most probably be:

  1. multiple input threads, at least one for each input module (module may decide about more threads if they like to)
  2. all those input threads serialize messages to a single incoming queue
  3. the queue will most probably be processed by a single worker thread working on filter conditions and passing messages to the relevant outputs. There may be a worker pool, but if so its size can be configured (and set to 1, if needed)
  4. multiple output threads, at least one for each output. Again, it is the output’s decision if it runs on more than one thread
  5. there may be a number of housekeeping threads, e.g. for DNS cache maintenance

This design will provide superb performance, is oriented on logical needs, allows for easy intrgration of plugins AND will be reasonable easy to manage – at least I hope so.

But, wait, why have I written all this? OK, one reason is that I wanted to document the upcoming threading model for quite a while and now was a good time for doing so. But I think it also shows that you can not say “multithreading is good/bad” without thinking about the rest of the system design. Multi-threading is inherently complex. Humans think sequentially, at least when we think about something consciously. So parallel programming doesn’t match human nature. On the other hand, nature is doing things itself massively parallel. Just think about our human vision: in tech terms, it is massively parallel signal processing. This is where technology is also heading to. As we have learned to program at all, we will also learn to deal with parallel programming. A key thing is that we should never think about parallelism as a feature – it’s actually a tool, that can be used right or wrong. So let’s hope the rsyslog design will do it right. Of course, comments are very much appreciated!

why is rsyslog multi-threaded (and is it really?)

I found a quite interesting discussion on a debian mailing list on why it may be useful (or even evil) for rsyslog to be multi-threaded.

It’s late now, so I can not elaborate in full, but I thought I share a little feedback. First of all, my project description is actually cheating a bit at this time: rsyslog is currently actually dual-threaded, with one thread for message reception and one thread for message processing. And this is even optional, so you can turn it off with a configure option to make it single-threaded. Future versions of rsyslog will have much improved threading model and that has much to do with modular input and output – but that’s a different story.

Run rsyslog compiled for single threading and make it write to a MySQL database, then throw lots of messages at it (let’s say maybe 500 per second, which is not actually much). You’ll soon see that messages are lost, because the receiver can’t receive new messages while the MySQL output module (message processor) is writing messages. So reliability and message reception capability is limited by the slowest output module in use.

With a multi-threaded model, this problem does not exists. Receiver and output modules are de-coupled via an in-memory queue. So when the MySQL server is to slow for the current message rate, messages are buffered in the queue, but the receiver can continue receiving messages. As we have UDP, this is quite important. Please note that in a failover case, the output module may be busy for as long as maybe even a few seconds. This time may be needed to probe if the primary has really failed and then connecting to the secondary. The whole idea not only applies to MySQL (though it is most obvious in this use case) but also for TCP based syslog forwarding. Here, the receiver can actually slow down the sender, so the same problem may occurs. De-coupling receiver and output modules is also of great benefits in high volume environments. It enables bursts to be processed reliably where messages were otherwise lost.

So the main reason for the current threading model is reliability and striving to lose as few messages as possible. Of course, there is also a performance benefit, especially if you think that in the future we will see machines with many more cores – but a single core not beeing considerably faster than today. Rsyslog’s current threading model is not up for this challenge yet – that will happen later this year. Then, each output will receive its own thread. There are good reasons to do that, for example the need to support re-startable action with local disk queueing (aka “store-and-forward syslog). Without parallelism, this is not doable. But I’ll elaborate on that later as the object and threading model design grows.

And, yes, I share concerns that threading makes code more complex. This is the #1 reason that threading is still a compile-time optional feature in the current rsyslog. But as the project evolves, we need to have multiple threads to support a clean object model. And, yes, again, that sounds heavy. All I can say is that I am a performance hog, so you won’t probably see rsyslog taking up huge amounts of CPU just for the sake of having an object model. But of course, I may be cheating here, too (can you trust me? – stay tuned and you’ll see…).

I hope my short notes help to understand a bit why the current rsyslogd uses multiple threads.

EDIT: I have now found some time to elaborate on the upcoming rsyslog threading model and multi-threaded programming in general.

rsyslog changes for 2007-08-06

I have been on the road today, so not much happened. But at least some things were done ;)

  • varmojfekoj sent me a patch for some selector_t init code. It patched re-init, where rsyslog should happen to segfault (it didn’t do it, but why not is a mystery…) – applied it
  • changed action resume interval to a more reasonable 30-second intial value
  • moved action object out of syslogd.c to its own fileset (action.c/h)

Keep in mind that the pace of changes will now go down for a few weeks. The code needs to mature. I’ll probably focus more on docs AND *other* projects in the mean time. I guess I’ll also work on no-code things, like config file format and final object design (more on those later, this was just a teaser ;)).

syslog changes on 2007-08-03

here’s friday’s work log:

– added CODE_STD_FINALIZERparseSelectorAct to module-generation macros
– restructered rsyslogd startup – moved startWorker() to a more
appropriate
place
– updated ommysql.c to fully support suspension/resumption by rule
engine
– I found out that we finally have problems with the (somewhat
recursive)
call to logerror() that many of the modules do. I have not tried it,
but
I think things will become wild when we compile without pthread
support.
Threading prevents full recursion, so we have not seen any bad effects
so far. However, the problems that I experienced in ommysl (that
caused
me to re-structure startWorker()) are actually rooted in this issue.
I
first thought to fix it via a module interace, but I now came to the
conclusion that it is not more effort and much cleaner to do an
internal
error buffering class. This is implemented in errbuf.c/h.
– I just noticed that this is not actually an error buf, but the core of
an input module for all internal messages. As such, I implement it now
as iminternal.c/h. Of course, there is no input module interface yet
designed, but that doesn’t matter. Worst-case, I need to re-write the
im, best case I can use the im (at least partly) to define the
interface.
– added a few functions to the linkedlist class
– error messages during startup are now buffered – so we do no longer
need
to think about how emergency logging might work. Actually, these are
logged
to whatever is instatiated in the log file. This enhances the chance
that
we will be able to drop the error message somewhere it is seen.
– released 1.18.0
– reduced number of identical error messages in ommysql.c while
suspended

rsyslog changes for 2007-08-02

For those curios, here is todays work log as far as it relates to rsyslog:

  • enhanced linkedList class, new method to get count, new method to execute a user-supplied function on all members
  • enhanced syslogd memory structures to support multiple actions per selector however, this can not yet be configured due to missing config code for this case
  • moved cfsysline-calling into cfline() – now a unified interface again
  • enabled functionality to have more than one action per selector
  • changed doAction() syslogd internal functions to allow for larger data & state data – in preparation for actions that shall only be executed when previous action was suspended (the switchover case, e.g. for failed databases or TCP receivers)
  • added tryResume() API to module interface
  • added resumption logic to rsyslogd (but not yet any module)
  • got a working version of suspension/resumption logic including omfwd.c
  • implemented $ActionExecOnlyWhenPreviousIsSuspended config directive
  • MILESTONE REACHED: we now have the ability to switch over to a
    different syslog/tcp server if the primary one fails. It is also done via a generic interface, so it should be sufficiently simple to extend that interface to other actions, namely the database.
  • some cleanup
  • begun to work on ommysql.c

finally: ability to automatically switch to backup servers

I’ve achieved a long-term goal for rsyslog today: I have finally managed to include the ability to automatically switch to backup actions (servers, databases, whatever ;)) when the primary fails. I have implemented it in a very high level inside the syslogd rule engine, and so it is quite powerful.

Just read this doc excerpt:

ActionExecOnlyIfPreviousIsSuspended
This directive allows to specify if actions should always be executed (“off,” the default) or only if the previous action is suspended (“on”). This directive works hand-in-hand with the multiple actions per selector feature. It can be used, for example, to create rules that automatically switch destination servers or databases to a (set of) backup(s), if the primary server fails. Note that this feature depends on proper implementation of the suspend feature in the output module.
The following is an example of what can be done with it (a config file excerpt):

*.* @@primary-syslog.example.com
$ActionExecOnlyIfPreviousIsSuspended on
& @@secondary-1-syslog.example.com
& @@secondary-2-syslog.example.com
& /var/log/localbuffer
$ActionExecOnlyIfPreviousIsSuspended off

This selector processes all messages it receives (*.*). It tries to forward every message to primary-syslog.example.com (via tcp). If it can not reach that server, it tries secondary-1-syslog.example.com, if that fails too, it tries secondary-2-syslog.example.com. If neither of these servers can be connected, the data is stored in /var/log/localbuffer. Please note that the secondaries and the local log buffer are only used if the one before them does not work. So ideally, /var/log/localbuffer will never receive a message. If one of the servers resumes operation, it automatically takes over processing again.

As is written in the doc, this ability depends on proper handling of the new suspend-interface in the output driver. So far, I have only updated the omfwd driver. I plan to do the mysql driver soon. I’ll possibly also look after the file driver, but suspension there seems like a foreign concept. The other drivers don’t support it (nothing can go wrong enough), so I don’t bother about them.

All in all, I am quite happy and I think this is a very useful feature. It will be part of 1.18.0, currently it is “just” in anon cvs. I hope to release 1.18.0 tomorrow – let’s see if that works.

I am also glad that this example shows the output interface (and a whole lot of the other new internal plumbing) works pretty well. I think we are in an excellent position for all the other new things to come ;)

rsyslog changes for 2007-08-01

Today, I just post the work log. I think it speaks for itself ;)

  • loading default values as done yesterday *is* clean (I’ve just reviewed the code again)
  • moved omfile-specifc global variables (for cfsysline settings) from syslogd.c to omfile.c
  • MILESTONE reached: initial output modularization is completed. Of
    course, we will work on the interface some more, but all basic goals are reached.
    Note that there are also some accesses of global data and functions, but
    that is not a real design issue but something that can be solved by moving
    code to supporting libraries (which will occur later). Also note that
    output module availability does not indicated the presence of a plug-in
    interface. That will happen some time later when the output module interface has somewhat more matured (my personal bet is in around two month from now).
  • fixed an invalid value for the MARK timer – unfortunately, there was a testing aid left in place. This resulted in quite frequent MARK messages
  • some code cleanup (unused vars and such…)
  • enhanced cfsysline interface to disallow chaining of command handlers
  • added $include config directive
  • applied a patch from mildew to prevent rsyslogd from freezing under
    heavy load. This could happen when the queue was full. Now, we drop messages but rsyslogd remains active.
  • fixed bug in cfsysline that caused iRet to be lost during handler call
  • enhanced $Include to allow inclusion of a complete directory
  • changed name of $Include to $IncludeConfig – sound better as it is
    more explicit (we’ll see where else we get includes for…)
  • released 1.17.6
  • modified rklogd to only change the console log level if -c is specified
  • even more cleanup ;)
  • prepared code to move action-specifc selector_t data members to their own structure
  • moved action-describing data fields out of selector_t to its own structure; changed code to use this dynamically-allocated structure; this is in preparation for multiple actions per selector
  • moved definition of selector_t to syslogd.c again, as this now no longer is a global structure
  • shuffled code in cfline() to prepare for a better system of processing the selector_t linked list