rsyslog: optimizing exception handling

The recent analysis of rsyslog’s race condition has fueled some related and some not-so-related discussions. Among them is an old-time favorite: performance enhancement. I have finally taken the time to write about rsyslog’s “exception handling” and what I do not like about it.

I am reproducing a forum post here, in the hope that it will be easier to find, and attract more attention, if it is available via the blog. I would appreciate comments via the forum, so that I can keep track of them in a single location. With that said, here we go:

In rsyslog, a kind of exception handling is done by the “iRet” mechanism. In short, there is an integer data type that conveys a universal return code. Its values range from “all OK” through “all OK, but with some additional information” and “we had a warning” to “something went wrong”. States are encoded as integer numbers. By calling convention, almost all functions return such an iRet value (so called after the name of the variable that holds it). More importantly, every caller checks the outcome and employs a kind of exception handling (like doing resource cleanup) when something unexpected happens. As an aid to the developer, most of the inner workings are encapsulated in easy-to-use macros.

For example, the return code checking is done via the CHKiRet(f(x)) macro, which expands to something like

if((iRet = f(x)) != RS_RET_OK)
    goto abort_finalize;

As such, the innocent-looking (and frequently found) sequence

CHKiRet(f(x));
CHKiRet(g(x));
CHKiRet(h(x));

results in lots of conditional branches. Such code places a big burden on a CPU’s speculative execution resources. For example, it may need a lot of space in the branch pattern table, ejecting other, potentially useful entries from the cache. Given that the quality of speculative execution affects execution speed considerably on modern CPUs, pushing the speculative system to its limit is probably not a wise idea.
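To make the pattern concrete, here is a minimal, self-contained sketch of a function written in this style. It is not rsyslog source: the `rsRetVal` typedef, the `RS_RET_ERR` value, the bodies of `f`, `g` and `h`, and the `process` function are hypothetical stand-ins, and the macro is a simplified version of the expansion shown above.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for the return-code type and its values. */
typedef int rsRetVal;
#define RS_RET_OK   0
#define RS_RET_ERR -1

/* Simplified check macro: store the result in iRet and jump to the
 * shared cleanup label on anything other than "all OK". */
#define CHKiRet(call) \
    do { if ((iRet = (call)) != RS_RET_OK) goto abort_finalize; } while (0)

/* Hypothetical workers; each one reports success or failure via its return code. */
static rsRetVal f(int x) { return x >= 0  ? RS_RET_OK : RS_RET_ERR; }
static rsRetVal g(int x) { return x < 100 ? RS_RET_OK : RS_RET_ERR; }
static rsRetVal h(int x) { (void)x; return RS_RET_OK; }

/* Every CHKiRet() below contributes one conditional branch.
 * All of them share a single cleanup path. */
static rsRetVal process(int x)
{
    rsRetVal iRet = RS_RET_OK;
    char *buf = malloc(64);            /* resource that must always be released */

    if (buf == NULL) {
        iRet = RS_RET_ERR;
        goto abort_finalize;
    }
    buf[0] = '\0';

    CHKiRet(f(x));
    CHKiRet(g(x));
    CHKiRet(h(x));

abort_finalize:
    free(buf);                         /* free(NULL) is a no-op */
    return iRet;
}
```

With these stand-ins, `process(5)` runs through all three checks and returns `RS_RET_OK`, while `process(-1)` leaves at the first check and still releases the buffer.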

One performance enhancement approach is to find ways that enable the code to be executed in larger linear blocks. The most important observation is that in almost all cases, the if() condition is never true, because typically the outcome of the function called is an OK state.
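One small step in that direction, short of a real redesign, is to tell the compiler that the error branch is rarely taken. The sketch below assumes GCC or Clang, whose `__builtin_expect` extension is not standard C. The names `UNLIKELY`, `CHKiRet_HINTED`, `step` and `run_hinted` are hypothetical and do not come from rsyslog. Note that this does not remove any branch; it only allows the compiler to lay out the OK path as straight fall-through code, and whether that helps measurably would have to be benchmarked.

```c
/* Hypothetical stand-ins for the return-code type and its values. */
typedef int rsRetVal;
#define RS_RET_OK   0
#define RS_RET_ERR -1

/* Branch hint: GCC/Clang extension, plain condition elsewhere. */
#if defined(__GNUC__) || defined(__clang__)
#  define UNLIKELY(cond) __builtin_expect(!!(cond), 0)
#else
#  define UNLIKELY(cond) (cond)
#endif

/* Same check as before, but marked as "almost never true". */
#define CHKiRet_HINTED(call) \
    do { if (UNLIKELY((iRet = (call)) != RS_RET_OK)) goto abort_finalize; } while (0)

/* Hypothetical worker: succeeds when ok is non-zero. */
static rsRetVal step(int ok) { return ok ? RS_RET_OK : RS_RET_ERR; }

/* Runs three steps and counts how many completed before an error, if any. */
static rsRetVal run_hinted(int a, int b, int c, int *completed)
{
    rsRetVal iRet = RS_RET_OK;
    *completed = 0;

    CHKiRet_HINTED(step(a)); ++*completed;
    CHKiRet_HINTED(step(b)); ++*completed;
    CHKiRet_HINTED(step(c)); ++*completed;

abort_finalize:
    return iRet;
}
```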

I thought about using longjmp to provide the necessary functionality, but the setup effort for longjmp, at a *quick* look, seems to be too high, especially given the large number of small functions that are present in rsyslog (and inlining does not help with this issue). The answer is probably to look at how the C++ exception mechanism is implemented and build a solution similar to that (just like many of the object callbacks are inspired by the C++ method call tables).
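To illustrate where the setup effort comes from, here is a rough sketch of what a setjmp/longjmp-based variant could look like. It is only an illustration under several assumptions: it is single-threaded, it uses global state, and the names `cur_handler`, `pending`, `raise_err` and `process_jmp` are hypothetical. The called functions no longer return a code, so the calls form a linear block without checks. The price is that every function that owns a resource has to call `setjmp`, which saves the execution context each time, and has to register and unregister its handler.

```c
#include <setjmp.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the return-code type and its values. */
typedef int rsRetVal;
#define RS_RET_OK   0
#define RS_RET_ERR -1

/* Hypothetical global state: innermost active handler and pending error code. */
static jmp_buf *cur_handler = NULL;
static rsRetVal pending = RS_RET_OK;

/* "Throw": record the code and jump to the innermost handler.
 * Assumes a handler is installed; the sketch does not guard against NULL. */
static void raise_err(rsRetVal code)
{
    pending = code;
    longjmp(*cur_handler, 1);
}

/* Hypothetical workers: no return code, errors are raised instead. */
static void f(int x) { if (x < 0)    raise_err(RS_RET_ERR); }
static void g(int x) { if (x >= 100) raise_err(RS_RET_ERR); }
static void h(int x) { (void)x; }

static rsRetVal process_jmp(int x)
{
    jmp_buf env;
    jmp_buf *saved = cur_handler;   /* remember the outer handler */
    char *volatile buf = NULL;      /* volatile: modified after setjmp() */
    rsRetVal iRet;

    cur_handler = &env;             /* setup cost paid on every call */
    if (setjmp(env) == 0) {
        buf = malloc(64);
        f(x);                       /* linear block, no per-call checks */
        g(x);
        h(x);
        iRet = RS_RET_OK;
    } else {
        iRet = pending;             /* reached via longjmp() */
    }

    cur_handler = saved;            /* unregister, then clean up */
    free(buf);
    return iRet;
}
```

In this sketch the per-call checks are gone, but the handler setup in `process_jmp` runs even when nothing goes wrong, which is exactly the cost that looks too high for many small functions.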

I have not yet begun to dig seriously into this optimization, as there are plenty of other things that can be improved and that promise to have much more effect (like reducing the overall number of system calls needed per message).

However, I would appreciate feedback on this issue. Please post to the forum thread, so that I have the information at hand when I finally can turn to optimizing that code area.