A good blog post on syslog reliability

I found a good blog post describing the problems that go along with reliable logging. It also offers most of the options to resolve them:

http://mschuette.name/wp/2008/03/31/thoughts-on-reliable-syslog/

The author missed one thing: a buffer does not necessarily have to exist only in main memory. When the in-core buffer runs out of space, you may also use a disk-based buffer, which offers much more capacity. Of course, even the largest disk-based buffer may be exhausted at some point, and then one needs to resort to other strategies. But a disk-based buffer is an excellent solution for temporary (but lengthy) receiver outages.
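
To make the idea concrete, here is a minimal sketch of a buffer that keeps messages in memory and spills to disk once the in-memory part is full. It is an illustration only, not rsyslog's queue code: the class name `SpillBuffer`, its parameters, and the one-message-per-line file format are assumptions made for this example.

```python
import collections


class SpillBuffer:
    """FIFO buffer: up to `mem_limit` messages in memory, overflow on disk.

    Hypothetical sketch. Messages must not contain newlines, because the
    spill file stores one message per line.
    """

    def __init__(self, mem_limit, spill_path):
        self.mem_limit = mem_limit
        self.mem = collections.deque()
        self.spill_path = spill_path
        self.read_offset = 0   # how far the spill file has been consumed
        self.on_disk = 0       # messages still waiting on disk

    def put(self, msg):
        # Once anything is on disk, new messages must follow it to keep order.
        if self.on_disk == 0 and len(self.mem) < self.mem_limit:
            self.mem.append(msg)
            return
        with open(self.spill_path, "ab") as f:
            f.write(msg.encode("utf-8") + b"\n")
        self.on_disk += 1

    def get(self):
        # Memory always holds the oldest messages, so drain it first.
        if self.mem:
            return self.mem.popleft()
        if self.on_disk:
            with open(self.spill_path, "rb") as f:
                f.seek(self.read_offset)
                line = f.readline()
                self.read_offset = f.tell()
            self.on_disk -= 1
            return line.rstrip(b"\n").decode("utf-8")
        return None
```

The sketch never truncates the spill file and has no size limit on disk, so it also shows where the "other strategies" would have to start: at the point where the disk part itself fills up.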

Rsyslog has implemented everything mentioned in the paper, plus more. You can directly apply the knowledge you got from this paper to rsyslog. And then, you can dig down into the dirty details:

http://www.rsyslog.com/doc-queues.html

rsyslog work log 12

Yesterday’s rsyslog work log:
2008-04-17
– some cleanup
– created a global data object
– moved “family” variable to global data pool
– moved “bDropMalPTRMsgs” variable to global data pool
– moved “option_DisallowWarning” variable to global data pool
– moved “DisableDNS” variable to global data pool
– moved host/domain-name related variables to global data pool
– moved “glblModPath” variable inside global data pool (but
still as a variable, not part of glbl object)
– added the ability to specify an error log function for the
runtime
– removed dependency of core runtime on dirty.h
– imported tcp module from librelp as basis for new stream class
we got permission to include the tcp module from librelp
copyright holders
– done some forward-compatibility work on librelp
– brought netstrm to a (hopefully) somewhat usable state
– partly rewritten and improved omfwd
– some (small) cleanup of omgssapi
– optimized omfwd, now loads TCP code only if this is actually necessary

rsyslog work log 13

Yesterday’s rsyslog work log:
2008-04-16
– more or less finished im3195, but need changes in liblogging
to complete this work – does not compile yet
– moved files to runtime library part
– some cleanup
– provided ability to initialize the runtime
– some more cleanup; reduced dependencies, moved non-runtime
files to their own directory, except for some whose status
is unclear
– completed im3195 including some documentation
– changes due to restructuring in 3.17.2 have big bug potential;
beta 3.15.x has almost no bug potential; thus I initiated a
shift of devel -> beta -> v3-stable; devel will restart at 3.19.0
– prevented segfault during runtime library init phase

rsyslog work log 14

Yesterday’s rsyslog work log:
2008-04-15
– added imklog doc
– begin LGPL change for a select set of files (core runtime)
– merged in bsd-port and klogd changes
– released 3.17.1
– worked on rsyslog runtime library
– worked a bit on phplogCon
– worked on liblogging 0.7.0
– begun re-integrating rfc3195 in rsyslog

on the rsyslog runtime

I had a conversation on the new runtime design for rsyslog. I think it contained quite a bit of good (and brief) technical information, so I reproduce it here:

> The new design is:
> rsyslog core (GPL)
> rsyslog runtime (LGPL)
> modules (whatever)

Yes, actually I intended to say that ;)

>
> what is the interface between
> rsyslog core/syslog-ng runtime? Pipe/linked
> runtime module? Pipe/linked
>
The runtime always needs to be linked. A few cases:

rsyslog runtime -> always linked
syslog-ng/whatever runtime -> linked
runtime module -> linked

BUT the interesting case is:

rsyslog plugin -> linked or pipe
syslog-ng/whatever -> linked or pipe

In these cases, it depends on build parameters (all of this of course
not yet implemented). I would anticipate these combinations to be found
in practice:

rsyslog plugin -> linked [pipe for new functionality on old engine]
syslog-ng/whatever -> pipe

So with rsyslog plugin, that would just be a fallback if you need it for
some reason (e.g. run a v3 engine with a v4 plugin WITHOUT the need to
backport it [requires a v3 and v4 runtime to be present on the system,
though]).

For non-rsyslog syslogd’s I expect that pipe is always used, because I
do not think they’ll change their environment to adapt to the runtime.

rsyslog will provide wrappers for either interface. They will come as
separate binaries. There will be an input and output plugin to allow any
process (not necessarily rsyslog technology) to utilize a standard unix
pipe interface for producing and consuming messages.

Library modules will never use the pipe interface – it’s too slow and
too loosely coupled.

linking offers much greater performance and is lossless.
pipe is comparatively slow and may lose some messages.

Pipe is obviously a much easier and thus more universal interface than the
plugin interface.
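
The difference between the two interfaces can be sketched in a few lines. This is only an illustration of the concept, not the planned rsyslog runtime API (which, as said above, is not implemented yet): the function names and the newline-delimited pipe format are assumptions for this example.

```python
import os


def analyze(msg):
    """Hypothetical module logic: flag messages that mention an error."""
    return "error" in msg.lower()


def run_linked(messages):
    # "Linked": the caller invokes the module function directly and
    # sees the result of every single call.
    return [analyze(m) for m in messages]


def run_pipe(stream):
    # "Pipe": the module only sees newline-delimited text on a stream.
    # The sender gets no feedback; it is send-and-forget.
    return [analyze(line.rstrip("\n")) for line in stream]


def demo():
    messages = ["disk error on sda", "link up"]
    read_fd, write_fd = os.pipe()
    # Producer side: write the messages and close the pipe.
    with os.fdopen(write_fd, "w") as writer:
        for m in messages:
            writer.write(m + "\n")
    # Consumer side: the same module logic, driven from the pipe.
    with os.fdopen(read_fd, "r") as reader:
        piped = run_pipe(reader)
    return run_linked(messages), piped
```

The module logic in `analyze` is identical in both cases; only the wrapper differs. That is the point of shipping the wrappers separately.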

rsyslog work log 15

Past day’s rsyslog work log:
2008-04-10
– bugfix: omsnmp had a buffer that was too small for hostname+port. This
could not lead to a segfault, as snprintf() was used, but could cause
some trouble with excessively long hostnames.
– removed dependency on MAXHOSTNAMELEN as much as it made sense.
GNU/Hurd does not define it (because it has no limit), and we have taken
care for cases where it is undefined now. However, some very few places
remain where IMHO it currently is not worth fixing the code. If it is
not defined, we have used a generous value of 1K, which is well above what
the IETF RFCs permit for hostname length. The memory consumption is no issue, as
there are only a handful of these buffers allocated *per run*; that’s
also the main reason why we consider it not worth fixing any further.
– worked on tls support (as part of libsci)
– thought about modularization
2008-04-11
– wrapped up modularization problem –> suggested rsyslog-runtime
– some cleanup
– enhanced legacy syslog parser to handle slightly malformed messages
(with a space in front of the timestamp) – at least HP procurve is
known to do that and I won’t rule out that others do it, too. The
change looks quite unintrusive and so we added it to the parser.
2008-04-12
– implemented high precision timestamps for the kernel log. Thanks to
Michael Biebl for pointing out that the kernel log did not have them.
2008-04-14
– provided ability to discard non-kernel messages if they are present
in the kernel log (seems to happen on BSD)
– cleanup of imklog
– implemented $KLogInternalMsgFacility config directive
– implemented $KLogPermitNonKernelFacility config directive

TLS, loosely coupled modules, a runtime and licensing…

Again, I reproduce a mailing list post in the hope of reaching the broadest audience and also to keep a permanent record here in my rsyslog history.

Hi folks,

I am sorry, this will be a long mail. But I would appreciate it if you’d
read it in full and comment on it. This mail covers a really important
decision for rsyslog and will probably even influence if the project
succeeds in the long term. Package maintainers and code contributors are
especially requested to *really* read it. Though I try hard to provide
all relevant facts (that’s why it is getting long), I will probably miss
some and not properly convey others. Please feel free to ask.

Let me start with explaining that the rsyslog project conceptually
consists of three parts:

– the modules
– “helper” functions
– rsyslog-specific functionality

Modules are actually projects in their own right, just being distributed
with the rsyslog tarball for convenience. A module may be released under
any license. Note that modules call both rsyslog-specific functionality
(e.g. to submit a message) as well as helper functions (e.g. to handle
tcp sessions).

The “helper” functions are a growing set of generic objects. Examples
are the module loader, the queue engine, networking support, the script
engine and virtual machine, … – you get the idea: Things that are used
inside rsyslog but are not necessarily of use only for rsyslog.
Actually, this could be called a “rsyslog runtime library”.

Rsyslog-specific functionality is primarily rsyslogd and everything it
takes to glue together helpers and plugins to build the working syslog
subsystem.

Let’s stick with that for a brief while. Now let me explain the idea of
loosely coupled modules. This stems back to JF’s effort to convince me
of the Unix philosophy of “small tools that work well together”. We had
another good discussion yesterday (on the blog) and it made me change my
mind a bit (well, probably, not 100% convinced yet, but he managed to
seed the thought ;)). While I still think that there are things that
need to be really tightly coupled to the rsyslog core, there are others
which do not necessarily need to be. Let me call the latter “loosely coupled
modules”, in contrast to the (tightly coupled) plugins that actually
become part of the rsyslogd process during runtime. The analysis plugins
I have on my mind could become such loosely coupled modules. As an
interface, the usual Unix “send it and forget it” pipe could be used,
and it would probably be acceptable to allow for minor message loss
during shutdown and plugin failure (anything else would require a pipe
application protocol, e.g. relp over pipe, which sounds scary).

The plus in doing so would be the ability to use those plugins in
configurations where rsyslog is not present (e.g. driven by syslog-ng or
a detached message generator [fed from a log file]). Done right, one
could even select the (slightly lossy) pipe interface or the full blown
plugin interface as a compile time switch. If you think it out, we may
even end with an abstraction layer where each module can be compiled for
either the plugin interface or the pipe (no promises, though).

One problem with this approach is that modules call into the rsyslog
helpers. For example, rsyslog’s network support needs to be available for
all those modules that do something over the net. That’s not a problem
if I have a tightly coupled plugin as today (the rsyslog core makes the
necessary bindings). It would become more problematic if I move the
module to a pipe interface, because I now need to find a way to use the
rsyslog objects. But that’s still doable (though pretty ugly). It
becomes really problematic if the same module, using a pipe interface,
is to be used with e.g. syslog-ng. I don’t think that syslog-ng will be
able to provide it with an emulated rsyslog “net” object.

Let’s stick with this problem for a moment. Coincidentally, we had
another discussion on the mailing list yesterday – on the TLS support
wrapper for rsyslog and librelp. That discussion centered around
licenses. Technically, there are also a number of issues. I have now
involved myself enough with GnuTLS and a bit of NSS so that I am able to
try to draft a first abstraction layer. I thought hard and the “right
solution” involves encapsulating stream network access. So the right
thing to do is to have one object that handles network streams. That
object then is configured to use either plain tcp, TLS (via whatever
library) or even GSS-API. Nice and clean. It gets dirty if I think about
the details. If I do it that way, it makes rsyslog depend on this object
(so-far codenamed libsci). However, that would mean that any rsyslog
installation would need to pull in libsci. Not a big deal, except,
right, except if the crypto libraries are also pulled in by libsci. So
would every desktop system running rsyslog need to have the crypto
libraries installed? Scary… unacceptable.

In rsyslog, we had the same problem a few months ago (at that time the
mysql client libraries were the problem). The solution was the rsyslog
loader, which dynamically loads other libraries (and their dependencies)
on demand. The loader is what enables rsyslogd to be installed
everywhere, but only with minimal core requirements (and have everything
else in separate packages). So if libsci would be part of rsyslog, we
would not have any problem at all. After all, the necessary plumbing is
ready at hand in form of the rsyslog helper objects.

This is where we come back to loosely coupled modules. You notice it is
the same problem? Both them as well as libsci would need to call the
rsyslog helpers.

Now let’s come a bit to licensing. In order to understand that, we need
to talk about rsyslog funding first. Obviously, I am spending full time
(and a bit more than that) on rsyslog for quite a while now. I even
intend to do that for some more months as rsyslog is currently maybe 55%
of what I would like it to be. Somehow I must get funding – for the
time, for the hardware and for all the other things ;) What made the
rsyslog project possible, and still funds 99.9% of it, is Adiscon, the
company that provides (of course ;)) the best-ever logging solutions on
Windows. Actually the Windows closed source pays for the rsyslog
project. While we hope to find other sources of funding in the future, I
cannot ignore that fact. One thing we would like to do at Adiscon is
include select parts of the technology I am now developing into the
closed source applications, too. The most prominent example is the RELP
protocol. I obviously find this a fair policy – after all, the
alternative would be to do it in closed source only and I was able to
convince my folks at Adiscon that it is far better to contribute to the
open source world.

There is one drawback in this requirement: licensing. Of course, we
could pick a BSD style license and every problem would be solved. But, I
have to admit, we do not like to give everything to our competitors in
the *closed source* space. We have had very bad experiences with folks
building on our technology and even turning it against us. I won’t get
agreement from Adiscon to use a BSD license for everything (plus I
personally, too, don’t like to see that effect).

We already discussed this on the mailing list here as part of
dual-licensing in the past. The solution was that the technology in
question was created as its own, dual-licensed, project. This led to
the creation of librelp. Rsyslog itself was left under GPLv3 (which I
sincerely believe in because of its anti-patent, anti-drm clauses – even
though the license obviously gives me some trouble).

Dual-licensing librelp led to some duplicate code and made me not use
some features which I could have used if I had access to all rsyslog
helper objects. For librelp, that is not yet a big deal, because it is
quite unique and very rarely needs to access the rsyslog helpers. With TLS,
however, the situation changes and we get the dangling issue of rsyslog
helpers in librelp, too.

LET’S TRY TO WRAP UP

If I put all of this together, I think I have taken a (slightly? ;))
wrong path. The core problem is monolithic design from a very high point
of view. I have to admit I think this is what JF and some others were
pointing out, but I didn’t realize it quickly enough. Sure, rsyslog is
quite modular by now. But rsyslog always requires rsyslog to do
everything. It is very hard to do any rsyslog-related work without the
rsyslog core. While rsyslog has a carefully crafted set of helper
objects, these are not exposed to the outside world. And the licensing
issues associated with that design begin to screw up everything in the
long run.

I think we need change. The obvious solution seems to be extracting the
rsyslog helpers out of the rsyslog core project and create a “rsyslog
runtime”. That runtime then could individually be installed and be put
under a different license (bear with me, explanation follows below).

Let’s consider a complicated case with the runtime. Assume we have a
plugin “NeverBeforeSeenAnalysis”. Let’s say someone wants to use it with
syslog-ng (!). With the runtime, all that is needed would be to compile it for
the pipe interface and install rsyslog-runtime and the module onto the
system.

Now let’s consider Adiscon’s MonitorWare products on Windows. When they
implement RELP, they need librelp and can pull in the rsyslog-runtime
(for network access including TLS).

For rsyslogd itself nothing really happens, the runtime is now just its
own library – the linking need not be modified. So for rsyslogd,
the change would be transparent.

Technically, this indeed solves the issues. Let me stress the point that
it leads to code reuse, where I currently need to rewrite things (which
increasingly concerns me, especially from a maintenance point of view).

Now, on to the licensing. Obviously, the MonitorWare use case above
would be totally incompatible with GPLv3. So the rsyslog-runtime would
need to be under a different license. It could be dual-licensed, but I
think that would probably do more harm than good. I think I can convince
Adiscon to go with LGPL for the runtime part. Granted, it introduces
risk of closed source competitors pulling it in, but the advantages
should outweigh this risk.

As for the ability to put this work under a different license, I think I
am in good shape: most of the helper objects are freshly written and
have only received limited patches (if at all) from contributors. I can
contact them and ask for permission to change the license. Where I don’t
get permission, I think I can re-implement the contribution. Again, most
of the code in question has been written in the past 4 months and is
99.999% non-contributed. There may be some few runtime objects which
stem back to sysklogd. There, a license change is impractical. I’ll have
to live with the fact that those cannot go into a re-licensed runtime.
Depending on how important the functionality is, I either need to
rewrite or drop it (for non-rsyslogd use). In any case, this looks
(pending detailed analysis) quite possible.

Big question number one is what you think of this runtime approach? Have
I overlooked something? Do you object to it for some reason? If so, why?

The next question is how to package this inside the source tree.
Remember, currently rsyslog and the plugins (considered separate
projects) are all packed together inside a single tarball. This is very
convenient, both for me as well as for package maintainers and users.
The question is if we split rsyslog into the rsyslogd and the
rsyslog-runtime, will we continue to deliver the runtime as part of the
rsyslog package? Or would it be better to move it to a librsyslog
project? Unlike with the plugins, we actually would have two
different licenses, so it may be confusing to have both of them in the
same project (but I have seen that GnuTLS uses exactly this approach,
with the main library being LGPL and the – included – extras library GPL
only).

So that’s the next question, obviously depending on the first: how to
pack projects if we do a runtime split?

I know this is a long and dense mail. My apologies for this. But I think
the discussion is needed. I honestly believe that a number of
discussions in the past weeks actually circled around this theme, we
just didn’t actually get down to the point.

Please note that I will hold TLS development until we have reached
consensus on the runtime/licensing topic. The reason is if we don’t do a
runtime split, I need to do things considerably differently than when we
do one (much more code, probably yet another external library). So,
obviously, I have a current bias towards the split. However, experience
shows that I (as everyone ;)) tend to overlook or misunderstand things.
Thus your feedback is so important. I don’t like the idea of jiggling
back and forth on such an important topic as licensing and high-level
modularization, so I would like to get it done now in “the right way”
and keep it stable for at least the foreseeable future. Given the fact
that the decision somehow affects rsyslog’s development as a whole, I
would very much appreciate quick feedback.

In this spirit: please let the comments flow ;)

Thanks,
Rainer

phpLogCon – why the next version

I am in an interesting email discussion and would like to share something on phpLogCon that’s probably of interest for others, too:

A major reason for phpLogCon v2 is enhanced functionality. We’ll switch away from a pure database paradigm. We’ll be able to work with log files (much faster and sufficient in many cases), with databases of course, and in the long term also with a specialized, not-yet-written logging-specific datastore. We also want to build a community around the new phpLogCon and, as an important step, set up a public troubleshooting database and of course connect it to existing ones on the web. So the core idea of phpLogCon v2 is to create a system where users can analyze their logs but at the same time collaborate on finding solutions to issues they see (in the long term we may even get to a point where we can identify problems based on patterns – but that is still far away ;)). So the phpLogCon idea has gotten a bit broader (I should probably post that on the site, too).

Of course, rsyslog will also contribute to this vision – the overall idea is to create a great system monitoring and auditing system that not only helps with compliance but enables you to fix any upcoming trouble before it really hurts.

rsyslog work log 16

Yesterday’s rsyslog work log:
2008-04-09
– begun to work a bit on BSD portability issues
– changed imklog to a driver interface
imklog now uses os-specific drivers. The initial “set” contains
the linux driver. This is a prerequisite for BSD klog, which can
now be implemented on that driver interface.
– improved detection of modules being loaded more than once
thanks to varmojfekoj for the patch (v3-stable)
– released 3.14.2

Why is native email capability an advantage for a syslogd?

Following up on my post on rsyslog’s new native email capability, an interesting conversation arose. I’d like to share it with you:

> > I promise to listen very carefully and try to implement anything that is
> > doable and makes sense in the rsyslog context.
> >
> One thing springs to mind – I think “sendmail” support is more important
> than you give it credit.
>
> What if you’ve got an alert rule in rsyslog to email you when your
> network link fails – but your SMTP server is at the other end of the
> link? :-) If you used sendmail – you get requeuing and retrying for free
> – I don’t think you want to have to add that to your SMTP support…

Well, that’s actually not an issue at all in rsyslog. The rsyslog core
engine is reliable [to be precise: can be configured to be reliable,
it’s not by default] in a way that exactly handles this situation. In
rsyslog, any action, including now mail, can run on its own queue. When
an action fails, it tells the rsyslog core that it could not
successfully complete. Then, the rsyslog core schedules retries until it
finally succeeds. While doing so, the messages are kept inside a queue.
This queue is in memory as long as that’s sufficient and is moved to
disk if there is demand (e.g. rsyslog shutdown, running out of
configured in-memory queue space). A sample of such a configuration
(this time with the database writer), can be found at:

http://www.rsyslog.com/doc-rsyslog_high_database_rate.html

Bottom line – rsyslog is designed to work with failing destinations and
automatically recover these. So there is nothing special needed to make
it handle a failing smtp connection.
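
The retry behaviour described above can be sketched roughly as follows. This is a simplified model, not rsyslog's queue engine: the class name `Action`, the `deliver` callback and the single-pass `run_once` method are assumptions for this example, and the disk part of the queue is left out.

```python
import collections


class Action:
    """Hypothetical action wrapper: a queue plus retry bookkeeping."""

    def __init__(self, deliver):
        self.deliver = deliver            # callable returning True on success
        self.queue = collections.deque()  # messages waiting for delivery

    def submit(self, msg):
        self.queue.append(msg)

    def run_once(self):
        """Try to drain the queue; stop at the first failure.

        Returns the number of messages delivered in this pass. A failed
        message stays at the head of the queue for the next pass.
        """
        delivered = 0
        while self.queue:
            if not self.deliver(self.queue[0]):
                break
            self.queue.popleft()
            delivered += 1
        return delivered
```

A scheduler would simply call `run_once()` again after a pause. While the destination is down, nothing is lost; the messages just wait in the queue.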

In fact, I consider the SMTP direct mode more reliable than the sendmail
mode, exactly because of that feature. With sendmail, I hand over the
message to an external entity but do not know if delivery succeeded.
With SMTP direct, I know at least it made its way to the SMTP server.
Granted, I don’t know if the SMTP server will ultimately deliver it, but
I have a bit more control over what’s going on.

For example: rsyslog also has a mode where it can use backup actions if
things fail (after n retries). So let’s consider the example above.
Let’s say we have an urgent alert, but the smtp server is down. With
sendmail, I hand the message over to sendmail but do not know that
sendmail actually queues it. With smtp direct, I *know* that the smtp
server is unresponsive. Depending on the urgency, I may either do a few
retries or I may immediately switch to another delivery method. For
example, I may then go on to try SNMP. Or I may do another email action in
this case and try to contact an email-to-sms gateway so that this can be
delivered.

Please note that in rsyslog one can have multiple actions chained
together. So a probable scenario to handle such a case could be

1. try to email via the corporate server
2. if that fails, try to email via a public gateway
3. if that fails, start a program to do some automagic action

All of this is possible because I do not use sendmail. But, again, I
of course do not know if the mail server I used with rsyslog succeeds in
its delivery attempt. One weak spot always remains ;)
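
As a rough illustration of such a chain, the following sketch tries a list of delivery methods in order and stops at the first one that succeeds. It models the idea only; the function name and the convention that an action returns True on success are assumptions for this example, not rsyslog internals.

```python
def deliver_with_fallback(msg, actions):
    """Try each (name, action) pair in order; return the first name that succeeds.

    Each action is a callable that returns True on success and returns False
    (or raises OSError) on failure. Returns None if every action failed.
    """
    for name, action in actions:
        try:
            if action(msg):
                return name
        except OSError:
            pass  # treat connection errors like any other failure
    return None
```

The important property is that each step learns about the failure of the previous one. That is exactly the information which is lost when the message is simply handed over to an external program.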

To use yesterday’s sample, one could use a backup SMTP server with just
a little bit of configuration as follows:

$ModLoad ommail
$template mailSubject,"disk problem on %hostname%"
$template mailBody,"RSYSLOG Alert\r\nmsg='%msg%'"

# primary action
$ActionMailSMTPServer mail.example.net
$ActionMailFrom rsyslog@example.net
$ActionMailTo operator@example.net
$ActionMailSubject mailSubject
# make sure we receive a mail only once in six
# hours (21,600 seconds ;))
$ActionExecOnlyOnceEveryInterval 21600
# the if … then … mailBody must be on one line!
if $msg contains 'hard disk fatal failure' then :ommail:;mailBody

# begin backup action, carried out if primary fails
$ActionExecOnlyWhenPreviousIsSuspended on
$ActionMailSMTPServer mail2.example.net
$ActionMailFrom rsyslog@example.net
$ActionMailTo operator@example.net
$ActionMailSubject mailSubject
$ActionExecOnlyOnceEveryInterval 21600
& :ommail:;mailBody
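
The "only once every interval" setting used in both actions can be pictured as a simple throttle. The sketch below shows the general idea only; the class name is hypothetical, and the exact boundary behaviour of rsyslog's own implementation is not taken from the source.

```python
class OncePerInterval:
    """Allow an action at most once per `interval` seconds (hypothetical sketch)."""

    def __init__(self, interval):
        self.interval = interval
        self.last = None  # time of the last permitted execution

    def allow(self, now):
        # Permit the very first call, and any call at least `interval`
        # seconds after the last permitted one.
        if self.last is None or now - self.last >= self.interval:
            self.last = now
            return True
        return False
```

With an interval of 21600, a burst of identical disk failure messages results in one mail, and the next one can go out six hours later at the earliest.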