rsyslog v4

Finally, it is time to think about the next major rsyslog release! I have made many enhancements in v3, but the latest performance optimization work leads to a couple of significant changes in the core engine. I think it makes sense to roll these into a new major release. That leaves folks with the option to stay on the feature-rich v3-stable branch and avoid some of the probably unavoidable bugs in the upcoming v4 branch.

From a feature point of view, version 3 would have been good for at least three to four major releases, which I held back only to keep you from becoming scared by the pace at which we are moving ;) So I think now is a perfect time to begin developing v4. I hope that we will see a first beta of that branch around xmas, which, I think, is a nice gift.

syslog appliance website online

I have now set up a first basic web site for SyslogAppliance. It is not great yet, but it provides a stable reference point for any work that comes up. So people can hopefully begin to use this site as a pointer for useful resources.

As a side-note, you may notice I am using a .de (German) domain. Thanks to the spammers, com, org and net domains are already used by spamming sites. And I thought it does not matter if we use a de domain. After all, we live in a time where domains from the Cocos Islands (cc) or Tuvalu (tv) are being abused for generic purposes, so why not use .de for a generic site, too?

Oh, and one interesting find: at least one person actually downloaded and tried the version of SyslogAppliance I uploaded yesterday. How do I know? I had forgotten to include the phpLogCon user in the README ;) [of course, this is fixed now ;)]

virtual syslog appliance

I’ve just recently blogged about my syslog appliance idea. Now this has become reality. There is the first 0.0.1 version of rsyslog and phplogcon as a virtual appliance.

For starters, I have created a very simple system. While I have a number of options for the operating platform, I started with Ubuntu JeOS mainly because it had good guides for getting started with an appliance quickly. Being based on Debian was also a plus. Some may argue that the downside is that the log appliance currently requires VMware. While I agree this may be an issue, it is not an extremely big one, especially as VMware Server runs quite well under Linux and is free to use.

I will investigate Red Hat’s AOS, but I think I need to get some results from the app point of view first and JeOS looks quite promising in this regard.

For now, I have even started with the stock rsyslog package, which is quite outdated on that platform. However, I’ll do a couple of iterations in the next days and so will come up to the current release soon. But for what the appliance currently needs, the older version is not really a problem.

I am now very interested in feedback on this new offering. The appliance can be downloaded from

http://www.rsyslog.com/Downloads-req-viewdownloaddetails-lid-136.phtml

One of my next actions is to set up a dedicated site, which will make finding (and providing!) information on the appliance much easier. But one thing after the other…

Oh, and one thing on the licensing: the appliance is free for non-commercial use. However, we intend to request a moderate fee for commercial use, which I think is a fair policy. Of course, all appliance components are freely available.

If you try out the appliance, please provide feedback! I have set up a dedicated forum at

syslog appliance forum

As I said, the initial version will probably not be as “plug and play” as I hope, but I am very positive we are on a good path. Besides, it is an exciting project.

A German rsyslog forum…

Some of you may know that I am a native German speaker. I thought I would start an interesting experiment: the “deutsches rsyslog forum” (which means “german-language rsyslog forum” ;)). It is targeted at those who prefer to express themselves in German.

The interesting question, though, is whether this forum will actually attract much attention. In German IT, there is a tendency to think that almost everyone speaks English well enough to obtain the information needed to get the job done. If that proves true, there would be very little benefit in localizing any of the documentation into German. So before seriously considering that, it is probably a good idea to do some testing. For the very same reason, my buddy Tom Bergfeld is currently translating the rsyslog home page, mainly the announcements, into German. We will do a similar experiment with phpLogCon and evaluate both together after some time has passed.

Please drop me a note if you have an opinion on this, or on localization in general.

Thanks,
Rainer

a logging appliance

The IT world is increasingly turning towards appliances: pre-configured systems, which do exactly one job and do it well. Like a household appliance, all you need to do is plug it into your infrastructure, maybe change a setting or two and you are ready to go. While appliances previously always had the connotation of a hardware box being delivered, the increasing popularity of virtualization makes it possible to build pure software appliances.

One of the things we intend to investigate is creating a logging appliance, using VMware tools. We will set up a standard Linux, use MySQL and Apache with it and install rsyslog and phpLogCon, all in a “ready to use” fashion where only the devices need to be pointed to the right IP address of this virtual box.

This is one of my next projects and feedback on such an effort is very much appreciated.

rsyslog work

I have not blogged that much the past weeks. Rsyslog work is still progressing quite nicely. I am currently working on (large) performance enhancements. Thanks to David Lang for his help on this topic.

I am also hunting another threading bug. This one manifests only when running on high-end hardware and seems to have to do with synchronization. Interestingly, the problem does not show up under valgrind (a memory and threading debugger), which usually points to flaws very well. Hard to find… Especially hard to find because I do not have the right hardware to reproduce it. I managed to get a faster box last week, but still it is not fast enough (and does not have enough cores…) to reliably trigger the problem. Since then, I have seen it a couple of times, but no indication yet of what is going on.

Also, I need to help some other Adiscon projects and do a bit of my consulting chores, so that the time allocated to rsyslog is a bit limited. But, hey, I have worked nearly full time on it this year and it has evolved very well. So I don’t think it is a problem going at a somewhat lower pace for the time being (plus, of course, I hope some other sources of funding will appear which enable me to go back to “full time rsyslog” mode).

In any case, the future has exciting things to come up with. I personally would like to see a customizable message parser, which would enable users to work with different sender formats at the same time.

rsyslog and SuSe Linux

I got good news but I unfortunately forgot to blog about it (it has been soooo busy the past weeks…).

Suse is now doing an official rsyslog package. With that, I think there is now an rsyslog package available for almost all major distributions. Unfortunately, some of the current distributions do not have a good one, but their next versions will. So over time things will be much better.

For now, let’s welcome Novell/SuSe Linux to the camp of those that natively support rsyslog :)

syslog fragmentation

Hi all, I was asked about the current (and past) state of syslog splitting a single syslog message into multiple smaller messages. The question came up because of a broken syslogd implementation which can receive 256 bytes (octets) at most. I thought I would re-post my reply on the blog as it may be of interest to you, too. Here we go:


Hi all,

I have now reviewed all of the discussion.

Let me start with the broken receiver. With 255 octets, any (generic)
fragmentation method would need to be ultra-compact which of course is
doable but not (with reasonable overhead) in the context we set up with
the version of syslog-protocol that will turn into a normative RFC.
Note, also, that RFC3164 will be superseded by that RFC once it is out.

As has been described here, fragmentation can either be done at the
protocol layer or at the application layer. In the latter case, the
application needs to consider a sensible maximum and needs to emit
message sequences which are somewhat atomic. sendmail seems to do that,
and I know of some other examples, too. Some database servers seem to do
verbose logging in a similar way, in that they log parts of the
statement within different log messages. However, it is often quite
complicated to re-unite those application logs to the original message
(much of the complexity of log analysis stems from that).

A protocol based approach solves these issues. But as it has been
rejected by the syslog-sec WG it is not considered useful by the IETF
syslog community. We may want to try to give it another shot, but that
should be done inside the framework laid out in the new IETF series and
as such it would not be a good solution for a broken receiver. Actually,
the new RFC series requires a minimum maximum length of 480 octets
(stemming back to IPv4 UDP available payload size). The recommended
minimum maximum length is 2K and more is permitted if sender and
receiver support it. There is no upper limit per se, but a receiver may
either truncate the message or even discard it as a whole. If truncation
happens, it must truncate at the end and without paying any attention to
syntax and semantics of the message. This is specified in [1]’s section
6.1 and it was the result of very elaborate discussions. Most
importantly, the syntax- and semantic-agnostic truncation was a
requirement out of this discussion.
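
To make the truncation rule concrete, here is a minimal Python sketch of a
receiver that truncates rather than discards. The two limits are the ones
quoted above; the constant and function names are made up for this example
and do not come from the draft or from any syslogd.

```python
# Limits as described in the text; the names are hypothetical.
MIN_MAX_LEN = 480           # every receiver must accept at least this many octets
RECOMMENDED_MAX_LEN = 2048  # recommended minimum maximum length ("2K")


def receive(msg: bytes, max_len: int = RECOMMENDED_MAX_LEN) -> bytes:
    """Return the message as a truncating receiver would keep it.

    Truncation happens at the end only and ignores the syntax and
    semantics of the message: it is a plain cut on the octet string.
    """
    if max_len < MIN_MAX_LEN:
        raise ValueError("a receiver must accept at least 480 octets")
    return msg[:max_len]


if __name__ == "__main__":
    oversized = b"x" * 3000
    print(len(receive(oversized)))        # cut to the configured maximum
    print(len(receive(oversized, 480)))   # cut to the smallest permitted maximum
```

Because the cut is blind, it may land in the middle of a field or even of a
multi-octet character, which is one more reason to put the important
information early in the message.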

As Tina mentioned, my company Adiscon and I personally have been doing
Windows event log to syslog conversion for quite a while. Windows event
messages can be large and keep growing larger. We have been converting
them to syslog for over 10 years now, and at the time we started the only common
ground was the 1K limit already mentioned. Note that at that time some
implementations could experience serious malfunction if messages over
this size arrived (I remember the Solaris syslogd immediately
segfaulting as one example among several). We thought about how best to address
the issue. We were tempted to do an app-level split, much as sendmail
does, but refused to do that for two reasons:

1) the messages to be logged did not originate from ourselves (unlike
in the sendmail case). This implies that we do not know exactly
what makes sense to put together. While there is a potentially large set
where this can be properly concluded from context, there is also a set
where this is not the case. The latter would have required a more
protocol-like generic approach and thus specialised parsers on the end
systems – something we did not really like.

2) this is somewhat similar to the parser problem. In general, log
analysis is even harder if a single logical log entry is distributed
over several physical records. Especially if you take into account that
the order of appearance does not necessarily (in practice almost never)
reflect the order of creation. So processing such a log requires a
consolidation phase. It is especially hard for a human reviewer to do this
while reviewing logs and thus considered a big disadvantage.

So we looked at what was available at that time. While the 1K size limit
was universally accepted, most syslog receivers either supported larger
sizes by default, could be configured to do so or could be recompiled to
handle it (sysklogd, the then-omnipresent syslogd on Linux, is a prime
example of the latter – #define MAXLINE 1024 needed to be changed and
you were basically done [within UDP constraints]). The real limit
turned out to be the UDP max size, 64K in theory but with different
default/hard-coded limits in various stacks. In 2005 I did a bit of
research[2] and found that 4K seems to be the typical real-world limit.
But even many years before, almost every Windows event record fit into
4K (the exception being records with dumps in them…). Also, we already
had plain TCP-based syslog at that time, which did not experience any
size issues.
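
To put a number on the "64K in theory", here is a small Python sketch of the
arithmetic for UDP over IPv4. It assumes an IPv4 header without options (20
octets) and the 8-octet UDP header; the function names and the idea of a
single "stack limit" parameter are simplifications made up for this example.

```python
IPV4_MAX_DATAGRAM = 0xFFFF  # the IPv4 total-length field is 16 bits wide
IPV4_MIN_HEADER = 20        # IPv4 header without options
UDP_HEADER = 8


def max_udp_payload(ip_header_len: int = IPV4_MIN_HEADER) -> int:
    """Largest UDP payload that fits into a single IPv4 datagram."""
    return IPV4_MAX_DATAGRAM - ip_header_len - UDP_HEADER


def fits(msg: bytes, stack_limit: int) -> bool:
    """True if msg fits both the protocol maximum and a (hypothetical) stack limit."""
    return len(msg) <= min(stack_limit, max_udp_payload())


if __name__ == "__main__":
    print(max_udp_payload())             # 65535 - 20 - 8
    print(fits(b"x" * 4096, 4096))       # fits the typical 4K real-world limit
    print(fits(b"x" * 5000, 4096))       # too large for such a stack
```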

So the practical solution for our Windows to syslog size problem was
simply to ignore it and tell customers how to configure/recompile their
syslogd. That worked on Windows at least for WinSyslog and Kiwi Syslog,
and on the *nix side at least for sysklogd, syslog-ng, some variants I
don’t remember by name and now rsyslog. We recommended to switch to a
product that supported larger sizes where the stock solution did not do
that. Or we used interim, specialised receivers that logged data into
separate databases or files.

When I was unable to convince the syslog-sec WG to specify fragmentation
as part of the syslog protocol itself, I was at least able to put that
spirit into the I-D: so we now have the ability to use large sizes if
everybody configures the systems correctly. Part of that spirit, funny
as it may sound, is to place important information early in the packet
as this improves chances it will actually be delivered.

I have to admit this is not a perfect way to do it, but at least it
works if everything is set up correctly. The current main “problem” is
that RFC 3195 (somewhat vaguely) sets an upper limit of 1K for messages
and also does not talk about truncation. So if there is an RFC 3195 system
inside a relay chain, the maximum size for the whole chain goes down to
1K – there is nothing we can do about this. This is also the reason why
a new revision of 3195 is needed. This is underway, as far as I know.
One should also note that this limitation is of no practical importance
for the time being (thus no real “problem”), because 3195 did not find
widespread support. To the best of my knowledge, the only commercially
available implementations are Cisco’s and ours with us also providing
the only (more or less, due to low priority) fully supported 3195
implementation inside an open source syslogd. There was SDSC syslog, but
the project is to the best of my knowledge no longer alive. It also
never spread to become the default syslogd on any important Linux
distribution and can be considered “exotic” at best.
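
The relay chain effect can be sketched in a few lines of Python: the usable
size for the whole chain is simply the smallest limit of any hop. The sketch
assumes that every hop truncates instead of discarding, which RFC 3195 itself
does not specify, and all names in it are made up for illustration.

```python
def chain_max_size(hop_limits):
    """Effective maximum message size for a relay chain: the smallest hop limit."""
    return min(hop_limits)


def relay(msg: bytes, hop_limits) -> bytes:
    """Pass msg through each hop in order, truncating at the end where needed."""
    for limit in hop_limits:
        msg = msg[:limit]
    return msg


if __name__ == "__main__":
    # Sender-side relay with 4K, a 1K hop in the middle, final receiver with 2K.
    limits = [4096, 1024, 2048]
    print(chain_max_size(limits))
    print(len(relay(b"x" * 4000, limits)))
```

However generous the other hops are, a single 1K hop caps what arrives at the
end of the chain.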

I hope this description is useful for you. The bottom line is that there
is no standard, and there is, or at least was, no support for specifying
one. Even if we change that, the end result will most probably not
support down-level receivers below the 480 octet limit set forth in the
upcoming RFC series.

Best regards,
Rainer

[1] http://tools.ietf.org/html/draft-ietf-syslog-protocol-23
[2] http://www.monitorware.com/Common/en/articles/ihe-syslog.php

back to work…

You know this: the more you like something, the “faster” time elapses. So it turns out to be Thursday of my first week back at work from my summer vacation now ;) This time, I was really lazy and had extremely limited Internet connectivity while I was away. While a bit unusual for me (I was never disconnected for more than 2 days the past 10 years or so…), it turned out to be a good experience (well, some email via PDA flowed, though). As a side-note, it was good to see rsyslog alive and well while I was out of town! Many thanks to all contributors.

As you probably expect, there was a bunch of work waiting for me when I returned. I am still suffering a bit from it. However, I managed to do some work on rsyslog. So I finally managed to get rid of the hardcoded syslog message size limit. This, of course, caused a lot of code to be touched. I did a pre-release on the mailing list, but I do not have the feeling that many tried it. Well, now it is the official devel and we’ll see if we get into interesting kinds of trouble.

The next thing on my agenda is the new documentation generation system. I got a lot of help from my friends at Red Hat Japan. Actually, I now need to fully understand how docbook and the generation process work. I guess that will keep me occupied for a while. So please keep watching this blog, even though I may not have so many new posts for the time being.

rsyslog error reporting – how to do it well…

Rsyslog is obviously gaining momentum. Not only is it becoming the default syslogd on Debian, a very important distribution, I am also seeing an increasing number of questions in the rsyslog forums and on the mailing list. The latter, I think, is a good indication that people are beginning to care about rsyslog and to explore rsyslog’s enhanced features.

I am very happy with this development. However, it also shows a downside. Rsyslog, for obvious reasons, “offers” much more chance for misconfiguration than its feature-bare ancestor sysklogd. While I tried hard to make configuration simple and intuitive, I managed to succeed only in the simple cases. If you configure rsyslog for complex needs, I have provided ample ways to screw up ;) That situation will hopefully get better with the new scripting engine, but even that will not be able to totally resolve the issue. To help people get over this phase, rsyslog offers a myriad of diagnostics. Whenever something is wrong, it logs at least one message telling you (and often a couple of them, helping to identify the culprit).

With some frustration, however, I begin to see that many people never see these diagnostics. Many people try a configuration and just notice that it does not work. They never look at rsyslog’s own detailed error messages (and some do not even write them anywhere, so they actually have nothing to look at ;)). So it has become quite common that, for questions raised on the forum, we go to great lengths to obtain (large) debug logs, just to see that the debug log contains the otherwise-ignored error message that explains it all.

A (too) simple approach would be to blame users: why the heck don’t they pay attention to their system setup? Besides being lame to blame users, it doesn’t help solve the issue. The root cause seems to be that people tend not to know where to look for help. One approach is to improve documentation. I’ve gone to some length on that, but what gets in our way is that people (including me ;)) tend to ignore documentation until there is no way around it (at this point, almost always too late, the foolishness of that approach is proven ;)).

An additional complexity is that people need to have at least some working configuration, including permissions, to make rsyslog record its error messages. A while ago, I thought about an internal message buffer, for error messages and maybe others, that can be viewed via a web browser (this implies a simple http server inside rsyslogd). So folks with problems could simply point their web browser at rsyslogd and see which diagnostics exist. This solves at least the problem of recording and finding the messages. It does not solve the issue of people not knowing to do that, but it improves the situation: it is easier to tell “just point your web browser to” than to instruct people on how to review log files or create and send debug logs.
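
To illustrate the internal message buffer idea, here is a minimal Python sketch. It is not rsyslog code: the class name, its methods and the fixed capacity are assumptions made up for this example, and the http server part is left out entirely. The point is only that diagnostics are kept in memory, so they are available even when no output file could be written.

```python
from collections import deque


class DiagBuffer:
    """Keeps the most recent diagnostic messages in memory (hypothetical sketch)."""

    def __init__(self, capacity: int = 100):
        # A bounded deque silently drops the oldest entry once it is full,
        # so the buffer cannot grow without limit.
        self._messages = deque(maxlen=capacity)

    def record(self, text: str) -> None:
        """Store one diagnostic message."""
        self._messages.append(text)

    def render(self) -> str:
        """Return the buffered messages as plain text, oldest first."""
        return "\n".join(self._messages)


if __name__ == "__main__":
    buf = DiagBuffer(capacity=2)
    buf.record("error 1")
    buf.record("error 2")
    buf.record("error 3")   # pushes "error 1" out of the buffer
    print(buf.render())
```

Whatever eventually serves the buffer to a browser would only need to call something like `render()`.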

Up until now, I have stayed away from this approach because of security concerns. Such a http server, especially if it also made it possible to view some of the live log data and live state information, could be an ideal tool for attackers. Of course, I can disable it by default and limit its features. But that is counter-productive in regard to the simple troubleshooting case. So I stayed away from it.

Now, seeing that the support problem (and the pain people experience) is becoming worse, I have taken another look at this approach. I now think there is a compromise: I can create a separate plugin, one that can be loaded via a single “$ModLoad”, but is not loaded by default. So the system is secure by default but it is easy to instruct people to activate the advanced diagnostics capability. It is obviously also easy to disable it, but what if people forget (and we tend to, don’t we?). That would leave the attack vector intact once it was enabled. I think I now have a cure: what if the plugin automatically disables itself after a given period of time? Looking at the use case, it would be sufficient for most cases if diagnostics are available for 10 minutes after rsyslogd restart, but not any longer. I think I will now take this route. While it leaves the attack vector open, it limits the risk to a time frame which is usually very short compared to overall runtime. And, after all, this is just a second safety measure. In the first instance, people should disable diagnostics once they no longer need them. Enabled diagnostics probably warrant a (startup) log message in themselves, so someone who cares to get an error-free startup should not forget about that (and those who don’t care, don’t deserve any better but still have a safety belt…).
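
The self-disabling behaviour can be sketched in a few lines of Python. Again, this is not the plugin itself: the class, its method names and the injectable clock are assumptions made up for this example. The 600 seconds correspond to the 10 minutes mentioned above, counted from the moment the object is created (standing in for the rsyslogd restart).

```python
import time


class TimedDiagnostics:
    """Diagnostics gate that closes itself after a fixed window (hypothetical sketch)."""

    def __init__(self, window_seconds: float = 600, clock=time.monotonic):
        # The clock is injectable so the expiry can be tested without waiting.
        self._clock = clock
        self._expires_at = clock() + window_seconds

    def enabled(self) -> bool:
        """True only while the window since startup is still open."""
        return self._clock() < self._expires_at

    def view(self, messages):
        """Return a copy of the diagnostics, or None once the window has closed."""
        if not self.enabled():
            return None
        return list(messages)


if __name__ == "__main__":
    diag = TimedDiagnostics(window_seconds=600)
    print(diag.enabled())                    # open right after startup
    print(diag.view(["config error: ..."]))
```

Using a monotonic clock rather than wall-clock time means that adjusting the system time does not reopen or prolong the window.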

I think I will try to implement a testing plugin in the current devel branch. This was unplanned work, and obviously pushes away other work. In the long term, however, I think it is very important to help folks get up and running easily, so changing the schedule to solve that need is justified. For my testing purposes, though, I will not start with a http server but with something much more simplistic – just to grasp the essence of this approach.