rsyslog now available on Solaris

Rsyslog has become the de-facto standard on modern Linux operating systems. Its high-performance log processing, database integration, modularity and support for multiple logging protocols make it the sysadmin’s logging daemon of choice. The project was started in 2004 and has evolved rapidly since then.

Starting today, rsyslog is available not only on Linux and BSD, but also on Sun Solaris. Both Intel and Sparc machines are fully supported under Solaris. Depending on operator need, rsyslog can replace the stock Solaris syslogd or be used in conjunction with it. The latter case provides enhanced rsyslog functionality without the need to change the system infrastructure.

Solaris is now a tier-one target platform. That means that all testing for major releases will be carried out on Solaris as well as on other platforms. The Solaris port was done very carefully, taking into account Sun’s somewhat specific syslogd handling via door files while preserving the full power of rsyslog. So it not only compiles and runs on Solaris, but rsyslog is also a good citizen in the Solaris environment.

In line with usual rsyslog project policy, the project does not make installation packages available other than the source distribution. However, we are working closely with the Solaris community to be able to provide them. We expect additional announcements soon.

The first versions with solid Solaris support are 4.7.2 and 5.5.4. Rsyslog’s Solaris port was made possible by a generous contribution of hardware and some development funding from a sponsor who preferred to remain anonymous. We at the rsyslog project would like to express our sincere appreciation. Contributions of any kind are always very welcome.

syslog data modeling capabilities

As part of the IETF discussions on a common logging format for SIP, I explained some syslog concepts to the sip-clf working group.

Traditionally, syslog messages contain only free-form text, aimed at human observers. Of course, today most logging information is processed automatically, and the free-form text creates ample problems in that regard.

The recent syslog RFC series has gone to great lengths to improve the situation. Most importantly, it introduced a concept called “Structured Data”, which permits expressing information in a well-structured way. Actually, it provides a dual-layer approach, with a coarse designator at the upper layer and name/value pairs at the lower layer.
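To make the two layers concrete, here is a minimal Python sketch that renders one structured data element in the RFC 5424 syntax. The function names are my own and purely illustrative; the sample ID and parameters are the ones used in the RFC's own examples. It only formats the element and does not validate names or lengths.

```python
def escape_param_value(value: str) -> str:
    """Escape the three characters RFC 5424 requires escaping in a PARAM-VALUE."""
    # Backslash must be handled first so the escapes added below are not doubled.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("]", "\\]")


def format_sd_element(sd_id: str, params: dict) -> str:
    """Render one SD-ELEMENT: the SD-ID is the coarse upper-layer designator,
    the name/value pairs form the lower layer."""
    pairs = "".join(
        f' {name}="{escape_param_value(str(value))}"'
        for name, value in params.items()
    )
    return f"[{sd_id}{pairs}]"


element = format_sd_element(
    "exampleSDID@32473",
    {"iut": "3", "eventSource": "Application", "eventID": "1011"},
)
print(element)
# prints: [exampleSDID@32473 iut="3" eventSource="Application" eventID="1011"]
```

Note that nothing here says what "iut" or "eventID" actually mean. That is exactly the modeling gap discussed next.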

However, the syslog RFCs do NOT provide any data/information modeling capabilities to go with these structured data elements. Their syntax and semantics are to be defined in separate RFCs. So far, only a few examples exist. One of them is the base RFC5424, which describes some common properties that can be contained in any syslog message. The others are RFC5674, which describes a mapping to the Alarm MIB and ITU perceived severities, and RFC5675, which describes a mapping to SNMP traps. All of them are rather small. The IHE community, to the best of my knowledge, is currently considering using syslog structured data as an information container, but has not yet reached any conclusion.

Clearly, it would be advantageous to have more advanced data modeling capabilities inside the syslog base RFCs, at least some basic syntax definitions. So why are they not present?

One needs to remember that the syslog standardization effort was a very hard one. There were many different views, “thanks” to the broad variety of legacy syslog, and it was extremely hard to reach consensus (thus it took some years to complete the work…). Next, one needs to remember that there is such an immense variety in message content and objects that it is a much larger effort to try to define some generic syntaxes and semantics (I don’t say it cannot be done, but it is far from easy). In order to get the basics done, the syslog WG decided not to dig down into these dirty details but rather to lay the foundation so that we can build on it in the future.

I still think this is a good compromise. It would be good if we could complement this foundation with some already existing technology. SNMP MIB encoding is not the right way to go, because it follows a different paradigm (syslog is still meant to be primarily clear text). One interesting alternative which I have seen, and am now evaluating, is the IPFIX data modeling approach. Ideally, we could reuse it inside structured data, saving us the work of defining a syslog-specific model.

The most important task, however, is to think about, and specify, some common “information building blocks”. By these, I mean standard properties like source and destination ID, mail message ID, bytes sent and received, and so on. These, together with some standard syntaxes, can greatly relieve the problems we face while consolidating and analyzing logs. Obviously, this is an area that I will be looking into in the near future as well.

It may be worth noting that I wrote a paper about syslog parsing back in 2004. It was, and has remained, work in progress. However, Adiscon did implement the concept in MonitorWare Console, which unfortunately never got wider exposure. Thinking about it, that work would benefit greatly from the availability of standardized syslog data models.

A solution for invalid syslog message formats…

In syslog, we traditionally have a myriad of message formats, causing lots of trouble in real-world deployments. There are a number of industry efforts underway trying to find a common format. To me, it currently does not look like any one of them has gained the necessary momentum to become “the” dominant standard, so it looks like we need to live with various presentations of the same information for some more time.

Over the past two weeks, I have begun to make additions to rsyslog that will hopefully help solve this unfortunate situation. I know that I have no real cure to offer, but at least baby steps toward one. I have introduced so-called message parsers, which can be used to convert malformed messages into rsyslog’s well-formed internal structure.

Why is it not a solution? First, because what I really introduced was an interface, which permits writing different parsers for the myriad of devices. I have not provided a generic solution to do that, so the individual parsers need to be written. And secondly, I have not yet defined any more standard properties than those specified in the recent IETF syslog RFC series, most importantly RFC5424.
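The general idea of such a parser interface can be sketched in a few lines of Python. To be clear, this is not the rsyslog API (rsyslog is written in C, and its real interface looks different). The "VENDORX" format, the function names and the property names are all made up for illustration. The point is only that each parser either recognises a message or passes it on, and that a last-resort parser accepts whatever is left.

```python
from typing import Callable, Optional

# A parser takes the raw message and returns a dict of properties,
# or None if it does not recognise the format.
Parser = Callable[[str], Optional[dict]]


def parse_vendor_x(raw: str) -> Optional[dict]:
    """Hypothetical device format: 'VENDORX|<host>|<text>'."""
    parts = raw.split("|", 2)
    if len(parts) != 3 or parts[0] != "VENDORX":
        return None
    return {"hostname": parts[1], "msg": parts[2]}


def parse_fallback(raw: str) -> Optional[dict]:
    """Last resort: accept anything and keep the raw text as the message."""
    return {"hostname": None, "msg": raw}


def parse_message(raw: str, parsers: list) -> dict:
    """Try each parser in order; the first one that recognises the message wins."""
    for parser in parsers:
        result = parser(raw)
        if result is not None:
            return result
    raise ValueError("no parser accepted the message")


chain = [parse_vendor_x, parse_fallback]
print(parse_message("VENDORX|fw01|link down", chain))
print(parse_message("some free-form text", chain))
```

Adding support for another device then means writing one more function and putting it in front of the fallback, which is the kind of work third parties could take on.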

So why do I hope this will lead to a long-term solution?
First of all, there are some hopes that the IETF effort will bring more standard items. Also, we could embed other specifications within the RFC5424 framework, so this could become the lingua franca of syslog message content over time. And secondly, I hope that rsyslog’s popularity will help in getting parsers at least for core RFC5424 information objects, which would be the basis for everything else. Now we have the capability to add custom parsers, and we have an interface that third parties can develop to (and do so with relative ease).

All in all, I think this development is a huge step in the right direction. History will tell the rest ;) To me, probably the most interesting question is whether we will actually attract third-party developers. If there are any, I will definitely help get them going with the rsyslog API.

Will Microsoft remove the Windows Software RAID?

These days, hardware RAIDs are quite inexpensive. So everybody is moving towards them. However, all mainstream operating systems still support software RAIDs, maybe even for a good reason: an OS-controlled software RAID may be a bit easier to optimize under some circumstances. Anyhow, Microsoft seems to be moving away from that feature set:

As you probably know, Adiscon provides premier Windows event log processing solutions. Some of our customers use the products for example to monitor if their RAIDs break. And some of them use software RAIDs. So we wrote a nice article on how to monitor RAID health using the Windows Event Log.

Since the days of Windows NT 3.1 (or was it 3.5?), Windows logged an error message if the RAID failed. Actually, I’d consider this necessary functionality for any working RAID solution. Why? Well, if the RAID solution works, you will not notice that a disk has died. So if nobody tells you, you’ll continue to use the system as usual, not suspecting anything bad. So guess what – at some point the next disk fails and then (assuming the usual setup) you’ll be “notified” by the disk system, with those nice unrecoverable I/O errors. So without any health alerts, a RAID system is virtually useless.

We learned that Windows Server 2008’s RAID system no longer issues these alerts! (aka “is useless” ;)). So a long while ago, we reported this to Microsoft. The bug went through several stages of escalation. A few minutes ago, my co-worker got a call from the frontline Microsoft tech. He told him that, regrettably, Microsoft won’t fix this issue. According to him, Microsoft has confirmed this to be a bug, and the group responsible for ftdisk has confirmed that it should be fixed, but someone more powerful up in the hierarchy has opted not to do that. Boom. The tech tried to persuade us to switch to a hardware RAID, but actually that was not the point of the support call ;)

What does that mean? To me, it looks like Microsoft is actually moving away from providing software RAID. How else can one explain that there is no interest in providing any error message at all if something goes wrong with the RAID? Given the wide availability of hardware RAIDs (which, btw, provide proper alerting), this step does not look illogical. But do they really want to leave Linux as the only widely deployed mainstream operating system that provides software RAID? Or do they intend to keep it on the feature sheet, but provide a dysfunctional solution like in Windows Server 2008?

Let’s stay tuned and see what the future brings…

On the reliable plain tcp syslog issue … again

Today, I thought hard about the reliable plain TCP syslog issue. Remember? I have ranted numerous times on why “plain tcp syslog is not reliable” (this link points to the initial entry), and I have shown that by design it is not possible to build a 100% reliable logging system without application-level acks.

However, it hit me during my morning shower (when else?) that we can at least reduce the issue we have with the plain TCP syslog protocol. At the core of the issue is the local TCP stack’s send buffer. It enhances performance but also causes our app to not know exactly what has been transmitted and what has not. The larger the send buffer, the larger our “window of uncertainty” (WoU) about which messages made it to the remote end. So if we are prepared to sacrifice some performance, we can shrink this WoU. And we can do that simply by shrinking the send buffer. It’s so simple that I wonder why a shower was required…
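For illustration, this is roughly what shrinking the send buffer looks like at the socket level, sketched in Python rather than in rsyslog's C code. The function name and the 4096-byte figure are arbitrary choices of mine, not values taken from rsyslog. Also note that the kernel treats the request as a hint: it may round the size up or enforce a minimum, so it is worth reading the value back.

```python
import socket


def make_small_buffer_socket(sndbuf_bytes: int = 4096) -> socket.socket:
    """Create a TCP socket with a reduced send buffer (illustrative helper)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Request a smaller send buffer before connecting. Less data can sit
    # unacknowledged in the local stack, so fewer messages are in doubt
    # when the connection breaks.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, sndbuf_bytes)
    return sock


sock = make_small_buffer_socket()
# The kernel may adjust the requested size, so check what was actually applied.
effective = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
print("requested 4096 bytes, kernel reports", effective)
sock.close()
```

The trade-off is the one described above: a smaller buffer means send calls block or return short sooner, which costs throughput but narrows the WoU.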

In any case, I’ll follow that route in rsyslog in the next few days. But please don’t get me wrong: plain TCP syslog will not be reliable even if the idea works. It will just be less unreliable – but much less ;)

virtual appliance for disaster recovery?

I was asked what role virtual appliances play in disaster recovery planning. I thought I’d share my view here. Speaking for ourselves as a smaller company: we are moving towards virtual environments not only in order to consolidate systems, but also because it is much easier to move functionality over from a failed system to another. Some of the functions (like mail gateway, firewall etc.) do not even require state data, so they can simply be restored by using a generic template virtual machine.

Instantiating this is much quicker than building a machine with scripts from scratch, not to mention that we do not need to have the hardware in stock. In fact, we are thinking about moving such functionality to data center servers as well, and thus being able to quickly switch between them if the need arises.

My syslog appliance could play a similar role in disaster recovery. While it is probably not acceptable to lose data (depending on use case), it may make sense to set up a new temporary appliance, just to continue gathering data and providing analysis while the rest of the system is restored. Instant log analysis is probably a key thing you would like to have in your early recovery stages.

GPLv3 and rsyslog

Did you know? GPLv3 is out. And I am seriously considering it for my rsyslog project. Why? I do not like Tivo-ization, nor do I like software patents. So isn’t moving to a license that is unfriendly to both a good idea? I think it is. But of course, there are a number of subtleties that I need to check.

I guess version 2.0.0, due soon, will be released under GPLv3.

Why are there so few messages from sysklogd itself?

Have you ever wondered why your logs do not contain anything from the syslog subsystem itself, except for maybe a message or two? Tina Bird has started an interesting new discussion on the loganalysis mailing list.

Of course, I couldn’t resist and have added my 2 cents. I’d like to reproduce it here in the blog, too:

> I have received a number of responses along these lines, obtained by
> grepping the source code or by running strings on the binary. These are
> far better than nothing, and I’m grateful for the help, but they miss an
> important piece of the picture. Especially in a piece of code as old and,
> uh, crufty as syslogd, there’s a high likelihood that many of the errors
> find themselves at the far ends of code paths that rarely (if ever) get
> executed, and therefore those errors never find themselves in the
> “outside” world, providing assistance (or confusion) to system
> administrators everywhere.

OK, I’ve once again done a real review of the sysklogd 1.14.1 source. I wanted to make sure I really tell the truth. The plain truth is that it is nearly impossible for anything to go wrong after syslogd is started. So you’ll observe a number of “config file invalid” messages, but only (hopefully ;)) during initial setup. Once things run smoothly, you will see error messages only when things go really wrong, e.g. when the hard disk dies. But then, in practice, will that ever occur? If the answer is yes, then you need to ask “will it be seen”? On those systems where a hard disk failure is catastrophic, all of the logs are probably on that failed hard disk. Yes, exactly the disk our error message will be … ahem, would be … written to ;) So you end up with just initialization and termination messages.

Is that the case because syslogd is such a perfect piece of software? Not really. The reason is that the stock implementation simply cannot have any real problems once it runs: selector lines are either OK (and operating) or invalid (and disabled). And how about the network? Surely received packets are a source of trouble. Formatting errors of all kinds…

Let’s have a look at the (informational) RFC 3164:

##
4. Packet Format and Contents

The payload of any IP packet that has a UDP destination port of 514
MUST be treated as a syslog message.
##

Sweet – anything that is destined to 514 is a syslog message. No matter what the content is. Really? Am I kidding? Let’s read on:

##
Example 2

Use the BFG!

While this is a valid message, it has extraordinarily little useful
information.
##

Yeah… This is a valid message. So is this: “HaHaHa”. So what would a parser have to complain about when it processes the message? Nothing – and that’s why you won’t see many messages from sysklogd itself.
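A small Python sketch shows why there is no error path. This is not sysklogd's code, just my own illustration of the "everything is a message" rule. It assumes the RFC 3164 convention that a message without a usable PRI is treated as priority 13 (user.notice); the function and field names are made up.

```python
import re

PRI_RE = re.compile(r"^<(\d{1,3})>")
DEFAULT_PRI = 13  # user.notice, the value RFC 3164 suggests when no PRI is present


def parse_legacy(raw: str) -> dict:
    """Never fails: whatever arrives is treated as a syslog message."""
    match = PRI_RE.match(raw)
    if match and int(match.group(1)) <= 191:
        pri = int(match.group(1))
        text = raw[match.end():]
    else:
        # No (or no plausible) PRI: keep the whole payload as message text.
        pri = DEFAULT_PRI
        text = raw
    # PRI encodes facility * 8 + severity.
    return {"facility": pri // 8, "severity": pri % 8, "msg": text}


for raw in ("<34>su: 'su root' failed", "Use the BFG!", "HaHaHa"):
    print(parse_legacy(raw))
```

There is no branch in which the function could reasonably report a format error, so there is nothing for the daemon to log about itself.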

HOWEVER, things are improving. In rsyslogd, there are a lot more things that can go wrong. For example, the IETF is standardizing the frame format used with TLS. This provides a number of opportunities for emitting error messages. TCP itself gives ground to another set of messages. The same goes for the output side: rsyslog can do dynamic file names. That means files are created depending on incoming messages. Of course, things can go wrong here, providing another set of error messages.

I am talking about rsyslog because I maintain this project. I think any other modern-day syslogd has a similar set of error messages. And these may well be seen in practice. But now it depends much more on how correctly all parts of the system, including senders, work. With the majority of syslog-enabled applications still following the “I don’t need to obey any format” paradigm, the typical cause for error messages is non-existent for syslog servers.

I hope that clarifies things. And there is even hope: syslogds will spit out more errors in the future ;) [and, yes, I have at least created a todo item to emit meaningful error identifiers together with them…]

Rainer