The typical logging problem as viewed from syslog

I run into different syslog use cases from time to time, so I thought it would be a good idea to write down what I consider the typical logging problem. Precisely because it is the typical problem, syslog (and WinSyslog and rsyslog in particular) addresses most of these needs very well. What it leaves out is the analysis and correlation part, but other members of the product family (like our log analyzer) and third-party tools cover that well.

So the typical logging problem, as seen from the syslog perspective, is:

  1. there exist events that need to be logged
  2. a single “higher-level” event E may consist of a
    number of fine-grained lower level events e_i
  3. each of the e_i’s may be on different
    systems / proxies
  4. each e_i consists of a subset of properties
    p_j from a set of all possible common properties P
  5. in order to gain higher-level knowledge, the
    high-level event E must be reconstructed from
    e_i’s obtained from *various* sources (see the
    example after this list)
  6. a transport mechanism must exist to move event
    e_i records from one system to another, e.g., to
    a central correlator
  7. systems from many different suppliers may be involved,
    resulting in different syntax and semantics of
    the higher-level objects
  8. there is potentially a massive amount of events
  9. events potentially need to be stored for
    an extended period of time
  10. quick review of at least the current event data
    (today, past week) is often desired
  11. there is a lot of noise data
  12. the data needs to be fed into backend processes,
    like billing systems
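
To give a simplified, entirely made-up example of what points 2 to 5 mean in practice: consider the higher-level event E “user jsmith logged in to app01 via the VPN”. It may only ever be visible as two fine-grained events, one on the VPN gateway and one on the application server, each carrying its own subset of properties (timestamp, host, user name, source address):

Feb  2 10:13:01 vpngw01 vpnd[4711]: user 'jsmith' connected from 192.0.2.17
Feb  2 10:13:04 app01 sshd[2342]: Accepted password for jsmith from 10.1.5.20 port 51234 ssh2

Only by moving both records to a common place and correlating them (here, for instance, on the user name and the closeness of the timestamps) can E be reconstructed. All host names, addresses and message texts above are invented for illustration.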

on leap seconds and syslog

I was recently asked how syslog handles leap seconds. I thought it would be useful to reproduce my thoughts, initially expressed via private mail, here in the blog.

RFC5424 specifically forbids leap seconds, because during our discussions we found many cases where leap seconds caused grief. There are also ongoing discussions in the time-keeping community about abolishing leap seconds altogether, for much the same reason. To the best of my knowledge, GPS time also does not use leap seconds. The ultimate reason to abandon UTC leap seconds in syslog was that we failed to identify an operating system that would expose leap seconds to a user process. So a syslogd, or any other syslog sender, would not even be able to see that one was introduced. From the syslog perspective, a leap second is just like any other second, only time flows “somewhat slower” during it. I guess we are in the same boat as many operating systems with this perspective.

In RFC5424 we didn’t explicitly state what timestamp should be written during a leap second – because we thought it could actually never happen (why? explained above!). But I would say that “Leap seconds MUST NOT be used” means it should be expressed as the 59th second of the minute in question. Even if you bump the minute and use second 0 instead, I cannot see how this would be problematic. On a single system, time should still evolve serially. For correlating events from multiple systems, the timestamp alone is insufficient in any case: you cannot synchronize the different real-time clocks closely enough. So you need a different mechanism (like Lamport clocks) for that anyway.
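
To illustrate with the (real) leap second at the end of 2008: the official UTC second 2008-12-31T23:59:60 cannot appear in an RFC5424 timestamp. A sender would instead emit something like

2008-12-31T23:59:59.999999Z    (staying within the 59th second)
2009-01-01T00:00:00.000000Z    (or bumping over to the next minute)

both of which are legal RFC5424 timestamps, while 2008-12-31T23:59:60Z is not.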

rsyslog multithreading

From time to time, I receive questions about how many cores rsyslog can make use of on a highly parallel system. Rsyslog is massively multi-threaded, but that does not necessarily mean that each configuration, or even each use case, can actually benefit from it.

The most important prerequisite for gaining a speedup from parallelism is the ability to break down the workload (this is called “partitioning”) and distribute it to a set of threads, which can then work on each part in parallel.

For the partitioning to work well, the workload, and the configuration, must be “partitionable”. Let me give a counter-example. If you have a single sender and a single action (yes, this sometimes is the case!), there cannot be much parallelism. Such a config looks like this (using imtcp as an example here):

$ModLoad imtcp
$InputTCPServerRun 10514
*.* /path/to/file

This cannot gain much, because we have one thread for the TCP receiver, one thread for the filtering and one for the output. With the queue engine, we can increase the number of threads that work on the filters in parallel, but these have almost nothing to do in any case. We cannot, however, run the output action in parallel, because a) the output plugin interface guarantees that only one thread hits a plugin at a time and b) it wouldn’t make much sense here anyway: what would it help if we hit the output twice and then needed to synchronize the file access? Not much…
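
For reference, increasing the number of main message queue worker threads is a one-line change (just a sketch – with a trivial config like the one above, the extra workers have next to nothing to do):

$MainMsgQueueWorkerThreads 4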

So the bottom line is that a configuration like the one above is highly sequential in nature, and consequently there is almost no gain from running some of the tasks concurrently. Out of the box, rsyslog gains speedup from parallel processing in more complex cases, with more complex rules – and many of them.

We are working to provide a good speedup even for sequential configurations, but this is a long and complex road. For example, in v5 we have now de-coupled message parsing from the receiver thread, resulting in a somewhat improved speedup for sequential configs like the one above. Also, we have added batching support in v5, which reduces some of the overhead involved with multiple threads (and thus also the additional gain we could potentially obtain). And in late v4 builds we introduced the ability to do double-buffered block i/o for output files, which can considerably reduce i/o overhead on high-end systems and also runs in pipeline mode, squeezing a bit more parallelism out of the sequential job.
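
For those who want to experiment with these newer features, the relevant knobs look roughly like this (a sketch; availability depends on the version, as described above):

# dequeue messages from the main queue in larger batches (v5)
$MainMsgQueueDequeueBatchSize 128
# double-buffered, asynchronous writing of output files (late v4 and above)
$OMFileAsyncWriting on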

So with the newer engines, we have been able to apply a basic processing pipeline that looks like

input -> parse & filter -> generate file data -> write

which can be done in parallel. Of course, the file write is action-specific, but I guess you get the idea. What you need to do, however, is configure all of that. And even then, you cannot expect a four-fold speedup on a quad-core system. I’d say you can be happy if the speedup is around 2, depending on a lot of factors.

To get to higher speedups, the job must be made more parallel. One idea is to spread the input, e.g. run it on four ports, and then create four rulesets with ruleset queues, one for each of the inputs. Ideally, to solve the file bottleneck, these should write into four different files. While I did not have the opportunity to test this in an actual deployment, it should result in a much larger speedup, because now we have four of these pipelines running in parallel, on partitioned data, with no need to synchronize between them.
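
Such a partitioned configuration could look roughly as follows (a sketch only, untested as said above; ports and file names are arbitrary examples):

$ModLoad imtcp

$RuleSet tcp1
$RulesetCreateMainQueue on
*.* /var/log/tcp1.log

$RuleSet tcp2
$RulesetCreateMainQueue on
*.* /var/log/tcp2.log

$RuleSet tcp3
$RulesetCreateMainQueue on
*.* /var/log/tcp3.log

$RuleSet tcp4
$RulesetCreateMainQueue on
*.* /var/log/tcp4.log

# bind each listener port to its own ruleset (and thus its own queue)
$InputTCPServerBindRuleset tcp1
$InputTCPServerRun 10514
$InputTCPServerBindRuleset tcp2
$InputTCPServerRun 10515
$InputTCPServerBindRuleset tcp3
$InputTCPServerRun 10516
$InputTCPServerBindRuleset tcp4
$InputTCPServerRun 10517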

Well, almost… The bad news is that the current code base (5.5.0 as of this writing) unfortunately does not yet provide the ability to run the input on more than one thread. So if you have 1000 TCP connections, all of them need to be processed by a single thread (even if they use different ports, that doesn’t matter…). It is not as bad as it sounds, because the input now is *very* quick (remember that parsing is done concurrently in a different thread [pool!]). But it still causes some loss of parallel processing where it is not strictly needed. My thinking is that we should either do a “one thread per connection” server (no longer such a big problem on 64-bit machines) or (better, but even more effort) use a thread pool for pulling data from the connections. Unfortunately, I do not have time to tackle that beast, but maybe someone is interested in sponsoring that work (it would be *really* useful)?

As you can see, a real speedup from using multiple cores is perfectly doable, but getting the maximum out of it requires a lot of careful thinking. And, of course, I have to admit that the best features are present in the newest releases (somewhat naturally…). Obviously, there is some stability risk involved with them, but on the other hand I have had some very good success reports from high-end sites; at least one of them already has v5 deployed in large-scale production.

I could only touch on the issue here, but I hope the information is useful. For further reading, I recommend both the doc on queues as well as my explanation of how messages are processed in rsyslog. These documents are somewhat older and do not cover all details of pipeline processing (which simply did not exist at that time), but I think they will still be very useful to read. And, yes, updating them is another item on my too-long todo list…

A solution for invalid syslog message formats…

In syslog, we traditionally have a myriad of message formats, causing lots of trouble in real-world deployments. There are a number of industry efforts underway trying to find a common format. To me, it currently does not look like any of them has gained the momentum necessary to become “the” dominating standard, so it looks like we need to live with various presentations of the same information for some time to come.

Over the past two weeks, I have begun to make additions to rsyslog that will hopefully help improve this unfortunate situation. I know that I have no real cure to offer, but at least baby steps toward one. I have introduced so-called message parsers, which can be used to convert malformed messages into rsyslog’s well-formed internal structure.

Why is it not a full solution? Because what I really introduced was an interface, which permits writing different parsers for the myriad of devices. I have not provided a generic solution, so the individual parsers still need to be written. And secondly, I have not yet defined any standard properties beyond those specified in the recent IETF syslog RFC series, most importantly RFC5424.
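
To illustrate how this looks from the configuration side, here is a sketch. Note that “pmdevicexyz” stands for a hypothetical, vendor-specific parser module; only the two rsyslog.* parsers named below actually ship with rsyslog:

$ModLoad imtcp
# hypothetical parser module for one device's malformed format
$ModLoad pmdevicexyz

$RuleSet devicexyz
# try the custom parser first, then fall back to the standard RFC parsers
$RulesetParser rsyslog.devicexyz
$RulesetParser rsyslog.rfc5424
$RulesetParser rsyslog.rfc3164
*.* /var/log/devicexyz.log

$InputTCPServerBindRuleset devicexyz
$InputTCPServerRun 10514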

So why do I hope this will lead to a long-term solution?
First of all, there is some hope that the IETF effort will bring more standard items. Also, we could embed other specifications within the RFC5424 framework, so it could become the lingua franca of syslog message content over time. And secondly, I hope that rsyslog’s popularity will help in getting parsers written at least for the core RFC5424 information objects, which would be the basis for everything else. Now we have the capability to add custom parsers, and we have an interface that third parties can develop against (and do so with relative ease).

All in all, I think this development is a big step in the right direction. The rest will be seen by history ;) To me, probably the most interesting question is whether we will actually attract third-party developers. If there are any, I’ll definitely help get them going with the rsyslog API.

Why is there still PRI in a syslog message?

This is the first of a couple of blog posts I intend to write in response to Raffy’s post on syslog-protocol. I am very late, but better now than never. Raffy raised some good points. With some I agree, with some I do not, and for some others it is probably interesting to see why things are the way they are.

The bottom line is that this standard – like probably every standard – is a compromise of what could be agreed on by a larger group of people and corporate interests. Reading the IETF mailing list archives will teach you a lot about this process, and I will dig out the interesting entry points into the mass of posts for you.

I originally thought I would reply to Raffy with a single blog post. However, that turned out to be undoable – every time I intend to start, something bigger and more important gets in my way. So I am now resorting to more granular answers – hopefully this works.

Enough said, on to the meat. Raffy said:

  • Syslog message facility: Why still keeping this? The only reason that I see people using the facility is to filter messages. There are better ways to do that. Some of the pre-assigned groups are fairly arbitrary and not even really implemented in most OSs. UUCP subsystem? Who is still using that? I guess the reason for keeping it is backwards compatibility? If possible, I would really like this to be gone.
  • Priority calculation: The whole priority field is funky. The priority does not really have any meaning. The order does not imply importance. Why having this at all?

And I couldn’t agree more with this. In my personal view, keeping the old-style facility is a heavy legacy burden, but it was necessary to make the standard happen. Over time, I have to admit, I have even come to think it was a good idea to stick with this format: it actually eases the transition.

Syslog-protocol has a long history. Several times we thought we were done; the first time this happened was in November 2005. Everything was finalized, and then there was a quite unfortunate (or fortunate, as you may say now ;)) IETF meeting. I couldn’t attend (too much effort to travel around the world for a 30-minute meeting…) and many other WG participants could not either.

It took us by surprise that the meeting concluded the standard was far from ready for publishing (read the meeting minutes). The objection triggered a very long (and productive, I have to admit) WG mailing list discussion. To really understand the spirit of what happened later, it is useful to read the mailing list archives starting with November 14th.

However, this is a lot of material, so let me pick out some posts that I find important. The most important fact is that backward compatibility became the WG charter’s top priority (one more post to prove the point). Among other things, it was strongly suggested that both the PRI and the RFC 3164 timestamp be preserved. Thankfully, I was able to prove that there was no common understanding of the date part among different syslog servers (actually, the research showed that nothing but PRI is common among syslogds…). So we decided that PRI must be kept as is – in favor of compatibility.

As I said, I did not like the decision at that time and I still do not like the very limited number of facilities that it provides to us (actually, I think facility is mostly useless). However, I have accepted that there is wisdom in trying to remain compatible with existing receivers – we will stick with them for a long time.

So I have to admit that I think it was a good decision to demand that PRI remain compatible. With structured data and the other header fields, we do have ways of specifying different “facilities”, that is, originating processes. Take this approach: look at facility as a down-level filtering capability. If you have a new syslogd (or write one!), make sure you can filter on all the other rich properties and not just facility.
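
To make this concrete: PRI is simply facility times 8 plus severity (e.g. facility 4, auth, with severity 2, critical, is transmitted as “<34>”), and the difference between old-style and rich filtering in rsyslog looks roughly like this (file paths are just examples):

# classic, PRI/facility-based filter
auth.* /var/log/auth.log
# property-based filter, independent of facility
:programname, isequal, "sshd" /var/log/sshd.log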

In essence, I think this is the story of why, in 2009, we still have the old-style PRI inside syslog messages…

RFC 3195 back in the game?

RFC 3195 was thought to be the solution for reliable syslog. It is based on TCP and the BEEP protocol. It was written in November 2001 but never gained much attention. The primary reason everyone cites is the complexity of BEEP (and the lack of toolkits). A few years ago, I wrote my own logging-related RFC 3195 toolkit, liblogging. It, too, did not get much momentum.

Anyhow, I used a modified version of liblogging to offer RFC 3195 support under Windows as part of the MonitorWare product line. Again, we never heard much about this feature. In rsyslog, I created an input plugin for RFC 3195. At that time, however, I already had the feeling that 3195 was a failure, so I was hesitant to implement an output plugin, too. And, as expected: nobody ever cared, except for some folks building packages – and those not for practical needs, but for the sake of having packages for everything…

So up until now, I would conclude that 3195 is indeed a failure. However, there seems to be some increasing interest. At least, I got a couple of questions about RFC 3195 in the past weeks, and Adiscon, my company, just received a not-so-small order for its EventReporter product which explicitly lists RFC 3195 in the requirements. Is this a sign of increasing interest? Or is it just somebody ticking off checklist items? That remains to be seen.

So there seems to be a slight chance that RFC 3195 is being revived. Maybe it just took some years for the idea to ripen. In any case, I am prepared for RFC 3195 demand. Maybe all that work finally begins to pay off…

new syslog RFC will not advance…

I thought that after 5 years of IETF syslog WG work, I’d already be through all potential sources of frustration. Wrong! Last Friday, I was notified that the otherwise ready-to-be-published RFC could not be published, because the IETF requires a copyright assignment to the IETF Trust in which the author must grant all rights. Of course, he must also do so on behalf of every contributor.

Quite frankly, this is more or less impossible for this draft. Among the many reasons: it would be extremely time-consuming to verify each and every (small) contribution from the past 5 years. Also, I copied (perfectly legally at the time) text from other RFCs, for which I do not know who contributed. There are more reasons, but I do not want to elaborate on all of them again.

The bottom line is that the syslog RFC series is stalled again. The IETF has at least acknowledged there is a problem, and a work-around is being discussed. A decision will be made on February 15th. Let’s hope for the best; otherwise all the work would probably be lost.

This, IMHO, is also a good example of what an overstretched system of intellectual property rights can do. This, and software patents as well, are a threat to our freedom. They stop progress and unnecessarily limit our capabilities. Hopefully more and more people realize that and will tell their governments to stop this abusive system.

Thailand is going syslog…

I found an interesting read in “The Nation”, one of Thailand’s largest business dailies. They talk about the economic crisis and the way Thailand plans to reduce its negative effects. There is a 5-point initiative in place. Of interest to us is the fifth and final point:

Finally, the association will focus on security, which promises to be this year’s main technology trend. It will urge software companies to become more familiar with Syslog, which is a standard for forwarding log messages in an IP network, but is also typically used for computer system management and security auditing.

So, as it looks, Thailand is betting on security. This is obviously a good move. Interestingly, they seem to have identified logging, and syslog in particular, as a major building block in this endeavor. That’s a bit surprising, given the typical weaknesses of syslog, but they have probably recognized the broad potential this protocol has. Maybe I should look a bit more towards Asia with rsyslog and phpLogCon, as well as the Windows product line.

syslog appliance website online

I have now set up a first basic web site for SyslogAppliance. It is not great yet, but it provides a stable reference point for any work that comes up. So people can hopefully begin to use this site as a pointer for useful resources.

As a side-note, you may notice I am using a .de (German) domain. Thanks to the spammers, the .com, .org and .net domains are already taken by spam sites. And I thought it does not really matter if we use a .de domain. After all, we live in a time where domains from the Cocos Islands (.cc) or Tuvalu (.tv) are used for all sorts of generic purposes, so why not use .de for a generic site, too?

Oh, and one interesting find: at least one person actually downloaded and tried the version of SyslogAppliance I uploaded yesterday. How do I know? I had forgotten to include the phpLogCon user in the README ;) [of course, this is fixed now ;)]

rsyslog performance

Thanks to David Lang, I have been able to gather some performance data on rsyslog. More importantly, I have been able to improve rsyslog’s performance dramatically while working with David. He not only dispenses good advice, he also has a great test environment, which I lack. If you would like to see how things evolved, be sure to follow this (lengthy ;)) thread: http://kb.monitorware.com/rsyslog-performance-t8691.html.

But you are probably interested in actual numbers.
The current v3-stable (3.18.x) manages to process around 22,000 messages per second (mps) with DNS name resolution turned on, and about double that value without. That’s not bad, but obviously there is room for improvement.

Thanks to our combined effort, we have reached a state where we can process more than 100,000 mps, and there is an experimental version (applying some lock-free algorithms) that goes well beyond 200,000 mps. I am not yet sure if we will pursue the lock-free algorithms. There are plenty of additional ideas available, and I am quite positive we can push the limit even further.

All numbers were tested with a minimal configuration (one udp input, one file output) on a capable multi-core machine. The numbers above are for sustained traffic rates. More messages can be accepted (and buffered) during bursts.
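
For reference, such a minimal test configuration looks roughly like this (a sketch; port and file name are examples):

$ModLoad imudp
$UDPServerRun 514
*.* /var/log/test.log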