on leap seconds and syslog

I was recently asked how syslog handles leap seconds. I thought it would be useful to reproduce my thoughts, initially expressed via private mail, here in the blog.

RFC5424 specifically forbids leap seconds, as during our discussions we found many cases where leap seconds caused grief. I also think the the TAI is considering aborting the use of leap seconds for this reason as well. To the best of my knowledge, GPS also does not use leap seconds. The ultimate reason to abandon UTC leap seconds in syslog was the we failed to identify an operating system that would expose leap seconds to a user process. So a syslogd or any other syslog sender would not even be able to see that one was introduced. From the syslog perspective, a leap second is just like any other second, but time flows “somewhat slower”. I guess we are in the same boat as many operating systems with this perspective.

In RFC5424 we didn’t explicitly state what time stamp should be written during a leap second – because we thought it could actually never happen (why? explained above!). But I would say that “Leap seconds MUST NOT be used” to me means that it should be expressed as the 59th second of said minute. But even if you bump the minute and use the 0 second, I cannot see how this should be problematic. On a single system, time should still evolve serially. For correlating events form multiple systems, the timestamp alone is insufficient in any case. You cannot closely enough synchronize the different real time clocks. So you need a different meachanism (like Lamport clocks) for this in any case.

rsyslog multithreading

From time to time, I receive questions on how many cores rsyslog can run on a highly parallel system. Rsyslog is massivley multi-threaded, but that does not necessarily mean that each configuration, and even each use case, can actually benefit from it.

The most important thing to gain a speedup from parallelism is the ability to break down the workload (this is called “partitioning”) and distribute it to a set of threads, which than can work in parallel on each part.

For the partitioning to work well, the workload, and configuration, must be “partionable”. Let me give a counter-example. If you have a single sender and a single action (yes, this sometimes is the case!), there can not be much parallelism. Such a config looks like this (using imtcp as an example here):

$TCPServerRun 10514
*.* /path/to/file

This can not gain much, because we have on thread for the TCP receiver, one thread for the filtering and one for the output. With the queue engine, we can increase the number of threads that will work on filters in parallel, but these have almost nothing to do in any case. We can not, however, walk in parallel into the output action, because a) the output plugin interface guarantees that only one thread hits a plugin at one time and b) it wouldn’t make much sense here in any case: what would it help if we had hit the output twice and then need top synchronize the file access? No much…

So the bottom line is that a configuration like the one above is highly sequential in nature and consequently there is almost no gain by running some of the tasks concurrently. So, out of the box, rsyslog gains speedup from parallel processing in more complex cases, with more complex rule and many of them.

We are working the provide excellent speedup even for sequential configurations. But this is a long and complex road. For example, in v5 we have now de-coupled message parsing from the receiver thread, resulting in somewhat improved speedup for sequential configs like the one above. Also, we have added batching support in v5, which reduces some overhead involved with multiple threads (and thus reduces the gain we could potentially have). And in late v4 builds we introduced the ability to do double-buffered block i/o for output files, which can considerably reduce i/o overhead for high end systems and also runs in pipeline mode, sequzing a bit more parallelism out of the sequential job.

So with the newer engines, we have been able to apply a basic processing pipeline that looks like

input -> parse & filter -> generate file data -> write

which can be done in parallel. Of course, the file write is action-specific, but I guess you get the idea. What you need to do, however, is configure all that. And even then, you can not expect a 4-time speedup on a quad core system. I’d say you can be happy if the speedup is around 2, depending on a lot of factors.

To get to higher speedups, the job must be made more parallel. One idea is to spread the input, e.g. run it on four ports, then create four rulesets with ruleset queues for each of the inputs. Ideally, to solve the file bottleneck, these should write into four different files. While I did not have the opportunity to test this out in an actual deployment, that should gain a much larger speedup. Because now we have four of this pipelines running in parall, on partitioned data where there is no need to synchronize between them.

Well, almost… The bad news is that the current code base (5.5.0 as of this writing) does unfortunately not yet provide the ability to run the input on more than one thread. So if you have 1000 tcp connections, all of these need to be processed by a single thread (even though they may use different ports, that doesn’t matter…). It is not as bad as it sounds, because the input now is *very* quick (remember the parsing is done concurrently in a different thread [pool!]). But still it causes some loss of parallel processing where not strictly needed. My thinking is that we should either do a “one thread per connection” server (not any longer such a big problem on 64bit machines) or (better but even more effort) do a thread pool for pulling data from the connections. Unfortunately, I do not have time to tackle that beast, but maybe someone is interested in sponsoring that work (that would be *really* useful)?

As you can see, full speedup by using multiple cores is perfectly doable, but going the max requires a lot of careful thinking. And, of course, I have to admit that the best features are present in the newest releases (somewhat naturally…). Obviously, there is some stability risk involved with them, but on the other hand I had some very good success reports from some high-end sites, at least on of them has v5 already deployed in large-scale production.

I could only touch the issue here, but I hope the information is useful. For further reading, I recommend both the doc on queues, as well as my explanation on how messages are processed in rsyslog. These documents are somewhat older and do not cover all details of pipeline processing (which simply did not exist at that time), but I think they will be very useful to read. And, yes, updating them is another thing on my too-long todo list…

A solution for invalid syslog message formats…

In syslog, we traditionally have a myriad of message formats, causing lots of trouble in real-world deployments. There are a number of industry efforts underway trying to find a common format. To me, it currently does not look like one of them has received the necessary momentum to become “the” dominating standard, so it looks like we need to live with various presentations of the same information for some more time.

The past two weeks, I have begun to make additions to rsyslog that hopefully will help solve this unfortunate situation. I know that I have no real cure to offer, but at least baby steps toward it. I have introduced so called message parsers, which can be utilized to convert malformed messages into rsyslog’s well-formed internal structure.

Why is it not a solution? Because what I really introduced was actually an interface, which permits to write different parsers for the myriad of devices. I have not provided a generic solution to do that, so the individual parsers need to be written. And secondly, I have not yet defined any more standard properties than those specified in the recent IETF syslog rfc series, most importantly RFC5424.

So why I hope this will lead to a long-term solution?
First of all, there are some hopes that the IETF effort will bring more standard items. Also, we could embed other specifications within the RFC5424 framework, so this could become the lingua franca of syslog message content over time. And secondly, I hope that rsyslog’s popularity will help in getting parsers at least for core RFC5424 information objects, which would be the basis for everything else. Now we have the capability to add custom parsers, and we have an interface that third parties can develop to (and do so with relative ease).

All in all, I think this development is a huge step into the right direction. The rest will be seen by history ;) To me, the probably most interesting question is if we will actually attract third party developers. If there are any, I’ll definitely will help get them going with the rsyslog API.

Why is there still PRI in a syslog message?

This is the first of a couple of blog posts I intend to do in response to Raffy’s post on syslog-protocol. I am very late, but better now than never. Raffy raised some good points. To some I agree, to some not and for some others it is probably interesting to see why things are as they are.

The bottom line is that this standard – as probably every standard – is a compromise of what could be agreed on by a larger group of people and corporate interests. Reading the IETF mailing list archives will educate much about this process, but I will dig out those interesting entry points into the mass of posts for you.

I originally thought I reply with a single blog post to Raffy. However, this tends to be undoable – every time I intend to start, something bigger and more important comes into my way. So I am now resorting to more granualar answers – hopefully this work.

Enough said, on the the meat. Raffy said:

  • Syslog message facility: Why still keeping this? The only reason that I see people using the facility is to filter messages. There are better ways to do that. Some of the pre-assigned groups are fairly arbitrary and not even really implemented in most OSs. UUCP subsystem? Who is still using that? I guess the reason for keeping it is backwards compatibility? If possible, I would really like this to be gone.
  • Priority calculation: The whole priority field is funky. The priority does not really have any meaning. The order does not imply importance. Why having this at all?

 

And I couldn’t agree more with this. In my personal view, keeping with the old-style facility is a large debt, but it was necessary to make the standard happen. Over time, I have to admit, I even tend to think it was a good idea to stick with this format, it actually eases transition.

Syslog-protocol has a long history. We thought several times we were done, and the first time this happened was in November, 2005. Everything was finalized and then there was a quite unfortunate (or fortunate, as you may say now ;)) IETF meeting. I couldn’t attend (too much effort to travel around the world for a 30-minute meeting…) and many other WG participants also could not.

It took us by surprise that the meeting agreed the standard was far from ready for publishing (read the meeting minutes). The objection raised a very long (and productive, I need to admit) WG maling list discussion. To really understand the spirit of what happened later, it would be useful to read mailing list archives starting with November, 14th.

However, this is lots of stuff, so let me pick out some posts that I find important. The most important fact is that backward compatibility became the WG charter’s top priority (one more post to prove the point). Among others, it was strongly suggested that both the PRI as well as the RFC 3164 timestamp be preserved. Thankfully, I was able to proof that there was no common understanding on the date part in different syslog server (actually, the research showed that nothing but PRI is common among syslogds…). So we went down and decided that PRI must be kept as is – to favor compatibility.

As I said, I did not like the decision at that time and I still do not like the very limited number of facilities that it provides to us (actually, I think facility is mostly useless). However, I have accepted that there is wisdom in trying to remain compatible with existing receivers – we will stick with them for a long time.

So I have to admit that I think it was a good decision to demand PRI begin compatible. With structured data and the other header fields, we do have ways of specifying different “facilities”, that is originating processes. Take this approach: look at facility as a down-level filtering capability. If you have a new syslogd (or write one!) make sure you can filter on all the other rich properties and not just facility.

In essence, I think this is the story why, in 2009, we still have the old-style PRI inside syslog messages…

RFC 3195 back in the game?

RFC 3195 was thought to be the solution for reliable syslog. It is based on TCP and the BEEP protocol. It was written in November 2001 but never gained much attention. The premier reason everyone tells you is complexity of BEEP (and lack of toolkits). A few years ago, I wrote my own logging-related RFC 3195 toolkit, liblogging. It, too, did not get much momentum.

Anyhow, I used a modified version of liblogging to offer RFC 3195 support under Windows as part of the MonitorWare product line. Again, we never heard much of this feature. In rsylog, I created an input plugin for RFC 3195. At that time, however, I already had the feeling 3195 was a failure. So I was hesitant to implement an output plugin, too. And, as expected: nobody every cared, except for some folks building packages. And these not for practical needs but for the sake of getting packages for everything…

So up until now, I would conclude that 3195 is indeed a failure. However, there seems to be some increasing interest. At least, I got a couple of questions the past weeks on RFC3195 and Adiscon, my company, just got a not-so-small order of its EventReporter product which explicitly has RFC 3195 put into the requirements. Is this a sign of increasing interest? Or is just somebody filling check mark items? This remains to be seen.

So, there seems to be a slight chance that RFC 3195 is getting revived. Maybe it took just some year so that the idea could ripen. In any case, I am prepared for RFC 3195 demand. Maybe finally doing all that work begins to pay off…

new syslog RFC will not advance…

I thought that after 5 years of IETF syslog WG work, I’d already be through all potential sources of frustration. Wrong! Last Friday, I was notified that the otherwise ready-to-be-published RFC could not be published because the IETF requires a copyright assignment to the trust, in which the author must grant all rights. Of course, he must also do so on behalf of every contributor.

Quite frankly speaking, this is more or less impossible for this draft. Among the many reasons is that it would be extremely time-consuming to verify each and every (small) contribution from the past 5 years. Also, I copied (perfectly legal than) text from other RFCs, who do not know who contributed. There are more reasons, but I do not like to elaborate on all of them again.

The bottom line is that the syslog RFC series is again stalled. The IETF has at least identified there is a problem and a work-around is being discussed. A decision will be made on February, 15th. Let’s hope for the best, otherwise all the work would probably be lost.

This IMHO is also a good example on what a too-far streched system of intellectual property rights can do. This, and also software patents, are a threat to our freedom. The stop progress and unnecessarily limit out capabilities. Hopefully more and more people realize that and will tell their governments to stop this abusive system.

Thailand is going syslog…

I found an interesting read in “The Nation”, one of Thailand’s largest business dailies. They talk about the economic crisis and the way Thailand plans to reduce negative effects. There is a 5-point initiative in place. Of interest for us the the fifth and final point:

Finally, the association will focus on security, which promises to be this year’s main technology trend. It will urge software companies to become more familiar with Syslog, which is a standard for forwarding log messages in an IP network, but is also typically used for computer system management and security auditing.

So, as it looks, Thailand is betting on security. This is obviously a good movement. Interestingly, they seem to have identified logging, and syslog in specific, to be a major building block in this endeavor. That’s a bit surprising, given the typical weaknesses of syslog. But they’ve probably identified the broad potential this protocol has. Maybe I should look a bit more towards Asia with rsyslog and phpLogCon as well as the Windows product line.

syslog appliance website online

I have now set up a first basic web site for SyslogAppliance. It is not great yet, but it provides a stable reference point for any work that comes up. So people can hopefully begin to use this site as a pointer for useful resources.

As a side-note, you may notice I am using a .de (German) domain. Thanks to the spammers, com, org and net domains are already used by spamming sites. And I thought it does not matter if we use a de domain. After all, we live in a time where domains from the Cocos Islands (cc) or Tuvalu (tv) are being abused for generic purposes, so why not use .de for a generic site, too?

Oh, and one interesting find: at least one person actually downloaded and tried the version of SyslogAppliance I uploaded yesterday. How do I know? I had forgotten to include the phpLogCon user in the README ;) [of course, this is fixed now ;)]

rsyslog performance

Thanks to David Lang, I have been able to gather some performance data on rsyslog. More importantly, I have been able to improve rsyslog’s performance dramatically while working with David. He does not only dispense good advise, he has also a great test environment which I lack. If you would like to see how things evolve, be sure to follow this (lengthy ;) thread: http://kb.monitorware.com/rsyslog-performance-t8691.html.

But you are probably interested in actual numbers.
The current v3-stable (3.18.x) manages to process around 22.000 messages per second (mps) with DNS name resolution turned on and about double that value without. That’s not bad, but obviously there is room for improvement.

Thanks to our combined effort, we have reached a state where we can process more than 100,000 mps and there is an experimental version (applying some lock-free algorithms) that goes well beyond 200,000 mps. I am not yet sure if we will pursue the lock-free algorithm. There are ample of additional ideas available and I am quite positive we can push the limit even further.

All numbers were tested with a minimal configuration (one udp input, one file output) on a capable multi-core machine. The numbers above are for sustained traffic rates. More messages can be accepted (and buffered) during bursts.

syslog fragmentation

Hi all, I was asked about the current (and past) state of syslog splitting a single syslog message into multiple smaller messages. The question came up because of a broken syslogd implementation which permits to receive 256 bytes (octets) at most. I thought I re-post my reply to the blog as it may be of interest for you, too. Here we go:


Hi all,

I have now reviewed all of the discussion.

Let me start with the broken receiver. With 255 octets, any (generic)
fragmentation method would need to be ultra-compact which of course is
doable but not (with reasonable overhead) in the context we set up with
the version of syslog-protocol that will turn into a normative RFC.
Note, also, that RFC3164 will be superseded by that RFC once it is out.

As has been described here, fragmentation can either be done at the
protocol layer or at the application layer. In the later case, the
application needs to consider a sensible maximum and needs to emit
message sequences which are somewhat atomic. sendmail seems to do that,
and I know of some other examples, too. Some database servers seem to do
verbose logging in a similar way, in that they log parts of the
statement within different log messages. However, it is often quite
complicated to re-unite those application logs to the original message
(much of the complexity of log analysis stems from that).

A protocol based approach solves these issues. But as it has been
rejected by the syslog-sec WG it is not considered useful by the IETF
syslog community. We may want to try give it another shot, but that
should be done inside the framework layed out in the new IETF series and
as such it would not be a good solution for a broken receiver. Actually,
the new RFC series requires a minimum maximum length of 480 octets
(stemming back to IPv4 UDP available payload size). The recommended
minimum maximum length is 2K and more is permitted if sender and
receiver support it. There is no upper limit per se, but a receiver may
either truncate the message or even discard it as whole. If truncation
happens, it must truncate at the end and without paying any attention to
syntax and semantics of the message. This is specified in [1]’s section
6.1 and it was the result of very elaborate discussions. Most
importantly, the syntax- and semantic-agnostic truncation was a
requirement out of this discussion.

As Tina mentioned, my company Adiscon and me personally are doing
Windows event log to syslog conversion for quite a while. Windows event
messages can be large and keep growing larger. We are converting them to
syslog for over 10 years now and at the time we started the only common
ground was the 1K limit already mentioned. Note that at that time some
implementations could experience serious malfunction if messages over
this size arrived (I remember the Solaris syslogd immediately
segfaulting as one sample of more). We thought about how to best address
the issue. We were tempted to do an app-level split, much as sendmail
does, but refused to do that for two reasons:

1) the messages to be logged did not originate from ourselves (other
than in the sendmail case). This implies that we do not exactly know
what makes sense to put together. While there is a potentially large set
where this can be properly concluded from context, there is also a set
where this is not the case. The later would have required a more
protocol-like generic approach and thus specialised parsers on the end
systems – something we did not really like.

2) this is somewhat similar to the parser problem. In general, log
analysis is even harder if a single logical log entry is distributed
over several physical records. Especially if you take into account that
the order of appearance does not necessarily (in practice almost never)
reflect the order of creation. So processing such a log requires a
consolidation phase. It is especially hard for a human reviewer do this
while reviewing logs and thus considered a big disadvantage.

So we looked at what was available at that time. While the 1K size limit
was universally accepted, most syslog receivers either supported larger
sizes by default, could configured to do so or being recompiled to
handle it (sysklogd, the then-omnipresent syslogd on Linux is a premier
example for the later – #define MAXLINE 1024 needed to be changed and
you were basically done [within UDP constranints]). The real limit
turned out to be the UDP max size, 64K in theory but with different
default/hard coded limits in various stacks. In 2005 a did a bit
research[2] and found that 4K seems to be the typical real-world limit.
But even many years before, allmost every Windows event record fit into
4K (the exception being records with dumps in them…). Also, we already
had plain TCP-based syslog at that time, which did not experience any
size issues.

So the practical solution for our Windows to syslog size problem was
simply to ignore it and tell customers how to configure/recompile their
syslogd. That worked on Windows at least for WinSyslog and Kiwi Syslog.
and on the *nix side at least for sysklogd, syslog-ng, some variants I
don’t remember by name and now rsyslog. We recommended to switch to a
product that supported larger sizes where the stock solution did not do
that. Or we used interim, specialised, receivers who logged data into
separate databases or files.

When I was unable to convince the syslog-sec WG to specify fragmentation
as part of the syslog protocol itself, I was at least able to put that
spirit into the I-D: so we now have the ability to use large sizes if
everybody configures the systems correctly. Part of that spirit, funny
as it may sound, is to place important information early in the packet
as this improves chances it will actually be delivered.

I have to admit this is not a perfect way to do it, but at least it
works if everything is setup up correctly. The current main “problem” is
that RFC 3195 (somewhat vaguely) sets an upper limit of 1K for messages
and also does not talk about truncation. So if there is a RFC3195 system
inside a relay chain, the maximum size for the whole chain goes down to
1K – there is nothing we can do about this. This is also the reason why
a new revision of 3195 is needed. This is underway, as far as I know.
One should also note that this limitation is of no practical importance
for the time being (thus no real “problem”), because 3195 did not find
widespread support. To the best of my knowledge, the only commercially
available implementations are Cisco’s and ours with us also providing
the only (more or less, due to low priority) fully supported 3195
implementation inside an open source syslogd. There was SDSC syslog, but
the project is to the best of my knowledge no longer alive. It also
never spread to become the default syslogd on any important Linux
distribution and can be considered “exotic” at best.

I hope this description is useful for you. The bottom line is that there
is no standard, and there is, at least was, no support for specifying
one. Even if we change that, the end-result will most probably not
support down-level reveivers below the 480 octet limit set forth in the
upcoming RFC series.

Best regards,
Rainer

[1] http://tools.ietf.org/html/draft-ietf-syslog-protocol-23
[2] http://www.monitorware.com/Common/en/articles/ihe-syslog.php