new rsyslog devel branch: 7.5

After the successful and important release of 7.4 stable, we are working hard on further improving rsyslog. Today, we release rsyslog 7.5.0, which opens up the new development branch.

With that branch, we’ll further focus on security. Right with the first release, we provide a much-often demanded feature: TLS support for the Reliable Event Logging Protocol (RELP). Even better, we also support the compression feature that is included in GnuTLS, so you can use im/omrelp not only with TLS, but also turn on compression if you like so.

There is already a guide for TLS-secured RELP with rsyslog available. It was written for the experimental 7.3.16 release, which never was officially announced. So the guide contains some (now-unnecessary) build steps, but also a full example of client and server configurations.

Note that the current TLS support supports anonymous authentication, only (via Diffie-Hellman key exchange). This is due to the current librelp implementation. However, librelp is scheduled to become more feature-rich and rsyslog will support this, once it is available. In general, you can expect some more enhancements (and more fine-grained config control for those in the need) for rsyslog’s RELP subsystem.

There are more things to come, especially in the security context. So stay tuned to what the next version of rsyslog will provide!

Use of application-level acks in RELP

I received a very well crafted question about RELP reliability via the rsyslog mailing list this morning. I think it makes perfect sense to highlight this question here in the blog instead of letting it die unread and hard to find in the mailing list archives. Before reading this post, it would be useful to read my rant on “On the unreliability of plain tcp syslog” if you have not already done so. It will greatly help understand the fine details of what the message talks about.

Here we go, original posters’s text in italics, my replies in between it:

In my research of rsyslog to determine its suitability for a
particular situation I have some questions left unanswered. I need
relatively-guaranteed delivery. I will continue to review the
available info including source code to see if I can answer the
questions, but I hope it may be productive to ask questions here.

In the documentation, you describe the situation where syslog silently
loses tcp messages, not because the tcp protocol permits it but
because the send function returns after delivering the message to a
local buffer before it is actually delivered.

But there is a more-fundamental reason an application-level ack is
required. An application can fail (someone trips over the power cord)
between when the application receives the data and when it records it.

1. Does rsyslog send the ack in the RELP protocol occur after the
message has been safely recorded in whatever queue has been configured
or forwarded on so its delivery status is as safe as it will get (of
course how safe depends upon options chosen), or was it only intended
to solve the case of TCP buffering-based unreliability?

RELP is designed to provide end-to-end reliability. The TCP buffering issue is just highlighted because it is so subtle that most people tend to overlook it. An application abort seems to be more obvious and RELP handles that.

HOWEVER, that does not mean messages are necessarily recorded when the ACK is sent. It depends on the configuration. In RELP, the acknowledgment is sent after the reception callback has been called. This can be seen in the relevant RELP module. For rsyslog’s imrelp, this means the callback returns after the message has been enqueued in the main message queue.

It now depends on how that queue is configured. By default, messages are buffered in main memory. So when rsyslog aborts for some reason (or is terminated by user request) before this message is being processed, it is lost – while the sender still got a positive ACK. This is how things are done by default, and it is useful for many scenarios. Of course, it does not provide the audit-grade reliability that RELP aims for. But the default config needs to take care of the usual use case and this is not audit-grade reliablity (just think of the numerous home systems that run rsyslog and should do so in the least intrusive way).

If you are serious about your logs, you need to configure the engine to be fully reliable. The most important thing is a good understanding of the queue engine. You need to read and understand the rsyslog queue docs, as they form the basis on which reliability can be built.

The other thing you need to know is your exact requirements. Asking for reliability is easy, implementing it is not. The more you near 100% reliability (which you will never reach for one reason or the other) the more complex scenarios get. I am sure the original post knows quite well what he want, but I am often approached by people who just want to have it “totally reliable” … but don’t want to spent the fortune it requires (really – ever thought about the redundant data centers, power plants, satellite and sea links et all you need for that?). So it is absolutely vital to have good requirements, which also includes of when loss is acceptable, and at what cost this comes.

Once you have these requirements, a rsyslog configuration that matches them can be designed.

At this point, I’d like to note that it may also be useful to consider rsyslog professional services as it provides valuable aid during design and probably deployment of a solution (I can’t go into the full depth of enterprise requirements here).

To go back to the original question: RELP has almost everything that is needed, but configuring the whole system in an audit-grade way requires (ample) work.

2. Presumably there is a client API that speaks RELP. Can it be
configured to return an error to the client if there is no ACK (i.e.
if the log it sent did not make it into the configured safe location
which could be on a disk-based queue), or does it only retry? Where is
this API?

The API is in librelp. But actually this is not what you are looking for. In rsyslog, an output module (here: omrelp) provides the status back to the caller. Then, configuration decides what happens. Messages may be discarded, sent to a different destination or retried.

With omrelp, I think we have some hardcoded ways to preserve the message, but I have no time yet to look this up in detail. In any case, RELP will not loose messages but may duplicate few of them (within the current unacked window) if the remote peer simply dies. Again, this requires proper configuration of the rsyslog components.

Even with that, you may loose messages if the local rsyslogd dies (not terminates, but dies for some unexpected reason, e.g. a segfault, kill -9 or whatever) but still has messages in a not persisted queue. Again, this can be mitigated by proper configuration, but that must be designed. Also, it is very costly in terms of performance. A good reading on the subtleties can be in the rsyslog mailing list archive. I suggest to have a look at it.

Certainly the TCP caching case you mention in your pages is one a user
is more likely to be able to reproduce, but that is all the more
reason for me to be concerned that the less-reproducible situations
that could cause a message to occasionally become lost are handled

I don’t think app-abort is less reproducable – kill -9 `cat /var/run/` will do nicely. Actually, from feedback I received, many users seem to understand the implications of a program/system abort. But far fewer understand the issues inherent in TCP. Thus I am focusing so much on the later. But of course, everything needs to be considered. Read the thread about the reliable queue (really!). It goes great lengths, but still does not offer a full solution. Getting things reliable (or secure) is very, very challenging and requires in-depth knowledge.

So I am glad you asked and provided an opportunity for this to be written :)


On the (un)reliability of plain tcp syslog…

Did you ever use TCP to transfer syslog reliably? And do you think that makes you immune against message loss? If so, it’s time to think again.

There is a subtle, yet important problem with plain tcp syslog. It is a very simple protocol and it works without any app-level acknowledgment. Well, “why care”, you may say “we have the TCP ack”. That, of course is right, but that low-level ack won’t help you in all cases. The TCP ack is fine while everything goes well. But if the connection breaks, there is no mechanism in TCP that tells the sender immediately. In fact, “not immediately” actually is “after [2] hours” (and only if you have keep alive active, else its even later). This is not a flaw in TCP but desired behavior. To be honest, when the server aborts and is restarted, the client will see an error as soon as the server is restarted. On a busy system, this may still be many messages later. So when the client receives the error status, it has no clue at all which messages were lost and which finally made it to the server.

Why that? The problem here is that (by design!), the TCP send() API always returns success but buffers the message locally until it can be transmitted to the remote host. This behavior is fine – it enables TCP to be used reliably even when a network path is temporarily down. The downside is that the sender never knows if and when the data was received by the remote peer. Of course, there is an upper limit on the buffer size. It’s depending on TCP window negotiations, and if I remember correctly the ultimate upper limit is around 240K. I haven’t checked, but the lower limit is probably around 1.5K, the size of an Ethernet packet.

If we now assume a generous syslog message size of 150 bytes (many are much smaller), we can have between 10 and 1,600 messages sitting in this buffer! So we may lose up to 1,600 messages if the server goes down – without even noticing it. Frightening, isn’t it?

I was aware of this problem since the early days of plain tcp syslog. I have even documented it as part of the SELP (simple event logging protocol) discussion on the loganalysis mailing list: see the selp spec, section 2.4. I have to admit that I never took it too seriously. After all, it only happens when the server goes down … and there were so many other occasions on which syslog messages could get lost. The most prominent one probably running out of buffer space on the local syslogd or simply restarting it.

Then, I begun to implement more and more reliability into rsyslog (my *nix syslogd implementation for those not in the know). And the more reliable that engine got, the more the reliability problems of plain tcp syslog surfaced. A typical sample of TCP unreliability is included in this post, which also was the thing that finally reminded me on the root cause of all that (if you search the forum, you’ll find some other example of “the tcp problem”).

Once the core engine was reliable, the unreliability of TCP, so far well-hidden by the unreliability of the rest of the system, quickly surfaced. No advanced flow control, no tricks, no nothing helped. We can not build a reliable solution out of plain tcp syslog. It’s simply a no-go. At this point, it is probably good to add that this is not a rsyslog problem. It’s a protocol issue and as such all softwares implementing plain TCP syslog have the same shortcoming!

My quest for a highly reliable and tamper-proof logging system got shaky at that point. So what to do? Of course, a different, more reliable protocol was needed. It must support app-level acks, so that the client (sender) knows what was processed by the server and what not. If so, it can resend the right messages in case of a connection failure. My initial thoughts were to use liblogging and thus RFC 3195, which bases on BEEP (RFC 3080/3081). However, after some thinking, I came to the conclusion that RFC 3195 is far too less accepted and quite a bit to complicated for the job. I ended up designing a new, reliable and extensible protocol which I called RELP in honor of the initial SELP work. RELP stands for reliable event logging protocol. That implies two things: it’s reliable and it’s NOT only about syslog. RELP can carry a variety of payloads and will probably do so in the future. [You can find some more of my thought process in a previous blog post.]

I have now finished the initial alpha implementation of librelp, the RELP core library. I have also released the first version of rsyslog (3.15.0) with RELP support. There are still a few nits with RELP. Most importantly, it may duplicate some messages in extreme network failure cases (documented inside rsyslog’s imrelp module). But, except for a bug, RELP will never again lose any message … nearly. Now the ball is back to rsyslog. If the network connection breaks and rsyslog sent some yet-unacknowledged messages just before the network connection broke and rsyslog is terminated in that state … then we may indeed still lose a handful of messages. The bad thing is that it happens, the good thing is that there is some cure, it “just” needs to be implemented. So the reliability quest is not over. It just went into the next round. But, again, the probability of message loss is now very much lower than with plain tcp syslog.

So just let me repeat: plain tcp syslog is *not* a reliable solution. It works well as long as everything is working well, but it screws up when the network or the server breaks… And, yes, you don’t really need to care about that until you try to make the rest of the system fully reliable.


I have received a good comment via mail and provided an in-depth answer, probably interesting to others, too. So I here reproduced it (but snip much of the original mail):

> There is very expensive software out there that can do guarenteed
> packet
> deliver over tcp…

Well… RELP can *really* do the trick. It needs some help in the rsyslog engine to work under all cases, that is when the relp stack is terminated (when the client rsyslog is shut down). But that isn’t rocket science and has just been pushed back by more urgent work (securing the channel).

So far, relp does single ack, that is the server ack’s every packet back to the client. Client discards packets only after ack. So when a session breaks, we know what the server has processed. If, however, some of the acks get lost, we resend packets that the server already processed, resulting in some mild message *duplication* (we currently have a 128 packet app-layer max window and typically acks come in rather quickly). To work around this, the client must ack the server’s acks (double-acking). That sounds scary, but is not. It’s also not performance intense and the protocol is modeled that those with ultra-slim bandwidth can turn it off and live with the message duplication scenario. If double-acks are active, relp client and server will go into a recovery phase after reconnect negotiation, in which mutally acked package numbers are exchanged. To do so, the client must provide a session cookie back to the server. This is a potential attack vector and thus I’d like to have secure transport in place before I do it. Once the recovery negotiation is done, client and server know, and have discarded, what each other peer has processed. The remaining packets are re-sent, but under the same reliability settings. So if the connection breaks at this point again (or during the renegotiation), we’ll simply go into another recovery phase until we finally succeed. It is of course important that session caches be persisted to disk when an engine stops – that will be part of rsyslog. The recovery procedure itself is in librelp. rsyslog needs to add just a slim persistence capability.

Once this is done, and with proper rsyslog queue settings, sufficiently stable hardware and sufficient disk space, I guarantee (not in a lawyers sense, though ;)) that you’ll never lose a message nor get a duplicate.

This is when I think we have achieved our reliability goal. If all goes well… summer? ;)

ANOTHER UPDATE (2008-05-29)

Some idea was brought up about one may be able to circumvent this problem. I tried it out, saw it failed and could also prove that it is simply impossible to have a reliable plain tcp syslog protocol without application-level acknowledgment.

RELP – the reliable event logging protocol

Please welcome RELP, the “reliable event logging protocol”. What’s that???

I am currently really bugged by shortcomings in the plain tcp syslog protocol in its current form. And I finally made up my mind. Instead of wasting time on fixing broken plain tcp syslog transport mode (e.g. by implementing half-duplex, which isn’t standard anyhow), I’ll do “the right thing”. I thought rfc 3195 is the right thing. But it carries a lot of overhead that I really don’t need. And, most importantly, any standard additions takes ages to go through the IETF (I know what I am talking about, have finally succeeded to get a better syslog rfc though it in “just” 4 (or 5?) years — and it is still waiting to be published…).

So instead of adding on any of these existing protocols, I’ll do a new, lightweight but capable protocol specifically designed to solve the shortcomings we currently experience. Please welcome RELP, the “reliable event logging protocol” (name based on the quite successful selp [simple event logging protocol] effort:

Relp will evolve over time. I hope to do something useful relatively soon, and it will be extended as the project progresses. The ultimate goal is to have a good, very reliable, protocol for rsyslog-to-rsyslog communications. I’ll don’t care about the outside world, so I can do the best of breed solution. For the rest of the world, rsyslog will of course continue to support plain tcp syslog and will get better support for rfc 3195. But if you wanna go hardcore on high-reliability, high-performance event logging, relp will be your choice.

Technically, I’ll split this off into rsyslog relp input and output plugins AND an independent librelp, which provides core protocol functionality (just in case somebody else wants to support it in the long term, so this will not be tied to rsyslog).