Docker group security risk

The Docker documentation spells out that there are security concerns when adding a user to the docker group. Unfortunately, it does not say precisely what the concern is. I guess that is a "security-by-obscurity" approach, trying to avoid bad things. Practice shows this isn't useful: the bad guys know anyway, and the casual user has a hard time understanding the actual risk involved.

The risk is considerable, so let me explain at least one attack (I have not tried to exhaustively check all security issues): containers usually run as the root user, and that permits you to bypass permission checks on the host.

Let's assume $USER is a member of the docker group on an otherwise freshly installed Docker host. He can run:

$ docker run -v /etc:/malicious -ti --rm alpine
# cd /malicious
# vi sudoers
... edit, write ...
... press Ctrl-D to exit the container ...

As such, the user can modify system configuration that he could not access otherwise. It's a real risk. On a one-person "personal" machine/VM where the user has sudo permissions anyway … I'd say it's no real issue.

The story is a different one on e.g. a CI machine. It's easy to inject bad code into public pull requests, and so it'll run on the CI platform. Usually (before Spectre/Meltdown…), this was guarded by the (low) permissions of the CI worker user (if you run CI with a sudo-enabled user … nothing has changed). When you permit the worker to use docker, you get this new class of attack vector. Don't get me wrong: I do NOT advocate against using docker in CI. Quite the opposite, it's an excellent tool there. I just want to make you aware that you need to consider and mitigate another attack vector.
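One mitigation worth evaluating (available in newer Docker versions) is user namespace remapping, which maps root inside a container to an unprivileged user on the host. A minimal sketch, assuming a systemd-based host; the file location and restart command may differ on your system:

$ cat /etc/docker/daemon.json
{
  "userns-remap": "default"
}
$ sudo systemctl restart docker

With remapping in place, root inside the container corresponds to a subordinate UID on the host, so the /etc bind-mount trick shown above no longer grants host-root write access. Be aware that remapping has side effects (e.g. on volume ownership), so test it before relying on it.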

Feedback Request for digitally signed log store

I have just written about how I plan to implement digital signatures in LogStore, the secure store used by LogTools. The log store digital signature proposal details how and when signatures are written and provides the reasoning for why it will probably happen this way. There are two goals for the proposal: one is to document how things will work; the other, probably more important, one is to attract feedback. It is easy to get security tools wrong, and even those with the highest experience in that area (which I do not have!) can fail. So it would be very beneficial to have some other folks read the proposal and comment on weaknesses they find – or simply on things they would do differently or add to the overall idea. With that said, please read the (small) paper and provide feedback ;-).

Please keep in mind that this is not only related to syslog but can be used with any text-based log (including binary logs that are converted to text, e.g. by base64-encoding them). So it can affect you even if you are not interested in syslog itself. My (mostly uneducated) assumption is that this could be a toolset of great use for computer forensics.

Serious syslog problems?

In the paper introducing journald/Linux Journal, a number of shortcomings in current syslog practice are mentioned. The authors say:

Syslog has been around for ~30 years, due to its simplicity and ubiquitousness it is an invaluable tool for administrators. However, the number of limitations are substantial, and over time they have started to be serious problems:

I have now taken some time to look at each of these claims in depth. But before I start, I should say that I have been working in the IT logging field for nearly 15 years, have participated in a number of standards efforts and have written a lot of syslog-related software, with rsyslog being a prime example (some commercial tools I have been involved with can be found here). So I probably have a bias, and my words need to be taken with a grain of salt. On the other hand, the journald authors also have a bias, so I guess that's a fair exchange of arguments ;).

In my analysis, I compare the journald effort with what rsyslog currently provides and leave closed source software out. It is also important to note that there is a difference between syslog the protocol, a specific syslog application (like rsyslog) and a system log message store. Due to tradition, these terms are often used for different things, and one must deduce from context what is meant. The paper applies the same sloppiness in regard to terms; I make a best effort to extract the proper meaning. I quote the arguments as they originally appeared inside the paper. However, I rearrange them a bit in order to put related things closer together. I retain the original numbering so that you can compare to the original paper. I also tried to be similarly brief with my arguments. Now, proof-reading the post, I see that I failed at that. Sorry, but that's as brief as I can be while still providing serious counterarguments. I broadly classify the arguments on a scale from "True" to "Wrong", so you may take those classifications as an ultra-short reply.

So let's start with the arguments related to the log storage system. In general, the paper is right that there is no real log storage system (like, for example, the Windows Event Log). Keeping logs only in sequential text files definitely has disadvantages. Syslog implementations like rsyslog or syslog-ng have somewhat addressed this by providing the ability to use databases as storage backends (the commercial syslog-ng fork also has a proprietary log store). This has some drawbacks as well. The paper proposes a new proprietary indexed syslog message store. I kind of like this idea and have even considered writing something like it as an optional component for rsyslog (but have had no time yet to actually work on it). I am not convinced, though, that all systems necessarily need such a syslog storage subsystem.

With that said, now let’s look at the individual arguments:

5. Reading log files is simple but very inefficient. Many key log operations have a complexity of O(n). Indexing is generally not available.

True. It needs to be said, though, that many tools inside the tool chain only need sequential access, while those that need random access have to pay a big price. Please note, however, that it is often only necessary to "tail" log files, that is, act on the latest log entries. This can be done rather quickly even with text files. I know both the problems and the capabilities, because Adiscon LogAnalyzer, in which I am involved, is a web-based analysis and reporting tool capable of working on log files. Paging is simple, but searching is slow with large files (we recommend databases if that is often required). Now that I write this, a funny fact is that one of the more important reasons for creating rsyslog was that we were unhappy with flat text files (see the rsyslog history doc). And so I created a syslogd capable of writing to databases. Things seem to be a bit cyclic, though with a different spin ;)
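To make the complexity point concrete: tailing reads only the end of the file, while any search must scan it completely.

$ tail -n 50 /var/log/messages                 # fast: seeks to the file end
$ grep 'session opened' /var/log/messages      # O(n): scans the whole file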

8. Access control is non-existent. Unless manually scripted by the administrator a user either gets full access to the log files, or no access at all.

Mostly True and hard to make any argument against this (except, of course, if you consider database back ends as log stores, but that’s not the typical case).

10. Automatic rotation of log files is available, but less than ideal in most implementations: instead of watching disk usage continuously to enforce disk usage limits rotation is only attempted in fixed time intervals, thus leaving the door open to many DoS attacks.

Partly True, at least in regard to current practice. Rsyslog, for example, can limit file sizes as they are written (the "outchannel" action), but this feature is seldom used and due to be replaced by a better one. The better one is partly implemented but has received no priority because nobody in the community flagged this as an urgent requirement. As a side-note: consider that journald intends to shrink the log and/or place stricter rate-limiting restrictions when disk space begins to run low. If I were an attacker, I would simply begin to fill the disk then, and let journald wipe out the log store for me.
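For illustration, here is a hedged sketch of the outchannel syntax mentioned above (file name, size limit and script path are made up):

# rotate the file via script once it exceeds 100 MiB
$outchannel log_rotation,/var/log/the_log,104857600,/usr/local/bin/rotate-log
*.* :omfile:$log_rotation

The last parameter is a command rsyslog executes when the size limit is hit, which is what makes size-based (rather than interval-based) rotation possible.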

11. Rate limiting is available in some implementations, however, generally does not take the disk usage or service assignment into account, which is highly advisable.

First it needs to be said what "rate limiting" means: I guess it means preventing an application from spamming the logs with frequently repeated messages. This feature is available in rsyslog. It is right that disk usage is not taken into account (see my comment above on the implications). I don't know what "service assignment" means in this context, so I won't comment on that one. Rate limiting is about more than run-away or spamming processes; it is a very complex issue. Rsyslog has output rate limiting as well, and much more is thinkable. But correct: current rate limiting looks at a number of factors, yet not at disk usage. On the other hand, does that make sense if, e.g., a message is not even destined to go to disk?
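As a concrete example, rsyslog's local-socket input offers rate-limiting knobs like these (the directives exist; the values are made up and need tuning):

$ModLoad imuxsock
# allow at most 200 messages from one process within any 5-second interval
$SystemLogRateLimitInterval 5
$SystemLogRateLimitBurst 200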

12. Compression in the log structure on disk is generally available but usually only as effect of rotation and has a negative effect on the already bad complexity behaviour of many key log operations.

Partly True. Rsyslog has supported writing in zip format for at least one and a half years now (I am too lazy to check the ChangeLog). This provides huge savings for those that turn the feature on. Without doubt, logs compressed in this way are much harder to process in real-time.
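To sketch how this is enabled (the directive is real, the file name is just an example):

# compress file output on the fly; 0 disables, 9 is maximum compression
$OMFileZipLevel 9
*.* /var/log/everything.log

On-the-fly compression trades CPU time and, as said above, ease of real-time processing for disk space.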

7. Log files are easily manipulable by attackers, providing easy ways to hide attack information from the administrator

Misleadingly True. If thinking of a local machine only, this is true. However, all security best practices tell us that it is far from a good idea to keep logs on a machine that is publicly accessible. This is the reason that log messages are usually sent immediately to some back-end system. It is right that this cannot happen in some setups, especially very small ones.

My conclusion on the log store: there definitely is room for improvement. But why not improve it within the existing frameworks? Among other things, this would have the advantage that existing methods could be used to decide what needs to be stored inside the log store. Usually, logs contain noise events that administrators do not want to log at all, because of the overhead associated with them. There exist best practices for the existing tool chain on how to handle that.

Now on to the other detail topics:

1. The message data is generally not authenticated, every local process can claim to be Apache under PID 4711, and syslog will believe that and store it on disk.

9. The meta data stored for log entries is limited, and lacking key bits of information, such as service name, audit session or monotonic timestamps.

Mostly wrong. IMHO, both make up a single argument. At the suggestion of Lennart Poettering, rsyslog can force the pid inside the TAG to match the pid of the log message emitter, and has been able to do so for quite a while now. It is also easy to add additional "trusted properties". I made an experimental implementation in rsyslog yesterday. It took a couple of hours, and the code is available as part of rsyslog 5.9.4. As a side-note, the level of "trust" one wants to have in such properties needs to be defined: for truly trusted properties, some serious cryptography is needed (this is neither specified in the journald proposal nor currently implemented in rsyslog).

2. The data logged is very free-form. Automated log-analyzers need to parse human language strings to a) identify message types, and b) parse parameters from them. This results in regex horrors, and a steady need to play catch-up with upstream developers who might tweak the human language log strings in new versions of their software. Effectively, in a away, in order not to break user-applied regular expressions all log messages become ABI of the software generating them, which is usually not intended by the developer.

Trivial (I can't commit myself to a "True" or "Wrong" on such a trivial finding). Finally, the authors have managed to describe the log analysis problem as we currently face it. This is not at all a syslog problem; it is a problem of development discipline. For one, syslog has "solved" this issue with RFC 5424 structured data. Use it and be happy (but, granted, the syslog() API currently is a bit problematic). The real problem is the missing discipline. Take, for example, the Windows Event Log, from which the journald proposal borrows heavily. In the Windows Event Log, there is a developer-assigned unique ID within the application's reserved namespace. The combination of app namespace (also automatically created) and ID does exactly the same thing as the proposed UUID. The Windows Event Log also has "structured fields", in the form of an array (a bit different from name-value pairs but far from totally different). This system has been in place since the earliest versions of Windows NT, more than 15 years ago. So it would be a decent assumption that the problem described as a syslog problem does not exist in the Windows world, right (especially given the fact that Windows purposefully does not support syslog)? Well, have a look at the problems related to Windows log analysis: they are exactly the same! I could offer a myriad of other examples, like WELF, the Apache log format, … The bottom line is that developer discipline is not easy to achieve. And, among other things, a taxonomy is actually needed to extract semantic meaning from the logged event. It probably is educational to read the FAQ for CEE, a standard currently in development that tries to somewhat solve the logging mess (wait a moment: before saying that CEE is a bunch of clueless morons, please have a look at the CEE Board Members first).
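For reference, this is what RFC 5424 structured data looks like on the wire (example adapted from the RFC itself):

<165>1 2003-10-11T22:14:15.003Z mymachine.example.com evntslog - ID47 [exampleSDID@32473 iut="3" eventSource="Application" eventID="1011"] An application event log entry

The bracketed block carries machine-parseable name-value pairs under an SD-ID, so an analyzer does not need to regex-parse the free-form message text at all.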

3. The timestamps generally do not carry timezone information, even though some newer specifications define support for it.

Partly Wrong. High-precision timestamps, which include timezone information, have been available in rsyslog for many years and are the default. Unfortunately, many distros have turned them off, because they break existing tools. So in current practice this is a problem, but it could be solved by deleting one line in rsyslog.conf. And remember that if that causes trouble for some "vital" tool, journald will break that tool even more. Note that some distros, like Gentoo, have already enabled high-precision timestamps.
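On the distros I have seen, this is the line in question (a sketch; your distro's default config may differ):

# comment out or delete this line to get RFC 3339 high-precision timestamps:
$ActionFileDefaultTemplate RSYSLOG_TraditionalFileFormat

With that line gone, rsyslog uses its own default file format, whose RFC 3339 timestamps include the timezone offset.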

4. Syslog is only one of many log systems on local machines. Separate logs are kept for utmp/wtmp, lastlog, audit, kernel logs, firmware logs, and a multitude of application-specific log formats. This is not only unnecessarily complex, but also hides the relation between the log entries in the various subsystems.

Rhetorically True – but why is that a failure of syslog? In fact, this problem would not exist if developers had consistently used syslog. So the problem is not rooted in syslog but rather in the fact that syslog is not being used. Lesson learned: even if standards exist, many developers simply ignore them (this is also an interesting argument in regard to claim #2, think about it…).

13. Classic Syslog traditionally is not useful to handle early boot or late shutdown logging, even though recent improvements (for example in systemd) made this work.

True – including the fact that systemd has already solved that problem.

14. Binary data cannot be logged, which in some cases is essential (Examples: ATA SMART blobs or SCSI sense data, firmware dumps).

Wrong, the short answer is: it can be logged, but must be properly encoded. In the IETF syslog working group we even increased the max message sizes for this reason (actually, there is no hard limit anymore).

The longer, and more correct, answer is that this is a long-standing discussion inside the logging world. From that viewpoint, it is hard to say if the claim is true or false; it is often even argued like a religion. Fact is that the current logging toolset does not work well for binary data (even encoded). This is even the case for the Windows Event Log, which supports binary data. In my view, most logging experts lean towards the side that binary data should be avoided and, if unavoidable, must be encoded in a text-friendly way. A core problem with the usefulness of binary data is that it is often hard to decode, and even harder to understand, on a non-native platform (remember that the system used during analysis is often not the system where the event was initially recorded).
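As a minimal sketch of such text-friendly encoding (the tag and file name are made up), a binary blob can be pushed through the standard tool chain like this:

$ logger -t smartmon "sense-blob=$(base64 -w0 /tmp/blob.bin)"

The analysis side base64-decodes the field when needed; nothing in the syslog path has to understand the binary format.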

6. The syslog network protocol is very simple, but also very limited. Since it generally supports only a push transfer model, and does not employ store-and-forward, problems such as Thundering Herd or packet loss severely hamper its use.

Wrong, missing all the improvements made in the past ten years. There is a new RFC series which supports TLS-secured, reliable transmission of syslog messages and which permits placing fine-grained access control on who can talk with whom inside a relay chain. UDP syslog is still available, and for good reason. I cannot dig into the details here; part of the reasoning is on the same grounds why we use audio more often over UDP than TCP. Using UDP syslog is strictly optional, and there are few scenarios where it is actually needed. And, if so, the "problem" mentioned is actually a "solution" to a much more serious problem not even mentioned in the journald paper. For a glimpse at these problems, have a look at my blog post on the "reliability problem". Also, store-and-forward is generally available in rsyslog via action queues, failover handling and a lot of other things. I admit that setting up a complex logging system sometimes requires an expert. On the "loss issue", one may claim that I myself say that plain TCP syslog is not totally lossless. That claim is right, but the potential loss window is relatively small. Also, one can use different protocols that solve the issue. In rsyslog, I introduced the proprietary RELP protocol for that very reason. There is also the completely lossless RFC 3195, which is a great protocol (but without a future). It is supported by rsyslog, but only extremely few other projects implement it. The IETF (including me) considers RFC 3195 a failure – not on technical grounds, but in the sense that it was too far away from the usual logging practice to be picked up by enough folks. [Just to avoid misinterpretation: contrary to RFC 3195, RELP is well alive, well-accepted and in widespread use. It is only RFC 3195 that is a failure.]
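To show how little configuration reliable forwarding needs, here is a hedged sketch of RELP forwarding in rsyslog's legacy config syntax (host and port are examples):

$ModLoad omrelp
# forward everything via RELP to a central server
*.* :omrelp:central.example.net:2514

Combined with proper queue settings (see the separate RELP post below), this provides store-and-forward behavior with application-level acknowledgments.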

Concluding my remarks, I do not see anything so broken in syslog that it can only be fixed by a total replacement of the technology. Quite the contrary: there is a rich tool set and expertise available. Existing solutions, especially in the open source world, are quite modular and can easily be extended. It is not even necessary to extend existing projects. A new log store, for example, could also be implemented as a new tool that imports a decent log format from stdin into a back-end data store. That would be easily usable not only from rsyslog but from any other tool that is part of the current log tool chain. For example, it could immediately consume Apache or other application logs (of course, such a tool would require proper cryptography to be used for cryptographic tasks…). There is also need for a new logging API – the catch-all syslog() call is clearly insufficient (an interesting detail is that journald promises to retain syslog() as a first-class logging interface, which means journald can solve none of the issues associated with that API, especially in regard to claim #2).

So extending existing applications, or writing new ones that tightly integrate into the existing toolset, is the right thing to do. One can view journald as such an extension. However, this extension is somewhat problematic, as its design document states that it intends to replace the whole logging system. Especially disturbing is that the reasoning, as outlined above, essentially boils down to a new log store and various well-known, mostly political problems (with development discipline for structured formats right at the top of them). Finally, the proposal claims to provide more security, but fails to achieve even the level that RFC 5848 syslog is able to provide. Granted, rsyslog, for example, does not (yet) implement RFC 5848. But why does journald intend to implement some home-grown pseudo-security system when a standards-based method designed by real crypto experts is available? I guess the same question can be applied to the reasoning for the journald project at large.

Let me conclude this posting with the same quote I started with:

Syslog has been around for ~30 years, due to its simplicity and ubiquitousness it is an invaluable tool for administrators. However, the number of limitations are substantial, and over time they have started to be serious problems:

Mostly Wrong. But it is true that syslog is an invaluable tool, especially in heterogeneous environments.

US Citizen? Your credit is in doubt…

I was introduced to a very subtle effect of the Heartland breach. Remember, card processor Heartland screwed up and, as some sources say, 100 million credit card numbers were stolen from them via a Trojan. That made big news and, among other things, started a discussion about whether PCI has been proven useless. But there seem to be additional effects: US customers seem to have lost a lot of credibility with merchants in international shopping.

In Adiscon's office, I heard today that we got a call from one of our card processors. Keep in mind that we are based in Germany. The card processor inquired about a recent transaction and asked us to check whether it could be credit card fraud. It was not, but he left us his phone number so that we could check with him in the future whenever we suspect fraud on a transaction.

This is quite unusual and immediately drew my attention, so I gave the guy a call. He explained that they are routinely checking US credit card transactions because some problems have been seen recently with US cards. He explained to me that the processor would like to protect merchants, because "if you ship the goods and the cardholder protests the charge … weeks later … you will be charged back but unable to recover the goods" (good point, btw). So I asked if they were calling because of the Heartland breach. Not only because of that, he said, but it would be an example (I deciphered this as a "yes"). So then I asked if they had not blacklisted the affected card numbers. Some statements followed, which I deciphered to mean "no". So the cards are still active and seem to cause issues (why else would a card processor begin to call its merchants?).

I know that Heartland does not know exactly which card numbers were stolen. But it is known that most probably any card processed within the past 10 months is highly suspect. So wouldn't it have been fair security practice to put these cards on the blacklist and issue new ones to the cardholders? Sure, that would be inconvenient (read: costly) and, probably more importantly, would have shown everyone that someone screwed up. But would that not be much better than putting both consumers and vendors at risk? Without automatic blacklisting, consumers need to pay much more attention to their credit card bills.

An interesting side-effect is that US customers seem to have lost credit outside of the US. For example, it was suggested to me that we check each US order in depth before delivering anything. If everyone else gets this advice, US customers will probably find shopping overseas quite inconvenient…

If you lose your credit card, you are legally required to call your card issuer and report the loss. As long as you do not notify them, you are liable. If, on the other hand, someone in the card industry loses your card (number), nobody seems to be liable: customers must check their statements and vendors must do in-depth checks (sigh) on their customers. Is this really good practice?

And what if a card is used to commit credit card fraud? No problem at all (for the card industry): either the cardholder does not notice it (and pays for the fraud) or the cardholder protests the charge, in which case the merchant pays. The latter case involves some manual processing by the card industry: again, no problem! The merchant is charged a hefty protest fee. Looking at how hefty that fee is, it seems to be even profitable for the card industry to take that route.

Bottom line: who is responsible? The card industry (Heartland in this case). Who pays? Everyone else! Isn't that a nice business model? Where is the motivation to keep such a system really secure?

I think this really calls into question whether the card industry is interested in security. PCI may not have failed (I tend to agree with Anton Chuvakin here). But it smells a bit like PCI and whatever other efforts cannot succeed, because they are not deployed in an honestly security-aware environment but rather in one that needs good excuses for sloppy security. As long as the card industry does not do the right thing whenever doing so costs the card industry money, real security cannot be achieved.

Wanna play? No, says the DRM!

Do you like DRM? Isn't it a perfect thing to make sure you are properly licensed for all your music, movies and, of course, software? Well, folks like the EFF have strongly opposed DRM right from the beginning. One of their arguments has always been that DRM, if thought through to the end, would revoke the user's ability to do with his machine what he wants.

Now we see a perfect example. Grave Rose just posted a nice link on Twitter: "Gears of War DRM screwup makes PC version unplayable". It's all about a DRM cert that seems to have expired, with the end result that the game no longer works. Thankfully, we do not (yet) have the full trusted computing platform in place, so you can still change your PC. This enabled users to set back their system clocks, and so the game worked again. rofl…

Granted, this is not a real DRM issue. Such an expiration date could always have been encoded in software. With a good debugger, it is not too hard to remove it (of course, that's not legal, and with DRM it is considerably more work to do…). But if we are forced to use more and more DRM, if we are forced to use hardware platforms that deny true admin access to their owner, and if we have legislation that outlaws helping yourself – won't such issues become the norm?

For most of the time, you could rest assured that once you had installed something, and did not change it, it was likely to run for eternity (well… somewhat). This seems no longer to hold true. The only true solution is to use as much open source as possible and say no to any DRM-enabled products.

As an interesting side-note, I am not sure if the poor gamers who set back their system clocks are in legal trouble: didn't they try to circumvent a technical copy protection? Not sure about the DMCA, but in Germany one could argue that this is an illegal attack… Happy gaming!

NASA list server compromised?

As a space geek, I am subscribed to NASA's HSFNEWS mailing list. When I looked at my mailbox this morning, a spam message that claimed to have been posted via the NASA list server caught my attention. Obviously, it is quite easy to forge email, and so I thought that this might be a fake, too. However, closer examination revealed headers that make me think this could be the real thing.

Of course, HSFNEWS is just one of the many mailing lists NASA offers, and of course it is run on an auxiliary system. Still, invalid messages slipping through can have quite bad effects. Of course, a message with the subject

“[HSFNEWS] She’ll always want to give head now”

will hopefully be immediately classified as spam by anyone (or do you think the message is about alien encounters? ;)). But what if the message were much more carefully crafted to carry out something evil? After all, the message could look much like it comes from an official NASA source. Just think about the various Obama hoaxes and scams that we have seen lately.

I am still not 100% convinced that the mail actually originated from the NASA list server (I have tried to contact someone in charge over there and hope to get some results). To help you form an idea yourself, here is the complete message source, except for a few things from my local delivery record as well as valid mail addresses that do not need to be posted here.

If someone has an opinion if the mail was run over NASA’s server, please post a comment or drop me a mail.


MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="----_=_NextPart_001_01C97B42.7A7F7080"
Received: from jsc-listserv-01.jsc.nasa.gov (jsc-listserv-01.jsc.nasa.gov
[128.157.5.25]) by mailin.adiscon.com (Postfix) with ESMTP id 06205241C002
for ; Tue, 20 Jan 2009 21:52:51 +0100 (CET)
X-MimeOLE: Produced By Microsoft Exchange V6.5
Received: from jsc-listserv-01.jsc.nasa.gov (jsc-listserv-01
[128.157.5.25]) by jsc-listserv-01.jsc.nasa.gov (8.13.1/8.13.1) with ESMTP
id n0K7cgeV024815; Tue, 20 Jan 2009 15:01:22 -0600
Received: by JSC-LISTSERV-01.JSC.NASA.GOV (LISTSERV-TCP/IP release 15.0)
with spool id 553828 for HSFNEWS@JSC-LISTSERV-01.JSC.NASA.GOV;
Tue, 20 Jan 2009 15:01:20 -0600
Received: from 200-127-202-12.cab.prima.net.ar
(200-127-202-12.cab.prima.net.ar [200.127.202.12]) by
jsc-listserv-01.jsc.nasa.gov (8.13.1/8.13.1) with ESMTP id
n0KKPY2D029413 for ; Tue, 20 Jan
2009 14:25:35 -0600
Return-Path:
X-OriginalArrivalTime: 20 Jan 2009 21:03:01.0983 (UTC)
FILETIME=[7B156EF0:01C97B42]
List-Owner:
Approved-By: {removed}@NASA.GOV
Content-class: urn:content-classes:message
Subject: [HSFNEWS] She’ll always want to give head now
Date: Tue, 20 Jan 2009 21:25:34 +0100
X-MS-Has-Attach:
X-MS-TNEF-Correlator:
Thread-Topic: [HSFNEWS] She’ll always want to give head now
Thread-Index: Acl7Qns+aZRN9mnKS56dl4osL2myOw==
List-Help: ,

List-Subscribe:

List-Unsubscribe:

From: "joynt"
To:
Reply-To: "hsfnews"

------_=_NextPart_001_01C97B42.7A7F7080
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

Can’t see images?
To view this email as a web page, go here =
{actual spam removed}

Use of application-level acks in RELP

I received a very well crafted question about RELP reliability via the rsyslog mailing list this morning. I think it makes perfect sense to highlight this question here on the blog instead of letting it die unread and hard to find in the mailing list archives. Before reading this post, it would be useful to read my rant "On the unreliability of plain tcp syslog" if you have not already done so. It will greatly help in understanding the fine details of what this message talks about.

Here we go, original poster's text in italics, my replies in between:

In my research of rsyslog to determine its suitability for a
particular situation I have some questions left unanswered. I need
relatively-guaranteed delivery. I will continue to review the
available info including source code to see if I can answer the
questions, but I hope it may be productive to ask questions here.

In the documentation, you describe the situation where syslog silently
loses tcp messages, not because the tcp protocol permits it but
because the send function returns after delivering the message to a
local buffer before it is actually delivered.

But there is a more-fundamental reason an application-level ack is
required. An application can fail (someone trips over the power cord)
between when the application receives the data and when it records it.

1. Does rsyslog send the ack in the RELP protocol occur after the
message has been safely recorded in whatever queue has been configured
or forwarded on so its delivery status is as safe as it will get (of
course how safe depends upon options chosen), or was it only intended
to solve the case of TCP buffering-based unreliability?


RELP is designed to provide end-to-end reliability. The TCP buffering issue is just highlighted because it is so subtle that most people tend to overlook it. An application abort seems to be more obvious and RELP handles that.

HOWEVER, that does not mean messages are necessarily recorded when the ACK is sent. It depends on the configuration. In RELP, the acknowledgment is sent after the reception callback has been called. This can be seen in the relevant RELP module. For rsyslog’s imrelp, this means the callback returns after the message has been enqueued in the main message queue.

It now depends on how that queue is configured. By default, messages are buffered in main memory. So when rsyslog aborts for some reason (or is terminated by user request) before the message has been processed, it is lost – while the sender still got a positive ACK. This is how things are done by default, and it is useful for many scenarios. Of course, it does not provide the audit-grade reliability that RELP aims for. But the default config needs to take care of the usual use case, and that is not audit-grade reliability (just think of the numerous home systems that run rsyslog and should do so in the least intrusive way).

If you are serious about your logs, you need to configure the engine to be fully reliable. The most important thing is a good understanding of the queue engine. You need to read and understand the rsyslog queue docs, as they form the basis on which reliability can be built.

The other thing you need to know is your exact requirements. Asking for reliability is easy, implementing it is not. The closer you get to 100% reliability (which you will never reach, for one reason or another), the more complex the scenarios get. I am sure the original poster knows quite well what he wants, but I am often approached by people who just want it "totally reliable" … but don't want to spend the fortune it requires (really – ever thought about the redundant data centers, power plants, satellite and sea links and all the rest you need for that?). So it is absolutely vital to have good requirements, which also state when loss is acceptable, and at what cost this comes.

Once you have these requirements, a rsyslog configuration that matches them can be designed.
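To make this a bit more concrete, here is a hedged sketch of main message queue settings that push toward reliability (the directives exist, but the values are made up and must be matched to your requirements):

# in-memory linked-list queue with disk assistance
$MainMsgQueueType LinkedList
# setting a file name enables spooling to disk when memory limits are hit
$MainMsgQueueFileName mainq
$MainMsgQueueMaxDiskSpace 1g
# persist queued messages on (clean) shutdown
$MainMsgQueueSaveOnShutdown on

Fully audit-grade setups go further, e.g. pure disk queues and synchronous writes, at a considerable performance cost.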

At this point, I’d like to note that it may also be useful to consider rsyslog professional services as it provides valuable aid during design and probably deployment of a solution (I can’t go into the full depth of enterprise requirements here).

To go back to the original question: RELP has almost everything that is needed, but configuring the whole system in an audit-grade way requires (ample) work.

2. Presumably there is a client API that speaks RELP. Can it be
configured to return an error to the client if there is no ACK (i.e.
if the log it sent did not make it into the configured safe location
which could be on a disk-based queue), or does it only retry? Where is
this API?


The API is in librelp. But actually this is not what you are looking for. In rsyslog, an output module (here: omrelp) provides the status back to the caller. Then, configuration decides what happens. Messages may be discarded, sent to a different destination or retried.

With omrelp, I think we have some hardcoded ways to preserve the message, but I have not yet had time to look this up in detail. In any case, RELP will not lose messages, but it may duplicate a few of them (within the current unacked window) if the remote peer simply dies. Again, this requires proper configuration of the rsyslog components.

Even with that, you may lose messages if the local rsyslogd dies (not terminates, but dies for some unexpected reason, e.g. a segfault, kill -9 or whatever) while it still has messages in a non-persisted queue. Again, this can be mitigated by proper configuration, but it must be designed, and it is very costly in terms of performance. A good read on the subtleties can be found in the rsyslog mailing list archive. I suggest having a look at it.

Certainly the TCP caching case you mention in your pages is one a user
is more likely to be able to reproduce, but that is all the more
reason for me to be concerned that the less-reproducible situations
that could cause a message to occasionally become lost are handled
correctly.


I don't think an app abort is less reproducible – kill -9 `cat /var/run/rsyslog.pid` will do nicely. Actually, from feedback I received, many users seem to understand the implications of a program/system abort. But far fewer understand the issues inherent in TCP. Thus I am focusing so much on the latter. But of course, everything needs to be considered. Read the thread about the reliable queue (really!). It goes to great lengths, but still does not offer a full solution. Getting things reliable (or secure) is very, very challenging and requires in-depth knowledge.

So I am glad you asked and provided an opportunity for this to be written :)

Rainer

Strong passwords? Forbidden!

American Express, as a bank and card issuer, should be a fairly security-sensitive company. Right? Well, it looks like they have not yet learned their lesson. Occasionally, I log in to my AmEx account to gain access to membership rewards (those nice gimmicks that shall trick you into charging as much as possible to AmEx). I tend not to have my credentials at hand when doing so, but thankfully AmEx has a quite secure system to recover them.

What really bugs me is their password requirement: a password can have a maximum of 8 characters and may consist only of letters and numbers! Ouch… what about strong passwords? They are simply forbidden by AmEx. The funny thing is that the web site doesn't even complain when you enter a too-strong (i.e. longer or containing special characters) password. It simply ignores the extra characters. Some time last year this drove me crazy, as I could not log in after changing my password. Guess what: I had used a too-strong one, and of course it didn't match what the system had stored. I called customer service and also complained about being forced to use insecure passwords. That was several months ago.

New year, new try – old problem… Nothing learned, still 8 characters max and only letters and numbers. Frankly, AmEx, who is advising you on security? I really wonder if under US law AmEx would be liable if someone broke into my account. I think they should be…

Thailand is going syslog…

I found an interesting read in "The Nation", one of Thailand's largest business dailies. They talk about the economic crisis and the way Thailand plans to reduce its negative effects. There is a five-point initiative in place. Of interest for us is the fifth and final point:

Finally, the association will focus on security, which promises to be this year’s main technology trend. It will urge software companies to become more familiar with Syslog, which is a standard for forwarding log messages in an IP network, but is also typically used for computer system management and security auditing.

So, as it looks, Thailand is betting on security. This is obviously a good move. Interestingly, they seem to have identified logging, and syslog specifically, as a major building block in this endeavor. That's a bit surprising, given the typical weaknesses of syslog. But they've probably identified the broad potential this protocol has. Maybe I should look a bit more towards Asia with rsyslog and phpLogCon, as well as the Windows product line.

security…

No system is totally secure. Few systems are totally insecure. Most systems are between these two extremes. But what does "more secure" mean? We had an interesting discussion on the rsyslog mailing list on the use of root jails. I'd like to reproduce one of my posts here, not only because it is mine, but because it can guide us a bit toward the security goals for rsyslog.

Let me think of security as a probability of security breach. Let S_curr be the breach probability of the reference system without a root jail, and S_total that of a hypothetical system that is "totally secure" (knowing well that no such system exists). In other words, the probability S_total equals 0.

I think the common ground is that a root jail does not worsen security. Note that I do not say it improves security, only that it does not reduce a system's security. S_jail is the breach probability of a system that is otherwise identical to the reference system, but with a root jail. Then S_jail <= S_curr, because we assume that the security of the system is not reduced.

I think it is also common ground that the probability of a security breach is reduced if the number of attack vectors is reduced, without any new attack vectors being added. [There is one generic “attack vector”, the “thought of being secure and thus becoming careless” which always increases as risk is reduced – I will not include that vector in my thoughts]

We seem to be in agreement that a root jail is able to prevent some attacks from being successful. I can’t enumerate them and it is probably useless to try to do so (because attackers invent new attacks each day), but there exist some attacks which can be prevented by a root jail. I do not try to weigh them by their importance.

For obvious reasons, there exist other attacks which are not affected by the root jail. Some of them have been mentioned, like the class of in-memory based attacks, code injection and many more.

I tend to think that the set of attack vectors that can be prevented by a root jail is much smaller than the set of those which cannot. I also tend to think that the latter class contains the more serious attack vectors.

But even then, a root jail removes a subset of the attack vectors that otherwise exist, and so it reduces the probability of a security breach. So it benefits security. We could only argue that it does not benefit security if we could show that in all cases we can think of (and those we cannot), security is not improved. However, some cases have been shown where it improves, so it cannot be that security is unimproved in all cases. As such, a root jail improves security; more precisely, the probability of a security breach is

0 < S_jail < S_curr

We can identify the benefit we gain as the difference between the reference system's probability of security breach and that of the system with the jail. Let S_impr be this improvement; then

S_impr = S_curr - S_jail

Now, the root jail is just one potential security measure. We could try to calculate S_impr for all kinds of security measures, for example a privilege drop. I find it hard to do the actual probability calculations, but I would guess that S_impr_privdrop > S_impr_jail.
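To illustrate with purely made-up numbers: if S_curr = 0.10, S_jail = 0.08 and S_privdrop = 0.04, then S_impr_jail = 0.10 - 0.08 = 0.02, while S_impr_privdrop = 0.10 - 0.04 = 0.06, i.e. the privilege drop would yield three times the improvement.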

Based on the improvements, one may finally decide what to implement first (either at the code or admin level), all of this of course weighted with the importance of the numbers.

In any case, I think I have shown that both of the following are correct:

  • the root jail is a security improvement
  • there exist numerous other improvements, many of them probably more efficient than the jail