How we found and fixed a CVE in librelp

This is a joint blog post, from Adiscon and Semmle, about the finding and fixing of CVE-2018-1000140, a security vulnerability in librelp. This was a collaborative effort by:

  • Kevin Backhouse, Semmle, Security Researcher.
  • Rainer Gerhards, Adiscon, Founder and President.
  • Bas van Schaik, Semmle, Head of Product.

We have published this post both on Rainer’s blog and on the LGTM blog.

Bas originally found the vulnerability (using lgtm.com) and Rainer fixed it. Kev developed the proof-of-concept exploit.

In this blog post, we explain the cause of the bug, which is related to a subtle gotcha in the behavior of snprintf, and how it was found by a default query on https://lgtm.com/. We also demonstrate a working exploit (in a docker container, so that you can safely download it and try it for yourself). As a bonus, we give a short tutorial on how to set up rsyslog with TLS for secure communication between the client and server.

Severity and mitigation

The vulnerability (CVE-2018-1000140) is in librelp versions 1.1.1 up to 1.2.14.

Librelp is a component used heavily inside rsyslog. It is also used in some other projects, but we use the term “rsyslog” as a proxy for any affected projects, as it is the prime user of librelp (to the best of our knowledge). However, if you use librelp in any project, we strongly recommend that you upgrade to version 1.2.15 or newer.

The vulnerability only affects rsyslog installations that use TLS for secure RELP communication between the client and the server. In its default configuration, rsyslog does not do any client-server communication at all, so the vulnerability only affects more advanced installations. Furthermore, to trigger the vulnerability, an attacker needs to create a malicious certificate that has been signed by a certificate authority that is trusted by the rsyslog server.

The ‘snprintf’ gotcha

The vulnerability is in the following block of code (lines 1195-1211 of `tcp.c`):


/* first search through the dNSName subject alt names */
iAltName = 0;
while(!bFoundPositiveMatch) { /* loop broken below */
  szAltNameLen = sizeof(szAltName);
  gnuRet = gnutls_x509_crt_get_subject_alt_name(cert, iAltName,
                                                szAltName, &szAltNameLen, NULL);
  if(gnuRet < 0)
    break;
  else if(gnuRet == GNUTLS_SAN_DNSNAME) {
    pThis->pEngine->dbgprint("librelp: subject alt dnsName: '%s'\n", szAltName);
    iAllNames += snprintf(allNames+iAllNames, sizeof(allNames)-iAllNames,
                          "DNSname: %s; ", szAltName);
    relpTcpChkOnePeerName(pThis, szAltName, &bFoundPositiveMatch);
    /* do NOT break, because there may be multiple dNSName's! */
  }
  ++iAltName;
}

It is caused by a subtle gotcha in the behavior of `snprintf`. You have to read this excerpt of the man page for `snprintf` quite carefully to spot the problem:

The functions `snprintf()` and `vsnprintf()` do not write more than *size* bytes (including the terminating null byte (‘\0’)). If the output was truncated due to this limit then the return value is the number of characters (excluding the terminating null byte) which would have been written to the final string if enough space had been available. Thus, a return value of *size* or more means that the output was truncated.

The Red Hat Security Blog has a great post about this, written by Florian Weimer in March 2014. The crucial point is that `snprintf` returns the length of the string that it *would have* written if the destination buffer was big enough, not the length of the string that it actually wrote. So the correct way to use `snprintf` is something like this:


int n = snprintf(buf, sizeof(buf), "%s", str);
if (n < 0 || (size_t)n >= sizeof(buf)) {
  printf("Buffer too small\n");
  return -1;
}

Alternatively, you could use it like this to allocate a buffer that is exactly the right size:


int n = snprintf(NULL, 0, "%s", str);
n++; // Add space for null terminator.
if (n <= 0) {
  printf("snprintf error\n");
  return -1;
}
char *buf = malloc(n);
if (!buf) {
  printf("out of memory\n");
  return -1;
}
snprintf(buf, n, "%s", str);

This second use-case is probably the reason why `snprintf` is designed the way it is. However, given that the type of its size parameter is `size_t`, it still seems like a bizarre design choice that its return type is `int`, rather than `size_t`.

Interestingly, some code bases use their own implementations of `snprintf`, which don’t necessarily behave in the same way. For example, there’s one in curl and another in sqlite. Other code bases use wrapper functions to work around inconsistencies in the implementation of `snprintf` on different platforms, like this one in flac.

How the ‘snprintf’ gotcha can cause a buffer overflow

The cruel irony of the `snprintf` gotcha is that it can cause the exact buffer overflow that the programmer was diligently trying to avoid by using `snprintf`, rather than `sprintf`. This is exactly what happened on line 1205 of `tcp.c`:


iAllNames += snprintf(allNames+iAllNames, sizeof(allNames)-iAllNames,
                      "DNSname: %s; ", szAltName);

Suppose `szAltName` is a sufficiently long string to cause a buffer overflow. Then `snprintf` does not write beyond the end of the buffer. However, due to the surprising behavior described above, it still returns the length of the string that it *would have* written if the buffer had been big enough. This means that after the `+=` operation, `iAllNames > sizeof(allNames)`. On the next iteration of the loop, `sizeof(allNames)-iAllNames` wraps around to a huge unsigned value (the subtraction is done in `size_t` arithmetic) and `snprintf` is called with an extremely large size parameter. This means that subsequent iterations of the loop write beyond the end of the buffer.
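For comparison, here is a minimal, self-contained sketch of a safer accumulation pattern (an illustration only, not the actual librelp 1.2.15 fix; the helper `append_name` is invented for this example). The offset is advanced only by what was actually stored, and is clamped when `snprintf` reports truncation:


#include <stdio.h>

/* Illustrative helper only (not the actual librelp 1.2.15 patch): append
 * formatted text to a fixed-size buffer, advancing the offset only by what
 * was actually stored and clamping it on truncation. */
static size_t append_name(char *dst, size_t dstsize, size_t offset, const char *name)
{
    if (offset >= dstsize)                   /* buffer already full: leave it alone */
        return offset;
    int n = snprintf(dst + offset, dstsize - offset, "DNSname: %s; ", name);
    if (n < 0)
        return offset;                       /* encoding error: offset unchanged */
    if ((size_t)n >= dstsize - offset)
        return dstsize - 1;                  /* truncated: clamp to end of buffer */
    return offset + n;
}

int main(void)
{
    char allNames[32];
    size_t used = 0;
    used = append_name(allNames, sizeof(allNames), used, "client.example.com");
    used = append_name(allNames, sizeof(allNames), used, "a-very-long-name.example.com");
    printf("used=%zu, allNames='%s'\n", used, allNames);
    return 0;
}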

As a side note, a dynamic test utilizing Undefined Behavior Sanitizer (UBSAN) would have detected that problem. Unfortunately there is no such test case inside the rsyslog test suite. That’s the usual problem with dynamic testing; it always depends on the test cases. A strength of static analysis is that such situations can be detected even if they are overlooked by developers and QA engineers. A more elaborate discussion of this topic can be found in Rainer’s blog post on the benefits of static analysis.

An interesting feature of this bug is that there is a gap in the buffer overflow. Suppose that there are 10 bytes left in the buffer and that the length of `szAltName` is 100 bytes. Then only the first 10 bytes of the string are written to the buffer. On the next iteration of the loop, the next string is written to a starting offset 90 bytes after the end of the buffer. An attacker can utilize this gap to overwrite only a very specific area of the stack. It also means that they can avoid overwriting the stack canary.
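To see the arithmetic in isolation, here is a hypothetical standalone demo (not librelp code) with a tiny 16-byte buffer. It performs one iteration of the same pattern and then just prints the resulting offset and remaining-space values, which show the offset jumping past the end of the buffer and the `size_t` subtraction wrapping around:


#include <stdio.h>

/* Hypothetical standalone demo (not librelp code): reproduce the offset and
 * size arithmetic of the vulnerable loop with a tiny 16-byte buffer. */
int main(void)
{
    char buf[16];
    int used = 0;
    const char *altname = "averylongsubjectaltname.example.com"; /* longer than buf */

    /* Same pattern as the vulnerable code: snprintf truncates the write, but
     * returns the length it *would have* written. */
    used += snprintf(buf + used, sizeof(buf) - used, "DNSname: %s; ", altname);

    printf("bytes actually stored: %zu (plus the terminating NUL)\n", sizeof(buf) - 1);
    printf("offset after '+=':     %d\n", used);
    printf("remaining as size_t:   %zu\n", sizeof(buf) - used);
    /* A second iteration would start writing at buf + used, i.e. well past the
     * end of buf, leaving a gap of (used - sizeof(buf)) untouched bytes. */
    return 0;
}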

Finding and fixing the bug

The bug was found by Bas, after a conversation with Rainer on Twitter. Rainer had just heard about lgtm.com and asked Bas if he could upload rsyslog. At the time, LGTM’s support for C/C++ projects was still in beta, so Bas needed to upload it manually. Bas uploaded it to a non-public LGTM instance first to make sure that the results looked reasonable, and noticed this result. It was found by our “Potentially overflowing call to snprintf” query, which is designed to find this exact pattern. Bas consulted with Kev, who agreed that it looked like a genuine vulnerability. Kev then immediately contacted Rainer, so that he could fix the bug before it went public on lgtm.com.

Rainer fixed the bug one day after Kev reported it to him and released librelp version 1.2.15, which includes the fix, two days after that.

Using rsyslog with secure TLS communication

In the next section, we demonstrate a proof-of-concept exploit. But first, we need to explain how to set up rsyslog with TLS-secured communication between the RELP client and server, because the vulnerability is in the code that checks the certificate of the client. The instructions that we explain here are quite similar to this tutorial, except that we use openssl, rather than GnuTLS, to generate the certificates.

In its default configuration, rsyslog is just used for logging on a single machine. In a more advanced configuration, you might want to have multiple client machines sending log messages to a central server. If so, you can use TLS to securely send the log messages to the server. To enable TLS communication, every machine in the cluster needs a public/private key pair. Additionally, all of these keys need to be signed by a Certificate Authority (CA), which has its own public/private key pair. Every machine in the cluster needs to know the public key of the CA, so that they can verify each other’s keys: when two members of the cluster connect to each other, they send each other certificates signed by the CA to prove to each other that they are legitimate members of the cluster. In this tutorial, we generate public/private key pairs for the CA, client, and server. Then we start a server rsyslog and a client rsyslog in two separate docker containers and connect them to each other.

An important word of warning about this tutorial:

To make the tutorial easy to run, we have set it up so that all the certificates are generated by the `docker build` step. This means that all the private keys are stored in the same docker image. *Don’t do this in a real installation!* You should generate all the certificates separately so that each machine only knows its own private key.

To make the tutorial as simple as possible to try out, we have uploaded the necessary scripts to GitHub. You can download the scripts and build the docker image as follows:


git clone https://github.com/Semmle/SecurityExploits.git
cd SecurityExploits/rsyslog/CVE-2018-1000140_snprintf_librelp
docker build . -t rsyslog-demo

Besides installing all the necessary software packages, the Dockerfile also runs the following setup steps:

  1. Download the source code for rsyslog, librelp and related components and check out the version with the vulnerability.
  2. Build and install rsyslog from source, by calling this build script.
  3. Call this script to generate certificates for the client, server, and CA.
  4. Call this script to generate a malicious certificate. We demonstrate this in the next section.

Generating certificates with openssl is normally an interactive process, in which it asks you questions about your name, email address, organization, and so on. To make the process fully automatic, we have instead used configuration files. For example, `ca.config` contains the configuration for the certificate authority and `server.config` contains the configuration for the server. Hopefully it is reasonably clear which lines of the configuration files you need to change if you are setting up your own rsyslog installation.

Using docker, we can simulate the scenario where there are two machines, a client and a server, both running rsyslog. First we need to create a virtual network, so that the client can connect to the server:


docker network create -d bridge --subnet 172.25.0.0/16 rsyslog-demo-network

Now, in two separate terminals, we start two docker containers: one for the server and one for the client. In terminal 1, we start a container for the server like this:


docker run --network=rsyslog-demo-network --ip=172.25.0.10 -h rsyslog-server -i -t rsyslog-demo

In terminal 2, we start a container for the client like this:


docker run --network=rsyslog-demo-network --ip=172.25.0.20 -h rsyslog-client -i -t rsyslog-demo

The docker image contains rsyslog configuration files for the client and server. These configuration files reference the TLS certificates which were generated during the building of the docker image. We can now start the client and server. In terminal 1, we start the rsyslog server:


sudo rsyslogd -f benevolent/rsyslog-server.conf

(Note: the docker image is configured so that the `sudo` password is “x”.)

In terminal 2, we start the rsyslog client:


sudo rsyslogd -f benevolent/rsyslog-client.conf

There is now a secure TCP connection between the client and server.

Proof-of-concept exploit

The purpose of the vulnerable block of code is to search through the subject alternative names in the certificate, looking for a name that it recognizes. For example, we have specified in the configuration file for the rsyslog server that it should only allow a client with the name `client.wholesomecomputing.com`. Subject alternative names are usually used to enable one certificate to cover multiple subdomains. For example, the certificate for `example.com` might also include `www.example.com` and `help.example.com` as subject alternative names. In the context of rsyslog, you might use the subject alternative names to generate a single certificate that covers all the machines in the cluster (although this would be less secure than generating a separate certificate for each machine).

To trigger the vulnerability, we need a certificate with a large number of very long subject alternative names. The size of the stack buffer, `allNames`, is 32768 bytes. Alternative names that exceed 1024 bytes are rejected by `gnutls_x509_crt_get_subject_alt_name`, so the config file for our malicious certificate contains 33 alternative names, most of which are just over 1000 bytes long. The 32nd name is slightly shorter to adjust the gap, so that the 33rd name overwrites the return address on the stack.

To run the exploit PoC, start the two docker containers as before. In the client container, start rsyslog with the malicious configuration:


sudo rsyslogd -f malicious/rsyslog-client.conf

The rsyslog server runs in the background, so you need to use `ps` to see that it has crashed. Alternatively, you can attach gdb to it before you start the malicious client, to see the effect of the exploit in slow motion. If you want to do this, then you need to add a few extra command line flags when you start the docker container for the server:

docker run --network=rsyslog-demo-network --ip=172.25.0.10 -h rsyslog-server --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -i -t rsyslog-demo

These extra arguments disable the default security settings in docker which prevent gdb from attaching to another process. Without them, you are likely to encounter an error message like this. This solution was suggested here.

Timeline

Why we use Static Code Analysis

We use static code analysis for two reasons. Both of them should probably be well-known, but discussions show that that’s not always the case. So I thought writing a small blog post makes sense.

The first reason is obvious: static analyzers help us catch code problems in early stages, and they do so without any special effort needed by test engineers. The analyzer “thinks” about many cases a human being does not think about and so can catch errors that are sometimes embarrassingly obvious, yet would still have been overlooked. Detecting these things early saves a lot of time. So we try to run the analyzers early and often (they are also part of our CI for that reason).

The benefits come at an expense, and this expense is named “false positive”. They happen, and I regularly get asked if I can’t make an exception to let one through. Unfortunately, I cannot. If I allowed a single static-analyzer failure into the QA system, all further builds would fail, rendering static analysis unusable. So, sorry, if you run into a false positive, you need to find a way to work around it. In my experience, the “const” keyword in C is a little gem that not only helps protect against accidental variable modification but also gets you a long way with static analyzers, as the example below shows. But, granted, sometimes it’s hard to work around false positives. It’s worth it, so just do it ;-)
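To illustrate the “const” point with a generic example (illustrative code, not from rsyslog): declaring read-only pointer parameters const turns accidental writes into hard compile-time errors, which also gives static analyzers much more precise information to work with.


#include <stdio.h>

/* Generic illustration (not rsyslog code): the const qualifier documents that
 * the function only reads the message, and lets the compiler and static
 * analyzers reject accidental writes through the pointer. */
static size_t count_spaces(const char *msg)
{
    size_t n = 0;
    for (; *msg != '\0'; ++msg) {
        if (*msg == ' ')
            n++;
        /* *msg = '_';  <-- would be a compile-time error thanks to const */
    }
    return n;
}

int main(void)
{
    printf("%zu\n", count_spaces("this is my syslog message"));
    return 0;
}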

The second reason for using static analysis also seems obvious, but in my experience it is often overlooked: humans tend to forget some important test cases. It is well-known and accepted that tests should be crafted by QA engineers instead of the folks who wrote the code (because the developer would otherwise only test what he had thought about in the first place). For smaller projects, that’s not always possible, but even more importantly, QA folks can also overlook necessary test cases. The risk is reduced by using a specific test-crafting methodology, but it still exists. This is especially true because, due to combinatorial explosion, not all interactions between configuration settings can be tested. So picking dynamic tests is always a compromise between what is a) seen at all, b) desirable, and c) possible. Static analysis helps with these problems. While it obviously can also fail, I have seen static analysis more than once detect things that we did not cover in dynamic tests. That way it introduces an additional layer of protection. It also sometimes brought up the need for additional dynamic tests.

It should be mentioned that fuzzing is also a great thing to have inside a QA system, but unfortunately I have not yet had the opportunity to deploy it on a real project. Even fuzzing at Google scale is limited by the same combinatorial explosion problem with regard to configuration settings. For example, rsyslog has many more than 250 config settings, so we would need to fuzz more than 2^250 (roughly 1.8 x 10^75) configurations, which is simply impossible [yes, an approach would be to fuzz the tuple (config, data), but that’s a different topic ;-)].

Static analysis is not the answer to the software QA problem. But it is an extremely valuable building block!

rsyslog’s first signature provider: why Guardtime?

The news has already spread: rsyslog 7.3 is the first version that natively supports log signatures, and it does so via a newly introduced open signature provider interface. A lot of my followers quickly realized that and began to play with it. To make sense of the provider interface, one obviously also needs a signature provider. I selected the keyless signature infrastructure (KSI), which is being engineered by the OpenKSI group. Quickly, I was asked what the compelling reasons were to provide the first signature provider for this technology.

So rather than writing many mails, I thought I blog about the reason ;)

We need to dig a little back in history to understand the decision. I looked at log signatures a long time ago; I think my interest started around 2001 or 2002, as part of the IETF syslog standardization efforts. I have to admit that at that time there was very limited public interest in signed logs, even though without signatures it is somewhat hard to prove the correctness of log records. Still, many folks successfully argued that they could prove their process was secure, and that proof seemed to be sufficient, at least at that time.

A core problem with log record signatures is that a single log record is relatively small, typically between 60 bytes and 2k, with typical Linux logs being between 60 and 120 bytes. This makes it almost impossible to sign a single log record, as the signature itself is much larger. Also, while a per-record signature can prove the validity of the log record, it cannot prove that the log file as such is complete. This second problem can be solved by a method called “log chaining”, where each log record is hashed and the previous hash is also used as input for hashing the current record. That way, removing or inserting a log record will break the hash chain, and so tampering can be detected (of course, tampering within a log record can also be easily detected, as it obviously will invalidate the chain as well).
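To make the chaining idea concrete, here is a toy, self-contained sketch (illustration only: FNV-1a merely stands in for a real cryptographic hash such as SHA-256, and the log records are made up). The only point is the structure: the hash of record i starts from the hash of record i-1, so every later hash depends on all earlier records.


#include <stdio.h>
#include <stdint.h>
#include <string.h>

/* Toy log hash chain. FNV-1a is only a stand-in for a real cryptographic hash
 * such as SHA-256; the point is the structure: the hash of record i is
 * computed starting from the hash of record i-1. */
static uint64_t fnv1a(uint64_t h, const void *data, size_t len)
{
    const unsigned char *p = data;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 1099511628211ULL;
    }
    return h;
}

int main(void)
{
    const char *records[] = {
        "Mar  1 01:00:00 host app[1666]: this is my syslog message",
        "Mar  1 01:00:01 host app[1666]: another message",
        "Mar  1 01:00:02 host app[1666]: and one more",
    };
    uint64_t chain = 1469598103934665603ULL; /* initial value for the chain */

    for (size_t i = 0; i < 3; i++) {
        /* the previous chain value is the starting state, so every later hash
         * depends on all earlier records */
        chain = fnv1a(chain, records[i], strlen(records[i]));
        printf("record %zu -> chain hash %016llx\n", i, (unsigned long long)chain);
    }
    /* Removing, inserting, or altering any record changes every later chain
     * value, so tampering breaks the chain. */
    return 0;
}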

This method is actually not brand new technology but ages old. While it sounds perfect, there are some subtle issues when it comes to logging. First and foremost, we have a problem when the machine that stores the log data itself is being compromised. For the method to work, that machine must both create the hashes and sign them. To do so, it must have knowledge of the secrets being used in that process. Now remember from your crypto education that any secrets-based crypto system (even PKI) is only secure as long as the secrets have not been compromised. Unfortunately, if the loghost itself has been compromised, that precondition does not hold any longer. If someone got root-level access, he or she also has access to the secrets (otherwise the signature process could also not access them, right?).

You may now try as hard as you like, but if everything is kept on the local machine, a successful attacker can always tamper with the logs and re-hash and re-sign them. You only gain some advantage if you ship part of the integrity proof off the local system, as long as you assume that not all of the final destinations are compromised (usually a fair assumption, but sometimes questionable if everything is within the same administrative domain).

The traditional approach in logging is that log records are being shipped off the machine. In IETF efforts we concentrated on that process and on the ability to sign records right at the originator. This ultimately resulted in RFC 5848, “signed syslog messages”, which provides the necessary plumbing to do that. A bit later, at Mitre’s CEE effort, we faced the same problem and discussed a similar solution. Unfortunately, there is a problem with that approach: in the real-world, in larger enterprises, we usually do not have a single log stream, where each record is forwarded to some final destinations. Almost always, interim relays filter messages, discard some (e.g. noise events) and transform others. The end result is that a verification of the log stream residing at the central loghost will always fail. Note that this is not caused by failure in the crypto method used – it’s a result of operational needs and typical deployments. Those interested in the fine details may have a look at the log hash chaining paper I wrote as an aid during CEE discussions. In that paper, I proposed as an alternative method that just the central log host signs the records. Of course, this still had the problem of central host compromise.

Let me sum up the concerns on log signatures:

  • signing single log records is practically impossible
  • relay chains make it exceptionally hard to sign at the log originator and verify at the central log store destination
  • any signature mechanism based on locally-stored secrets can be broken by a sufficiently well-done attack.

These were essentially the points that made me stay away from doing log signatures at all. As I had explained multiple times in the past, that would just create a false sense of security.

The state of all this changed a bit after the systemd journal was pushed into existence with the promise that it would provide strong log integrity features and even prevent attacks. There was a lot of vaporware associated with that announcement, but it was very interesting to see how many people really got excited about it. While I clearly described at that time how easy the system was to break, people began to praise it so much that I quickly developed LogTools, which provided exactly the same thing. The core problem was that both of them were just basic log hash chains, and offered a serious false sense of security (but people seemed to feel pampered by that…).

My initial plan was to tightly integrate the LogTools technology into rsyslog, but my concerns about shipping such weak security took over and I (thankfully) hesitated to actually do that.

At this point, credit is due to the systemd journal folks: they have upgraded their system to a much more secure method, which they call forward secure sealing. I haven’t analyzed it in depth yet, but it sounds like it provides good features. It’s a very new method, though, and security folks, for good reason, tend to stick to proven methods unless there is a very strong argument not to (after all, crypto is extremely hard, and new methods require a lot of peer review, as do new implementations).

That was the state of affairs at the end of last year. Thankfully I was approached by an engineer from Guardtime, a company that, I learned, is deeply involved in the OpenKSI approach. He told me about the technology and asked if I would be interested in providing signatures directly in rsyslog. They seemed to have some experience with signing log files and had some instructions and tools available on their web site, as well as, obviously, some folks who actually used them.

I have to admit that I wasn’t too excited at that time and was very busy with other projects. Anyhow, after the Fedora Developer Conference in February 2013 I took the time to have a somewhat deeper look at the technology – and it looked very interesting. It uses log chains, but in a smart way. Instead of using simple chains, it uses so-called Merkle trees, a data structure that was originally designed for the purpose of signing many documents very efficiently. They were invented back in the 1970s and are obviously quite proven technology. An interesting fact about the way Merkle trees are used in the OpenKSI approach is that they permit extracting a subset of log information and still proving that this information is valid. This is a very interesting property when you need to present logs as evidence to the court but do not want to disclose unrelated log entries.

While all of this is interesting, the key feature that attracted my interest is the “keyless” in the KSI name. If there is no key, an attacker can obviously not compromise it. But how can signing without a key work? I admit that at first I had a hard time understanding that, but the folks at Guardtime were very helpful in explaining how it works (and had a lot of patience with me ;)). Let me try to explain in a nutshell (and with a lot of inaccuracies):

The base idea goes back to the Merkle tree, but we can go even more basic and think of a simple hash chain. Remember that each hash depends on its predecessor and the actual data to be hashed. If you now create a kind of global hash chain to which you submit everything you ever need to “sign”, this forms a chain in itself. Now say you have a document (or log) x that you want to sign. You submit x to the chaining process and receive back a hash value h. You store that h and use it as a signature, a proof of integrity of x. Now assume that there is a way to give out x and h and have someone verify that x participated in the computation of the “global” log chain. If so, and if x’s hash still matches the value that was used to generate h, then x is obviously untampered. If we further assume that there is a fixed schedule on which “anchor hashes” are being produced, and that we can track which anchor hash h belongs to, we can even deduce at what time x was signed. In essence, this is what Guardtime does. They operate the server infrastructure that does this hashing and timestamping. The anchor hashes are generated once a second, so each signature can be tracked very precisely to the time it was generated. This is called “linked timestamping”, and for a much better description than I have given, just follow that link ;)
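For illustration, here is a toy, self-contained sketch of the Merkle-tree part (again with FNV-1a standing in for a real cryptographic hash, and just four made-up records so the tree stays balanced). Leaves are record hashes, each inner node hashes its two children, and only the single root would be handed to the external signing/timestamping infrastructure.


#include <stdio.h>
#include <stdint.h>
#include <string.h>

/* Toy Merkle tree over four made-up log records. FNV-1a again stands in for a
 * real cryptographic hash; only the tree structure matters: leaves are record
 * hashes, inner nodes hash their two children, and only the root would be
 * submitted to the external signing/timestamping service. */
static uint64_t fnv1a(const void *data, size_t len)
{
    const unsigned char *p = data;
    uint64_t h = 1469598103934665603ULL;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 1099511628211ULL;
    }
    return h;
}

static uint64_t hash_pair(uint64_t a, uint64_t b)
{
    uint64_t pair[2] = { a, b };
    return fnv1a(pair, sizeof(pair));
}

int main(void)
{
    const char *records[] = { "record one", "record two", "record three", "record four" };
    uint64_t level[4];
    size_t n = 4;

    for (size_t i = 0; i < n; i++)      /* leaf hashes */
        level[i] = fnv1a(records[i], strlen(records[i]));

    while (n > 1) {                     /* combine pairwise up to the root */
        for (size_t i = 0; i < n / 2; i++)
            level[i] = hash_pair(level[2 * i], level[2 * i + 1]);
        n /= 2;
    }
    printf("Merkle root: %016llx\n", (unsigned long long)level[0]);
    /* To prove that "record three" belongs to the signed block, it suffices to
     * disclose its leaf hash and the sibling hashes on the path to the root,
     * not the other log records themselves. */
    return 0;
}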

The key property from my PoV is that, with this method, no secrets are required to sign log records. And if there is no secret, an attacker can obviously not take advantage of the secret to hide his tracks. So this method actually provides a very good integrity proof and does not create a false sense of security. This removed a major obstacle that had always kept me from implementing log signatures.

The alert reader may now ask: “Doesn’t that sound too good? Isn’t there a loophole?” Of course, and as always in security, one can try to circumvent the mechanism. A very obvious attack is that an attacker may still go ahead and modify the log chain, re-hash it, and re-submit it to the signature process. My implementation inside rsyslog already makes this “a bit” hard to do because we keep log chains across multiple log files, and the tooling notifies users if a chain is re-started for some reason (there are some valid reasons). But the main answer to these types of attacks is that when re-hashing happens, the new signature will bear a different timestamp. As such, by comparing the signature’s timestamp with the log record timestamps, tampering can be detected. In any case, I am sure we will see good questions and suggestions on how to improve my code. What makes me confident in the method itself is that it:

  • is based on well-known and proven technology (with the Merkle tree being one foundation)
  • has been in practical use for quite a while
  • is covered by ample academic papers
  • has a vital community driving it forward and enhancing the technology

So everything looks very solid, and this is what triggered my decision to finally implement log signatures directly in the rsyslog core. And to do it right, I did not create a KSI-specific interface but rather modeled a generic signature provider interface inside rsyslog. So if for whatever reason you don’t like the KSI approach, there is a facility inside rsyslog that can be used to easily implement other signature providers. Due to the reasoning given above, however, I have “just” implemented the KSI one for now.

As a side note, please let me say that the Guardtime folks were very easy to work with and very cooperative, especially when I asked my initial, tough and very skeptical questions. They are also very open (lots of their tooling is open source) and … smart ;) During the project, our working relationship grew a lot, and they even managed to get me in direct contact with Ahto Buldas, who is a key inventor behind the technology (and a very impressive guy!). I am looking forward to continuing to work with them in the future. In April 2013 they invited me to their Estonian office, where we had three days to discuss things like the network-signature problem (with relays discarding logs) and related problems like multi-tenancy and centralized virtual machine logging. These were very good discussions and some cool technology is probably coming out of them. I am tempted to write a bit more about these topics (as far as they have been tackled so far), but this posting is already rather long, so let’s save that for another one ;)

Which data does the Guardtime signature provider transfer to external parties?

With the interest in privacy concerns currently having a “PRISM-induced high”, I wanted to elaborate a little bit about what rsyslog’s Guardtime signature provider actually transmits to the signature authority.

This is a condensed post of what the provider does, highlighting the main points. If you are really concerned, remember that everything is open source. So you are invited to read the actual signature provider source, all of which is available at the rsyslog git.

The most interesting question first: the provider only sends a top-level hash to the signature authority. No actual log record is ever sent or otherwise disclosed.

The way this works is that the provider creates a “smart” log signature chain. Actually, it is not a simple chain but rather a Merkle Tree (that’s why I call it “smart”).

When a log file is opened, the provider checks the signature status file to see whether the file already contains some signed records and, if so, picks up the last hash. Then, hashing is initialized. For each record being written, the previous hash and the current record are concatenated and hashed, and the Merkle tree is updated. The actual signature is written when it is “time to do so”. And “time to do so” is when the file either needs to be closed (e.g. rsyslogd shutdown, eviction from the dynafile cache) or the configured maximum block size is reached. In that case, the “last” hash (actually the root of the local Merkle tree) is sent to the signature authority for signing. This is the only data item ever sent out remotely. The signature authority replies with a signature for that hash, and the signature is stored.

Now let’s refresh you crypto know-how with the fact that a cryptographic hash function is necessarily a one-way function. That means there is no way to deduce from the hash what the original data is (except, of course, for brute-force attacks, what is a well-known fact and easily circumvented by using sufficiently large hash sizes). So even with the “last” hash being known by the authority, there is no way to know or even guess what the actual log records look like.

The bottom line is that all processing of the actual log records is done on the local rsyslog machine. You’ll even notice this as a slight (up to 30%, depending on settings) increase in CPU use on the local machine. This is the actual hashing taking place.

You may now wonder why log signatures can still be verified and trusted even while under attack. The answer is relatively straightforward, but requires some “cryptographer thinking”: it is impossible to alter the “last” hash unnoticed, because that last hash has been signed (and timestamped!). So if you try to re-hash, that last hash would change, and if you then verify, the hashes do not match. You may think of this system as multiple distributed Merkle trees in action, where the lowest-level Merkle tree is generated by the local rsyslog, with its top-level hash being provided to the external upper part of the tree (where it actually goes into the bottom-level computation). But I think now I have involved myself (and you) too much in the details…

Some more details on how the signing happens can also be found in my LinuxTag 2013 presentation (see slide 24 and following) and the accompanying paper.

simplifying rsyslog JSON generation

With RESTful APIs, for example Elasticsearch’s, you need to generate JSON strings. Rsyslog will soon do this in a very easy-to-use way. The current method is not hard either, but often looks a bit clumsy. The new way of doing things will most probably be part of the 8.33 release.

You now can define a template as follows:

template(name="outfmt" type="list" option.jsonf="on") {
    property(outname="@timestamp"
             name="timereported"
             dateFormat="rfc3339" format="jsonf")
    property(outname="host"
             name="hostname" format="jsonf")
    property(outname="severity"
             name="syslogseverity-text" caseConversion="upper" format="jsonf")
    property(outname="facility"
             name="syslogfacility-text" format="jsonf")
    property(outname="syslog-tag"
             name="syslogtag" format="jsonf")
    property(outname="source"
             name="app-name" format="jsonf")
    property(outname="message"
             name="msg" format="jsonf")
}

This will generate JSON. Here is a pretty-printed version of the generated output:

{
  "@timestamp": "2018-03-01T01:00:00+00:00",
  "host": "172.20.245.8",
  "severity": "DEBUG",
  "facility": "local4",
  "syslog-tag": "app[1666]",
  "source": "app",
  "message": " this is my syslog message"
}

Note: the actual output will be compact on a single “line”, as this is most useful with RESTful APIs.

Future versions of rsyslog may see additional simplifications in generating the JSON. For example, I am currently thinking about removing the need to give format="jsonf" for each property.
The functionality described here is being added via this pull request.

New Web Site Online

Really no big news. But after roughly 10 years I have managed to revamp my personal web site. This time, it’s destined to be slim and stable. The majority of content is on other sites, e.g. my syslog blog or GitHub.

I still find it is useful to have kind of a personal home in virtual space. So here it is, and it is severely renovated. Let’s see when it gets the next brush-up…

experimental debian rsyslog packages

We often receive requests for Debian packages. So far, we have not packaged for recent Debian, as the Debian maintainer, Michael Biebl, does an excellent job. Unlike us, he is a real expert on Debian policies and infrastructure.

Nevertheless, we have now taken his package sources and given the SUSE Open Build Service a try. As a result, we now seem to have usable Debian packages (and more) available at:
I would be very interested in your feedback on this first incarnation of the project. Is it useful? Is it something we should continue? Do you have any problems with the packages? Other suggestions? Please let us know.
Please note: should we decide that the project is worth keeping, the above URL will change. However, we will give sufficient advance notice. The current version is not suggested for production systems, at least not without trying it out on test systems first!

docker group security risk

The Docker documentation spells out that there are security concerns about adding a user to the docker group. Unfortunately, it does not say precisely what the concern is. I guess that is a “security-by-obscurity” approach, trying to avoid bad things. Practice shows this isn’t useful: the bad guys know anyway, and the casual user has a hard time understanding the actual risk involved.

The risk is considerable, so let me explain at least one of the issues (I have not tried to exhaustively check all security implications): containers are usually defined to run as the root user. This permits you to bypass permission checks on the host.

Let’s assume a $USER is in the docker group on a system with an otherwise plain docker installation. That user can then run:

$ docker run -v /etc:/malicious -ti --rm alpine
# cd /malicious
# vi sudoers
.... edit, write ....
# (press Ctrl-D to exit the container)

As such, the user can modify system configuration that he could not otherwise access. It’s a real risk. If you have a one-person “personal” machine/VM where the user has sudo permissions anyway … I’d say it’s no real issue.

The story is a different one on, for example, a CI machine. It’s easy to inject bad code into public pull requests, and so it will run on the CI platform. Usually (before Spectre/Meltdown…) this was guarded by the (low) permissions of the CI worker user (if you run CI with a sudo-enabled user … nothing has changed). When you enable that user to use docker, you get this new class of attack vector. Don’t get me wrong: I do NOT advocate against using docker in CI. Quite the opposite, it’s an excellent tool there. I just want to make you aware that you need to consider and mitigate another attack vector.

rsyslog 8.31 – an important release

Today, we release rsyslog 8.31. This is probably one of the biggest releases in the past couple of years. While it also offers great new functionality, what is really important about it is the focus on further improved software quality.

Let’s dig into it a bit. First, let’s mention some important new features:

And then we have a set of several hundred commits concerned with improved software quality:
  • testbench dynamic tests have been extended
  • coverage of different compilers and compiler options has been enhanced
  • more modules are automatically scanned by static analysis
  • daily Coverity scans were added to the QA system, which have proven to be a very useful addition
  • more aggressive and automated testing with threading debuggers (valgrind’s helgrind and clang thread sanitizer) has been added, also with great success
  • as a result of these actions, we could find and fix many small software defects
  • and there have also been some big and important fixes, namely for imjournal, omelasticsearch, mmdblookup and the rsyslog core
Many users were involved in finding and fixing bugs, many of whom I only know by their GitHub handle. So rather than highlighting just some of them, I would like to refer to the GitHub milestone issue tracker where all of them can be found. My sincere thanks to everyone for all their support!
 
I consider rsyslog release 8.31 a major step forward in our QA policy. We had already improved quite a bit and the state was already pretty good. However, with the changes introduced in 8.31, we make a big step forward, into a kind of next-gen QA policy. Of course, QA is a journey and not a “do once” target. So expect more good stuff in the next releases. We are not done yet!
I would like to use this opportunity to express a personal “thank you” to Thomas Deutschmann, also known as Whissi, for all his good advice and sometimes strong words that drove us towards better quality. While many folks have helped us with this, Whissi has consistently insisted on good QA policies for a long time now, and he was always persistent in fighting with me when I did not understand the value or did not want to see it. Whissi, I admit I sometimes wasn’t too fond of you, but believe me, at the end of the day I *really* value what you are doing. You have my deepest respect. Looking forward to many more years of discussions!

The clang thread sanitizer

Finding threading bugs is hard. The clang thread sanitizer makes it easier. The thread sanitizer instruments the to-be-tested code and emits useful information about actions that look suspicious (most importantly data races). This is a great aid in development and for QA. Thread sanitizer is faster than valgrind’s helgrind, which makes it applicable to more use cases. Note, however, that helgrind and thread sanitizer sometimes each detect issues that the other one does not.
This is how thread sanitizer can be activated:
  • install the clang package (the OS package is usually good enough, but if you want to use clang 5.0, you can obtain packages from http://apt.llvm.org/)
  • export CC=clang (or CC=clang-5.0 for the LLVM packages)
  • export CFLAGS="-g -fsanitize=thread -fno-omit-frame-pointer"
  • re-run configure (very important, else CFLAGS is not picked up!)
  • make clean (important, else make does not detect that it needs to rebuild some files due to the change of CFLAGS)
  • make
  • install as usual
If you came to this page trying to debug an rsyslog problem, we strongly suggest running your instrumented version interactively. To do so:
  • stop the rsyslog system service
  • sudo -i (you usually need root privileges for a typical rsyslogd configuration)
  • execute /path/to/rsyslogd -n …other options…
    here “/path/to” may not be required and often is just “/sbin” (so “/sbin/rsyslogd”)
    “other options” is whatever is specified in your OS startup scripts, most often nothing
  • let rsyslog run; thread sanitizer will print messages to stdout/stderr (or nothing if all is well)
  • press Ctrl-C to terminate the rsyslog run

Note that the thread sanitizer will display some false positives at the start (related to pthread_cancel, maybe localtime_r). The stack trace should contain exact location info. If it does not, the ASAN_SYMBOLIZER is not correctly set, but usually it “just works”.
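If you want to see what a thread sanitizer report looks like before instrumenting rsyslog itself, here is a minimal, self-contained data race example (not rsyslog code) that reliably triggers a report:


#include <pthread.h>
#include <stdio.h>

/* Minimal data race (not rsyslog code): two threads increment the same counter
 * without synchronization. Build with
 *   clang -g -fsanitize=thread -fno-omit-frame-pointer race.c -o race -lpthread
 * and run ./race; thread sanitizer reports a data race on 'counter'. */
static long counter;

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++)
        counter++;                      /* unsynchronized read-modify-write */
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %ld\n", counter);
    return 0;
}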
Documentation on the thread sanitizer is available here: https://clang.llvm.org/docs/ThreadSanitizer.html