rsyslog’s first signature provider: why Guardtime?

The news has already spread: rsyslog 7.3 is the first version that natively supports log signatures, and it does so via a newly introduced open signature provider interface. Many of my followers quickly noticed and began to play with it. For the provider interface to be useful, one obviously also needs a signature provider. I selected the keyless signature infrastructure (KSI), which is being engineered by the OpenKSI group. I was quickly asked what the compelling reasons were for making this technology the first signature provider.

So rather than writing many mails, I thought I'd blog about the reasons ;)

We need to dig a little into history to understand the decision. I looked at log signatures a long time ago; I think my interest started around 2001 or 2002, as part of the IETF syslog standardization efforts. I have to admit that at that time there was very limited public interest in signed logs, even though without signatures it is somewhat hard to prove the correctness of log records. Still, many folks successfully argued that they could prove their process was secure, and that proof seemed to be sufficient, at least at that time.

A core problem with log record signatures is that a single log record is relatively small, typically between 60 bytes and 2k, with typical Linux logs being between 60 and 120 bytes. This makes it impractical to sign each log record individually, as the signature itself is much larger than the record. Also, while a per-record signature can prove the validity of the log record, it cannot prove that the log file as such is complete. This second problem can be solved by a method called “log chaining”, where each log record is hashed and the previous hash is also used as input for hashing the current record. That way, removing or inserting a log record will break the hash chain, and so tampering can be detected (of course, tampering within a log record can also be easily detected, as it will invalidate the chain as well).
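The log chaining idea fits in a few lines of Python. This is a minimal sketch assuming SHA-256 and an all-zero starting hash; both are illustrative choices, not rsyslog's actual on-disk format, and the function names are hypothetical.

```python
import hashlib

GENESIS = b"\x00" * 32  # illustrative starting value for the chain


def chain_hashes(records, seed=GENESIS):
    """Return one chain hash per record: h_i = SHA-256(h_{i-1} || record_i)."""
    prev = seed
    chain = []
    for record in records:
        prev = hashlib.sha256(prev + record).digest()
        chain.append(prev)
    return chain


def first_mismatch(records, stored_chain, seed=GENESIS):
    """Recompute the chain; return the index of the first broken link, or None if intact."""
    recomputed = chain_hashes(records, seed)
    for i, (ours, theirs) in enumerate(zip(recomputed, stored_chain)):
        if ours != theirs:
            return i
    if len(recomputed) != len(stored_chain):
        # records were appended or cut off at the end
        return min(len(recomputed), len(stored_chain))
    return None


logs = [b"sshd: accepted login", b"sudo: root shell opened", b"cron: job finished"]
stored = chain_hashes(logs)

print(first_mismatch(logs, stored))                                  # None
print(first_mismatch([logs[0], logs[2]], stored))                    # 1 (record removed)
print(first_mismatch([logs[0], b"sudo: nothing", logs[2]], stored))  # 1 (record altered)
```

Note that detection only works relative to a trusted copy of the chain: anyone who can rewrite both the records and the stored hashes can simply call chain_hashes() again, which is exactly the local-compromise problem discussed next.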

This method is not new technology; it is ages old. While it sounds perfect, there are some subtle issues when it comes to logging. First and foremost, we have a problem when the machine that stores the log data is itself compromised. For the method to work, that machine must both create the hashes and sign them. To do so, it must have knowledge of the secrets being used in that process. Now remember from your crypto education that any secrets-based crypto system (even PKI) is only secure as long as the secrets have not been compromised. Unfortunately, if the loghost itself has been compromised, that precondition no longer holds. If someone got root-level access, he or she also has access to the secrets (otherwise the signature process could not access them either, right?).

You may try as hard as you like, but if everything is kept on the local machine, a successful attacker can always tamper with the logs and re-hash and re-sign them. You can only gain an advantage if you ship part of the integrity proof off the local system – as long as you assume that not all of the final destinations are compromised (usually a fair assumption, but sometimes questionable if everything is within the same administrative domain).

The traditional approach in logging is that log records are shipped off the machine. In the IETF efforts we concentrated on that process and on the ability to sign records right at the originator. This ultimately resulted in RFC 5848, “Signed Syslog Messages”, which provides the necessary plumbing to do that. A bit later, in Mitre’s CEE effort, we faced the same problem and discussed a similar solution. Unfortunately, there is a problem with that approach: in the real world, in larger enterprises, we usually do not have a single log stream where each record is forwarded to some final destinations. Almost always, interim relays filter messages, discard some (e.g. noise events) and transform others. The end result is that verification of the log stream residing at the central loghost will always fail. Note that this is not caused by a failure in the crypto method used – it’s a result of operational needs and typical deployments. Those interested in the fine details may have a look at the log hash chaining paper I wrote as an aid during the CEE discussions. In that paper, I proposed as an alternative method that just the central loghost signs the records. Of course, this still had the problem of central host compromise.

Let me sum up the concerns on log signatures:

  • signing single log records is practically impossible
  • relay chains make it exceptionally hard to sign at the log originator and verify at the central log store destination
  • any signature mechanism based on locally-stored secrets can be broken by a sufficiently well-done attack.

These were essentially the points that made me stay away from log signatures altogether. As I have explained multiple times in the past, implementing them would just create a false sense of security.

The state of affairs changed a bit after the systemd journal was pushed into existence with the promise that it would provide strong log integrity features and even prevent attacks. There was a lot of vaporware associated with that announcement, but it was very interesting to see how many people really got excited about it. While I clearly described at that time how easy the system was to break, people began to praise it so much that I quickly developed LogTools, which provided exactly the same thing. The core problem was that both of them were just basic log hash chains and created a serious false sense of security (but people seemed to feel pampered by that…).

My initial plan was to tightly integrate the LogTools technology into rsyslog, but my concerns about such weak security took over and I (thankfully) hesitated to actually do that.

At this point in the writing, credit is due to the systemd journal folks: they have upgraded their system to a much more secure method, which they call forward secure sealing. I haven’t analyzed it in depth yet, but it sounds like it provides good features. It’s a very new method, though, and security folks, for good reason, tend to stick to proven methods unless there is a very strong argument for moving away from them (after all, crypto is extremely hard, and new methods require a lot of peer review, as do new implementations).

That was the state of affairs at the end of last year. Thankfully, I was then approached by an engineer from Guardtime, a company that, as I learned, is deeply involved in the OpenKSI approach. He told me about the technology and asked if I would be interested in providing signatures directly in rsyslog. They seemed to have some experience with signing log files, had some instructions and tools available on their web site, and obviously also had some folks who actually used them.

I have to admit that I wasn’t too excited at that time and was very busy with other projects. Anyhow, after the Fedora Developer Conference in February 2013 I took the time to have a somewhat deeper look at the technology – and it looked very interesting. It uses log chains, but in a smart way. Instead of simple chains, it uses so-called Merkle trees, a data structure originally designed for signing many documents very efficiently. They were invented back in the 1970s and are a well-proven technology. An interesting fact about the way Merkle trees are used in the OpenKSI approach is that they permit extracting a subset of log information while still proving that this information is valid. This is a very interesting property when you need to present logs as evidence in court but do not want to disclose unrelated log entries.
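To illustrate the subset property, here is a minimal Python sketch of a generic binary Merkle tree with inclusion proofs. It is an assumption-laden illustration, not the tree layout KSI actually uses: the 0x00/0x01 leaf/node prefixes follow the convention of Certificate Transparency (RFC 6962), and an odd node at the end of a level is simply carried up. One record can be checked against the root using only a few sibling hashes, so the other records never have to be disclosed.

```python
import hashlib


def _h(data):
    return hashlib.sha256(data).digest()


def leaf_hash(record):
    return _h(b"\x00" + record)          # 0x00 prefix marks a leaf


def node_hash(left, right):
    return _h(b"\x01" + left + right)    # 0x01 prefix marks an inner node


def build_tree(records):
    """Return all tree levels, leaves first; the last level holds only the root."""
    if not records:
        raise ValueError("need at least one record")
    level = [leaf_hash(r) for r in records]
    levels = [level]
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                nxt.append(node_hash(level[i], level[i + 1]))
            else:
                nxt.append(level[i])     # odd node is carried up unchanged
        level = nxt
        levels.append(level)
    return levels


def inclusion_proof(levels, index):
    """Sibling hashes needed to recompute the root from leaf `index`."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        if sibling < len(level):
            proof.append((sibling < index, level[sibling]))  # (sibling is on the left, hash)
        index //= 2
    return proof


def verify(record, proof, root):
    acc = leaf_hash(record)
    for sibling_is_left, sibling in proof:
        acc = node_hash(sibling, acc) if sibling_is_left else node_hash(acc, sibling)
    return acc == root


records = [b"rec0", b"rec1", b"rec2 (needed as evidence)", b"rec3", b"rec4"]
levels = build_tree(records)
root = levels[-1][0]
proof = inclusion_proof(levels, 2)        # only 3 sibling hashes, no other records
print(verify(records[2], proof, root))    # True
print(verify(b"rec2 (edited)", proof, root))  # False
```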

While all of this is interesting, the key feature that attracted my interest is the “keyless” in the KSI name. If there is no key, an attacker obviously cannot compromise it. But how can signing without a key work? I admit that at first I had a hard time understanding that, but the folks at Guardtime were very helpful in explaining how it works (and had a lot of patience with me ;)). Let me try to explain in a nutshell (and with a lot of inaccuracies):

The base idea goes back to the Merkle tree, but we can go even more basic and think of a simple hash chain. Remember that each hash depends on its predecessor and on the actual data to be hashed. If you now create a kind of global hash chain into which you submit everything you ever need to “sign”, this forms a chain in itself. Now say you have a document (or log) x that you want to sign. You submit x to the chaining process and receive back a hash value h. You store that h and use it as a signature, a proof of integrity of x. Now assume that there is a way to give out x and h so that someone can verify that x participated in the computation of the “global” log chain. If so, and if x’s hash still matches the value that was used to generate h, then x is obviously untampered. If we further assume that some “anchor hashes” are produced on a fixed schedule, and that we can determine which anchor hash h belongs to, we can even deduce at what time x was signed. In essence, this is what Guardtime does. They operate the server infrastructure that does this hashing and timestamping. The anchor hashes are generated once a second, so each signature can be tracked very precisely to the time it was generated. This is called “linked timestamping”, and for a much better description than I have given, just follow that link ;)
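The toy model below follows the simplified explanation above: one global hash chain, plus an anchor published once per second. Everything in it (class and function names, the signature layout) is hypothetical and greatly simplified; a real service aggregates submissions in Merkle trees rather than one linear chain, which keeps signatures short. The verifier only sees digests of other submissions, never their content.

```python
import hashlib


def H(*parts):
    return hashlib.sha256(b"".join(parts)).digest()


class ToyLinkedTimestamping:
    """Hypothetical, greatly simplified linked-timestamping service.

    Every submitted digest is chained onto one global hash chain; once per
    second the current chain head is published as that second's anchor.
    """

    def __init__(self):
        self.head = b"\x00" * 32
        self.digests = []        # submitted digests, in order
        self.prev_heads = []     # chain head just before each submission
        self.anchors = {}        # second -> published chain head
        self.covered = {}        # second -> number of submissions that anchor covers

    def submit(self, digest):
        self.prev_heads.append(self.head)
        self.digests.append(digest)
        self.head = H(self.head, digest)
        return len(self.digests) - 1      # ticket used to fetch the signature later

    def publish_anchor(self, second):
        self.anchors[second] = self.head
        self.covered[second] = len(self.digests)

    def signature(self, ticket):
        """Return (second, previous head, later digests up to that second's anchor)."""
        for second in sorted(self.covered):
            if self.covered[second] > ticket:
                end = self.covered[second]
                return second, self.prev_heads[ticket], self.digests[ticket + 1:end]
        raise LookupError("submission not anchored yet")


def verify(document, signature, anchors):
    """Re-run the chain from the document up to the published anchor."""
    second, prev_head, later = signature
    acc = H(prev_head, hashlib.sha256(document).digest())
    for digest in later:
        acc = H(acc, digest)
    return acc == anchors[second]


svc = ToyLinkedTimestamping()
t_a = svc.submit(hashlib.sha256(b"log block A").digest())
svc.submit(hashlib.sha256(b"someone else's document").digest())
svc.publish_anchor(1_000_000)
t_b = svc.submit(hashlib.sha256(b"log block B").digest())
svc.publish_anchor(1_000_001)

sig_a = svc.signature(t_a)
print(sig_a[0])                                            # 1000000 (time of signing)
print(verify(b"log block A", sig_a, svc.anchors))          # True
print(verify(b"log block A, edited", sig_a, svc.anchors))  # False
```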

The key property from my point of view is that with this method, no secrets are required to sign log records. And if there is no secret, an attacker obviously cannot take advantage of it to hide his tracks. So this method actually provides a very good integrity proof and does not create a false sense of security. This removed the major obstacle that had always kept me from implementing log signatures.

The alert reader may now ask: “Doesn’t that sound too good? Isn’t there a loophole?”. Of course, as always in security, one can try to circumvent the mechanism. A very obvious attack is that an attacker may still modify the log chain, re-hash it, and re-submit it to the signature process. My implementation inside rsyslog already makes this “a bit” hard to do because we keep log chains across multiple log files and the tooling notifies users if a chain is restarted for some reason (there are some valid reasons). But the main answer to these types of attacks is that when a re-hashing happens, the new signature will bear a different timestamp. So by comparing the signature’s timestamp with the log record timestamps, tampering can be detected. In any case, I am sure we will see good questions and suggestions on how to improve my code. What makes me feel good about the method itself is that it

  • is based on well-known and proven technology (with the Merkle tree being one foundation)
  • has been in practical use for quite a while
  • is covered by ample academic papers
  • has a vital community driving it forward and enhancing the technology
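The timestamp comparison described above (a re-hashed and re-signed block carries a later signature time than its records suggest) can be sketched as a simple plausibility check. This is a hypothetical heuristic, not rsyslog's verification tool: the input layout and the max_delay threshold are assumptions, and a block on an idle file that is only closed at shutdown may legitimately be signed late, so a flag is a reason to investigate, not proof.

```python
from datetime import datetime, timedelta


def suspicious_blocks(blocks, max_delay=timedelta(minutes=10)):
    """Flag signed blocks whose signature time does not fit their records.

    blocks: list of (record_timestamps, signature_timestamp) pairs.
    A block is flagged if it was signed before its newest record (impossible
    unless clocks are wrong) or much later than it (a hint of re-hashing).
    max_delay is an assumed threshold, to be tuned per deployment.
    """
    flagged = []
    for i, (record_times, signed_at) in enumerate(blocks):
        delay = signed_at - max(record_times)
        if delay < timedelta(0) or delay > max_delay:
            flagged.append(i)
    return flagged


t = datetime(2013, 5, 1, 12, 0, 0)
blocks = [
    ([t, t + timedelta(seconds=30)], t + timedelta(seconds=31)),  # plausible
    ([t, t + timedelta(seconds=5)], t + timedelta(days=3)),       # signed days later
    ([t + timedelta(minutes=1)], t),                              # signed before its record
]
print(suspicious_blocks(blocks))  # [1, 2]
```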

So everything looks very solid, and this is what triggered my decision to finally implement log signatures directly in the rsyslog core. And to do it right, I did not create a KSI-specific interface but rather modeled a generic signature provider interface inside rsyslog. So if for whatever reason you don’t like the KSI approach, there is a facility inside rsyslog that can be used to easily implement other signature providers. Due to the reasoning given above, however, I have “just” implemented the KSI one for now.

As a side note, let me say that the Guardtime folks were very easy to work with and very cooperative, especially when I asked my initial, tough and very skeptical questions. They are also very open (lots of their tooling is open source) and … smart ;) During the project, our working relationship grew stronger, and they even managed to get me in direct contact with Ahto Buldas, who is a key inventor behind that technology (and a very impressive guy!). I am looking forward to continuing to work with them in the future. In April 2013 they invited me to their Estonian office, where we had three days to discuss things like the network-signature problem (with relays discarding logs) and related problems like multi-tenancy and centralized virtual machine logging. These were very good discussions and some cool technology will probably come out of them. I am tempted to write a bit more about these topics (as far as they have been tackled so far), but this posting is already rather long, so let’s save this for another one ;)

Which data does the Guardtime signature provider transfer to external parties?

With the interest in privacy concerns currently having a “PRISM-induced high”, I wanted to elaborate a little bit about what rsyslog’s Guardtime signature provider actually transmits to the signature authority.

This is a condensed post of what the provider does, highlighting the main points. If you are really concerned, remember that everything is open source, so you are invited to read the actual signature provider source, all of which is available in the rsyslog git repository.

The most interesting question first: the provider only sends a top-level hash to the signature authority. No actual log record will ever be sent or otherwise disclosed.

The way this works is that the provider creates a “smart” log signature chain. Actually, it is not a simple chain but rather a Merkle Tree (that’s why I call it “smart”).

When a log file is opened, the provider checks the signature status file to see whether the log file already contains some signed records and, if so, picks up the last hash. Then, hashing is initialized. For each record being written, the previous hash and the current record are concatenated and hashed, and the Merkle tree is updated. The actual signature is written when it is “time to do so”, that is, when the file either needs to be closed (e.g. rsyslogd shutdown, eviction from the dynafile cache) or the configured max block size is reached. In that case, the “last” hash (actually the root of the local Merkle tree) is sent to the signature authority for signing. This is the only data item ever sent out remotely. The signature authority replies with a signature for that hash, and the signature is stored.
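The following Python sketch mirrors the steps just described: chain each record onto the previous hash, collect the results as Merkle leaves, and at block end hand only the root to the signature authority. It is an illustration under stated assumptions, not the provider's code: the hash function, the plain (unprefixed) Merkle layout, recomputing the tree at block end instead of updating it incrementally, and the request_signature callback are all hypothetical stand-ins.

```python
import hashlib


def sha256(data):
    return hashlib.sha256(data).digest()


def merkle_root(leaves):
    """Plain binary Merkle root; an odd node at the end of a level is carried up."""
    level = list(leaves)
    while len(level) > 1:
        level = [sha256(level[i] + level[i + 1]) if i + 1 < len(level) else level[i]
                 for i in range(0, len(level), 2)]
    return level[0]


class BlockSigner:
    """Illustrative block-wise signer (hypothetical, not rsyslog's implementation).

    request_signature stands in for the call to the signature authority:
    it is handed the block's root hash and nothing else.
    """

    def __init__(self, request_signature, max_block_size, last_hash=b"\x00" * 32):
        self.request_signature = request_signature
        self.max_block_size = max_block_size
        self.last_hash = last_hash   # the real provider reads this from its state file
        self.leaves = []
        self.signatures = []         # (root, signature) per finished block

    def write(self, record):
        self.last_hash = sha256(self.last_hash + record)   # previous hash || record
        self.leaves.append(self.last_hash)
        if len(self.leaves) >= self.max_block_size:
            self.close_block()

    def close_block(self):
        """Called when the block is full or the file is closed."""
        if self.leaves:
            root = merkle_root(self.leaves)
            self.signatures.append((root, self.request_signature(root)))
            self.leaves = []


def recompute_root(prev_hash, records):
    """Verifier side: rebuild a block's root (and last chain hash) from records on disk."""
    leaves = []
    for record in records:
        prev_hash = sha256(prev_hash + record)
        leaves.append(prev_hash)
    return merkle_root(leaves), prev_hash


sent = []                            # everything that would leave the machine


def fake_authority(root):
    sent.append(root)
    return b"signature-over-" + root


signer = BlockSigner(fake_authority, max_block_size=3)
for rec in [b"msg 1", b"msg 2", b"msg 3", b"msg 4"]:
    signer.write(rec)
signer.close_block()                 # e.g. rsyslogd shutting down

root1, last1 = recompute_root(b"\x00" * 32, [b"msg 1", b"msg 2", b"msg 3"])
print(len(sent))                     # 2 (one root hash per block, no records)
print(root1 == sent[0])              # True
print(recompute_root(b"\x00" * 32, [b"msg 1", b"msg X", b"msg 3"])[0] == sent[0])  # False
print(recompute_root(last1, [b"msg 4"])[0] == sent[1])  # True (chain continues across blocks)
```

The last check also shows why re-hashing does not help an attacker: any altered record changes the recomputed root, which then no longer matches the root that was signed and timestamped.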

Now let’s refresh your crypto know-how with the fact that a cryptographic hash function is necessarily a one-way function. That means there is no way to deduce from the hash what the original data was (except, of course, for brute-force attacks, which is a well-known issue and easily countered by using sufficiently large hash sizes). So even with the “last” hash being known to the authority, there is no way to know or even guess what the actual log records look like.

The bottom line is that all processing of the actual log records is done on the local rsyslog machine. You’ll even notice that by a slight (up to 30% depending on settings) increase in CPU use of the local machine. This is the actual hashing taking place.

Do you now wonder why log signatures can still be verified and trusted even under attack? The answer is relatively straightforward, but requires some “cryptographer thinking”: it is impossible to alter the “last” hash unnoticed, because that last hash has been signed (and timestamped!). So if you try to re-hash, that last hash would change, and when you verify, the hashes do not match. You may think of this system as multiple distributed Merkle trees in action, where the lowest-level Merkle tree is generated by the local rsyslog, and its top-level hash is provided to the external upper part of the tree (where it actually goes into the bottom-level computation). But I think I have now gotten myself (and you) too deep into details…

Some more details on how the signing happens can also be found in my LinuxTag 2013 presentation (see slide 24 and following) and the accompanying paper.

simplifying rsyslog JSON generation

With RESTful APIs, like for example ElasticSearch, you need to generate JSON strings. Rsyslog will soon do this in a very easy-to-use way. The current method is not hard either, but often looks a bit clumsy. The new way of doing things will most probably be part of the 8.33 release.

You can now define a template as follows:

template(name="outfmt" type="list" option.jsonf="on") {
    property(outname="@timestamp"
             name="timereported"
             dateFormat="rfc3339" format="jsonf")
    property(outname="host"
             name="hostname" format="jsonf")
    property(outname="severity"
             name="syslogseverity-text" caseConversion="upper" format="jsonf")
    property(outname="facility"
             name="syslogfacility-text" format="jsonf")
    property(outname="syslog-tag"
             name="syslogtag" format="jsonf")
    property(outname="source"
             name="app-name" format="jsonf")
    property(outname="message"
             name="msg" format="jsonf")
}

This will generate JSON. Here is a pretty-printed version of the generated output:

{
  "@timestamp": "2018-03-01T01:00:00+00:00",
  "host": "172.20.245.8",
  "severity": "DEBUG",
  "facility": "local4",
  "syslog-tag": "app[1666]",
  "source": "app",
  "message": " this is my syslog message"
}

Note: the actual output will be compact on a single “line”, as this is most useful with RESTful APIs.
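rsyslog generates this output itself, so nothing needs to be coded on the sending side. To illustrate what a receiving REST endpoint gets, the Python snippet below builds the same record and serializes it compactly on one line. The exact whitespace of rsyslog's own output may differ slightly, and the message value was modified to contain quotes, to show that escaped values still yield valid JSON.

```python
import json

# The same fields as the template above, filled with the sample values
# (message changed to include quotes, to demonstrate escaping).
event = {
    "@timestamp": "2018-03-01T01:00:00+00:00",
    "host": "172.20.245.8",
    "severity": "DEBUG",
    "facility": "local4",
    "syslog-tag": "app[1666]",
    "source": "app",
    "message": ' this is my "quoted" syslog message',
}

# Compact form: a single line without indentation, as sent to a RESTful API.
line = json.dumps(event, separators=(",", ":"))
print(line)

# A receiver can parse it back without loss; the embedded quotes were escaped.
assert json.loads(line) == event
assert "\n" not in line
```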

Future versions of rsyslog may see additional simplifications in generating the JSON. For example, I am currently thinking about removing the need to give format="jsonf" for each property.
The functionality described here is being added via this pull request.

experimental debian rsyslog packages

We often receive requests for Debian packages. So far, we did not package for recent Debian, as the Debian maintainer, Michael Biebl, does an excellent job. Unlike us, he is a real expert on Debian policies and infrastructure.

Nevertheless, we have now taken his package sources and given the SUSE Open Build Service a try. As a result, we now seem to have usable Debian packages (and more) available at:
I would be very interested in your feedback on the first incarnation of this project. Is it useful? Is it something we should continue? Do you have any problems with the packages? Other suggestions? Please let us know.
Please note: should we decide that the project is worth keeping, the above URL will change. However, we will give sufficient advance notice. The current version is not suggested for production systems, at least not without trying it out on test systems first!

rsyslog 8.31 – an important release

Today, we are releasing rsyslog 8.31. This is probably one of the biggest releases in the past couple of years. While it also offers great new functionality, what is really important about it is the focus on further improved software quality.

Let’s go into a bit more detail. First, let’s mention some important new features:

And then we have a set of several hundred commits concerned with improved software quality:
  • testbench dynamic tests have been extended
  • coverage of different compilers and compiler options has been enhanced
  • more modules are automatically scanned by static analysis
  • daily Coverity scans were added to the QA system, which have proven to be a very useful addition
  • more aggressive and automated testing with threading debuggers (valgrind’s helgrind and clang thread sanitizer) has been added, also with great success
  • as a result of these actions, we were able to find and fix many small software defects.
  • and there also have been some big and important fixes, namely for imjournal, omelasticsearch, mmdblookup and the rsyslog core
Many users were involved in finding and fixing bugs, many of whom I know only by their GitHub handle. So rather than highlighting just some of them, I would like to refer to the GitHub milestone issue tracker, where all of them can be found. My sincere thanks to everyone for all their support!
 
I consider rsyslog release 8.31 a major step forward in our QA policy. We had already improved quite a bit, and the state was already pretty good. However, with the changes introduced in 8.31, we take a big step forward, into a kind of next-gen QA policy. Of course, QA is a journey and not a “do once” target. So expect more good stuff in the upcoming releases. We are not done yet!
I would like to use this opportunity to express a personal “thank you” to Thomas Deutschmann, also known as Whissi, for all his good advice and sometimes strong words that drive us towards better quality. While many folks have helped us with this, Whissi has been consistently insisting on good QA policies for a long time now, and he was always persistent in fighting with me when I did not understand the value or did not want to. Whissi, I admit I sometimes wasn’t too fond of you, but believe me, at the end of the day I *really* value what you are doing. You have my deepest respect. Looking forward to many more years of discussions!

The clang thread sanitizer

Finding threading bugs is hard. Clang thread sanitizer makes it easier. The thread sanitizer instruments the code under test and emits useful information about actions that look suspicious (most importantly data races). This is a great aid in development and for QA. Thread sanitizer is faster than valgrind’s helgrind, which makes it applicable to more use cases. Note, however, that helgrind and thread sanitizer sometimes each detect issues that the other one does not.
This is how thread sanitizer can be activated:
  • install the clang package (the OS package is usually good enough, but if you want to use clang 5.0, you can obtain packages from http://apt.llvm.org/)
  • export CC=clang (or CC=clang-5.0 for the LLVM packages)
  • export CFLAGS="-g -fsanitize=thread -fno-omit-frame-pointer"
  • re-run configure (very important, else CFLAGS is not picked up!)
  • make clean (important, else make does not detect that it needs to build some files due to change of CFLAGS)
  • make
  • install as usual
If you came to this page trying to debug an rsyslog problem, we strongly suggest running your instrumented version interactively. To do so:
  • stop the rsyslog system service
  • sudo -i (you usually need root privileges for a typical rsyslogd configuration)
  • execute /path/to/rsyslogd -n …other options…
    here “/path/to” may not be required and often is just “/sbin” (so “/sbin/rsyslogd”)
    “other options” is whatever is specified in your OS startup scripts, most often nothing
  • let rsyslog run; thread sanitizer will spit out messages to stdout/stderr (or nothing if all is well)
  • press Ctrl-C to terminate the rsyslog run

Note that the thread sanitizer will display some false positives at the start (related to pthread_cancel, maybe localtime_r). The stack trace should contain exact location info. If it does not, ASAN_SYMBOLIZER is not set correctly, but usually it “just works”.
Documentation on thread sanitizer is available here: https://clang.llvm.org/docs/ThreadSanitizer.html

Automating Coverity Scan with a complex TravisCI build matrix

This is how you can automate Coverity Scan using Travis CI – especially if you have a complex build matrix:

  • create an additional matrix entry that you will use exclusively for submission to Coverity
  • make sure you use your regular git master branch for the scans (so you can be sure you scan the real thing!)
  • schedule a Travis CI cron job (daily, if permitted by project size and Coverity submission allowance)
  • in that cron job, on the dedicated Coverity matrix entry:
      • cache Coverity’s analysis tools on your own web site and download them from there during Travis CI VM preparation (Coverity doesn’t like too-frequent downloads)
      • prepare your project for compilation as usual (autoreconf, configure, …) – ensure that you build all source units, as you want a full scan
      • run the build under Coverity’s cov-build tool (which writes its results to the cov-int directory) according to Coverity instructions
      • tar the build result
      • use the “manual” wget upload capability (doc on Coverity submission web form); make sure you use a secure Travis environment variable for your Coverity token
  • you will receive scan results via email as usual – if you like, automate email-to-issue creation for newly found defects
Actual sample code: rsyslog uses this method. Have a look at its .travis.yml (search for “DO_COVERITY”). The scan run can be found here (search for “DO_CRON”): a script that calls another one, which ultimately performs the steps described above. These two scripts are called from .travis.yml.
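rsyslog implements these steps as shell scripts. Purely as a hedged illustration of the decision and packaging logic, here is a Python sketch. Assumptions: TRAVIS_EVENT_TYPE is the variable Travis CI sets to "cron" for scheduled builds; DO_COVERITY is the matrix variable named above, but its value is assumed to be any non-empty string; COVERITY_TOKEN is a hypothetical variable name. The analysis itself (cov-build --dir cov-int followed by your build command) and the final upload stay shell steps.

```python
import os
import tarfile


def should_submit(env):
    """Only the scheduled (cron) build of the dedicated matrix entry submits.

    DO_COVERITY's exact value is an assumption: any non-empty value counts.
    """
    return env.get("TRAVIS_EVENT_TYPE") == "cron" and bool(env.get("DO_COVERITY"))


def package_results(build_dir, out_path):
    """Tar up the cov-int directory written by cov-build."""
    cov_int = os.path.join(build_dir, "cov-int")
    if not os.path.isdir(cov_int):
        raise FileNotFoundError(f"no cov-int directory in {build_dir}")
    with tarfile.open(out_path, "w:gz") as tar:
        tar.add(cov_int, arcname="cov-int")
    return out_path


def coverity_token(env):
    """Read the upload token from a secure Travis environment variable."""
    token = env.get("COVERITY_TOKEN")   # hypothetical variable name
    if not token:
        raise RuntimeError("COVERITY_TOKEN is not set; never commit the token itself")
    return token
```

Passing the environment in as a dict (instead of reading os.environ inside the functions) keeps the logic testable outside of Travis. The upload itself is the form post described on Coverity's submission page; its field names are best taken from that page rather than guessed.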

Why so complicated? After all, there is a Coverity addon for Travis CI. That’s right, and it probably works great for simple build matrices. But rsyslog uses Travis quite intensively. At the time of this writing, we spawn 8 VMs for different tests and test environments plus the one for cron jobs (so 9 in total). With the Coverity addon, this would result in 9 submissions per run, because the addon runs once per VM requested. Given that rsyslog currently has an allowance of 3 scans per day, a single run would use up our entire allowance, and the remaining 6 submissions would fail. Even if it worked, it would be a big waste of Coverity resources and pretty impolite both to Coverity and to fellow users of the free scan service.

With the above method, we have full control over what we submit to Coverity and how we do it. With our current scan allowance, we use exactly one scan per day, which leaves two for cases where it is useful to run interim scans during development.

Why do you use the master branch for the scan? Coverity recommends a dedicated branch.
In my opinion, software quality assurance works well only if it is automated and is applied to what actually is being “delivered”. In that sense, real CI with one Coverity run per pull request (before we merge!) would be best. We can’t do this due to Coverity allowance limitations. The next best thing is to do it as soon as possible after merge (daily in our case) and make sure it is actually run against the then-current state. Introducing a different branch just for scan purposes sounds counter-productive:

  • you either need to update it to the full master branch immediately before the scan, in which case you can use master directly
  • or you update the “Coverity branch” only on “special events” (or even manually), which runs counter to the idea of having things checked as early as possible. Why would you hold back scanning something that you merged to master, aka “this is good to go”? Sounds like an approach to self-cheating to me…
In conclusion, scanning master is the most appropriate way to automatically scan your project. Be bold enough to do that.
 
Does your approach work for non-Travis, e.g. Buildbot?
Of course, it works perfectly with any solution that permits you to run scheduled “builds”. We also use Buildbot for rsyslog, and we could have placed the scan there, using a scheduled Buildbot instance. Simply for ease of use, we decided to use Travis for automating the scan as well. So the approach described here is universal. The only thing you really need is that the scan is initiated automatically at scheduled intervals – otherwise you could simply use the manual submit.

Busy at the moment…

Some might have noticed that I am not as active as usual on the rsyslog project. As this looks likely to continue for at least the next couple of weeks, I’d like to give a short explanation of what is going on. Starting around the beginning of June, I got involved in a political topic in my local village. It’s related to civil rights, and it really is a local thing, so there is little point in explaining the complex story. What is important is that the originally small thing grew larger and larger and we now have to win a kind of election – which means rallies and the like. To make matters a little worse (in regard to my time…), I am one of the movement’s speakers and also serve as subject matter expert for our group (I have been following this topic for over 20 years now). To cut a long story short, that issue has increasingly been eating up my spare time, and we are currently at a point where little is left.

Usually, a large part of my spare time goes into rsyslog and related projects. Thankfully, Adiscon funds rsyslog development, and so I can work on it during my office hours. However, during these office hours I am obliged to work on paid support cases and also on a limited number of things not directly related to rsyslog. Unfortunately, August (and early September) is the main holiday season in our region. As such, I also have fewer co-workers available with whom I could share rsyslog work. And to make matters “worse”, I need to train new folks to get started with rsyslog work – one of them is doing a summer internship, so I need to work with him now. While new folks are always a good thing to have on a project (and I really appreciate it), this means a further reduction of my rsyslog time.

The bottom line is that, due to all these things together, I am not really able to react to issues as quickly as I would like to. The political topic is expected to come to a conclusion, one way or the other, by the end of September. For personal reasons, I will not be able to work at all in early October (a long-planned out-of-office period), but I hope to be fully available again by mid October. And the good news is that we will have a somewhat larger team by then, because Jan, who is doing the internship, will continue to work part time on the project. Even better: Pascal will be with Adiscon for the next months on a full-time basis and will be able to work considerable hours on rsyslog.

So while we have a temporary glitch in availability, I am confident we’ll recover from that in autumn, and we have very exciting work coming up (for example, the TLS work Pascal has just announced). I also have a couple of very interesting suggestions which are currently being discussed with support contract customers.

All in all, I beg for your patience. And I am really thankful to all of our great community members who do excellent work on the rsyslog mailing list, GitHub and other places. Not to forget the great contributions we increasingly get. Looking forward to many more years of productive syslogging!

Introducing new team member

Good news: we have some new folks working on the rsyslog project. In a small mini-series of two blog postings I’d like to introduce them. I’ll start with Jan Gerhards, who already has some rsyslog-related material online.

Jan studies computer science at Stuttgart University. He has been working on rsyslog occasionally for a while. This summer, he is doing a two-month internship at Adiscon. He also plans to continue working on rsyslog and logging-related projects in the future. Right now, he is looking into improving the mmanon plugin. He is also trying to improve debug logging, which I admit is my personal favorite (albeit it looks like this is the lower-priority project ;-)).

Would creating a simple Linux log file shipper make sense?

I am currently thinking about creating a very basic shipper for log files, but wonder if it really makes sense. I am especially unsure whether good tools already exist. Being lazy, I thought I'd ask for some wisdom from those in the know before investing more time in searching for solutions and weighing their quality.

I’ve read more than once that logstash is far too heavy for a simple shipper, and I’ve also heard that rsyslog is sometimes a bit heavy (albeit much lighter) for the purpose. I think with reasonable effort we could create a tool that

  • monitors text files (much like imfile does) and pulls new entries from them
  • does NOT further process or transform these logs
  • sends the collected entries to a very limited number of destinations (for starters, I’d say syslog protocol only)
  • focuses on being very lightweight, intentionally not implementing anything complex.
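To make the question concrete, here is a rough Python sketch of the feature set listed above: poll one file, forward each new line unchanged, and send it as an RFC 5424 message over UDP. Everything here is a hypothetical choice for discussion, not an existing rsyslog tool: the function names, the fixed PRI of 13 (user.notice), UDP transport, and an offset kept only in memory (starting at the beginning of the file). A usable shipper would at least need persistent offsets, proper rotation detection (e.g. by inode) and a reliable transport.

```python
import os
import socket
import time
from datetime import datetime, timezone


def read_new_lines(path, offset):
    """Return complete lines appended since `offset`, plus the new offset."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        if f.tell() < offset:          # file shrank (truncated or rotated): start over
            offset = 0
        f.seek(offset)
        data = f.read()
    end = data.rfind(b"\n") + 1        # leave a partial last line for the next poll
    return data[:end].splitlines(), offset + end


def to_rfc5424(line, hostname, app_name="shipper", pri=13):
    """Wrap a raw line in a minimal RFC 5424 header (PRI 13 = user.notice)."""
    timestamp = datetime.now(timezone.utc).isoformat()
    header = f"<{pri}>1 {timestamp} {hostname} {app_name} - - - "
    return header.encode("utf-8") + line


def ship_once(path, offset, sock, destination, hostname):
    """Send every new line of `path` as one UDP syslog message; return new offset."""
    lines, offset = read_new_lines(path, offset)
    for line in lines:
        sock.sendto(to_rfc5424(line, hostname), destination)
    return offset


def ship_forever(path, destination, poll_seconds=1.0):
    """Simple poll loop; the offset is lost on restart in this sketch."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    hostname = socket.gethostname()
    offset = 0
    while True:
        offset = ship_once(path, offset, sock, destination, hostname)
        time.sleep(poll_seconds)
```

Usage would be along the lines of ship_forever("/var/log/app.log", ("loghost.example", 514)), where both the path and the host are placeholders.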
Would this be useful for you? What would be the minimal feature set you need in order to make it useful? Does something like this already exist? Is it really needed or is a stripped-down rsyslog config sufficient?
I’d be grateful for any thoughts in this direction.