With the interest in privacy concerns currently having a “PRISM-induced high”, I wanted to elaborate a little bit about what rsyslog’s Guardtime signature provider actually transmits to the signature authority.
This is a condensed post of what the provider does, highlighting the main points. If you are really concerned, remember that everything is open source. So you are invited to read the actual signature provider source, all of which is available at the rsyslog git.
The most interesting question first: the provider does only send a top-level hash to the signature authority. No actual log record will ever be sent or otherwise disclosed.
The way this works is that the provider creates a “smart” log signature chain. Actually, it is not a simple chain but rather a Merkle Tree (that’s why I call it “smart”).
When a log file is opened, the provider checks the signature status file if the file already contains some signed records and, if so, picks up the last hash. Then, hashing is initialized. For each record being written, both the previous hash and the current record are concatenated, hashed and the Merkle tree is updated. The actual signature is written when it is “time to do so”. And “time to do so” is when the file either needs to be closed (e.g. rsyslogd shutdown, eviction from dynafile cache) or the configured max block size is reached. In that case, the “last” hash (actually the root of the local Merkle tree) is sent to the signature authority for signing. This is the only data item ever sent out remotely. The signature authority replies with a signature for that hash, and the signature is stored.
Now let’s refresh you crypto know-how with the fact that a cryptographic hash function is necessarily a one-way function. That means there is no way to deduce from the hash what the original data is (except, of course, for brute-force attacks, what is a well-known fact and easily circumvented by using sufficiently large hash sizes). So even with the “last” hash being known by the authority, there is no way to know or even guess what the actual log records look like.
The bottom line is that all processing of the actual log records is done on the local rsyslog machine. You’ll even notice that by a slight (up to 30% depending on settings) increase in CPU use of the local machine. This is the actual hashing taking place.
Do you know wonder why log signatures can still be verified and trusted even if under attack. The answer is relatively straightforward, but requires some “cryptographer thinking”: It is impossible to alter the “last” hash, and that last hash has been signed (and timestamped!). So if you try to re-hash, that last hash would change. If you now verify, hashes do not match. You may think of this systems as multiple distributed Merkle trees in action, where the lowest-level Merkle tree is generated by the local rsyslog, and its top-level hash being provided to the external upper part of the tree (where it actually goes into the bottom-level computation). But I think now I involved myself (and you) too much into details…
Some more details on how the signing happens can also be found in my LinuxTag 2013 presentation (see slide 24 and following) and the accompanying paper.