How we found and fixed a CVE in librelp

This is a joint blog post, from Adiscon and Semmle, about the finding and fixing of CVE-2018-1000140, a security vulnerability in librelp. This was a collaborative effort by:

  • Kevin Backhouse, Semmle, Security Researcher.
  • Rainer Gerhards, Adiscon, Founder and President.
  • Bas van Schaik, Semmle, Head of Product.

We have published this post on Rainer’s blog here and the LGTM blog.

Bas originally found the vulnerability (using lgtm.com) and Rainer fixed it. Kev developed the proof-of-concept exploit.

In this blog post, we explain the cause of the bug, which is related to a subtle gotcha in the behavior of snprintf, and how it was found by a default query on https://lgtm.com/. We also demonstrate a working exploit (in a docker container, so that you can safely download it and try it for yourself). As a bonus, we give a short tutorial on how to set up rsyslog with TLS for secure communication between the client and server.

Severity and mitigation

The vulnerability (CVE-2018-1000140) is in librelp versions 1.1.1 up to 1.2.14.

Librelp is a component used heavily inside rsyslog. It is also used in some other projects, but we use the term “rsyslog” as a proxy for any affected projects, as it is the prime user of librelp (to the best of our knowledge). However, if you use librelp in any project, we strongly recommend that you upgrade to version 1.2.15 or newer.

The vulnerability only affects rsyslog installations that use TLS for secure RELP communication between the client and the server. In its default configuration, rsyslog does not do any client-server communication at all, so the vulnerability only affects more advanced installations. Furthermore, to trigger the vulnerability, an attacker needs to create a malicious certificate that has been signed by a certificate authority that is trusted by the rsyslog

The ‘snprintf’ gotcha

The vulnerability is in the following block of code (lines 1195-1211 of `tcp.c`):


/* first search through the dNSName subject alt names */
iAltName = 0;
while(!bFoundPositiveMatch) { /* loop broken below */
  szAltNameLen = sizeof(szAltName);
  gnuRet = gnutls_x509_crt_get_subject_alt_name(cert, iAltName,
                                                szAltName, &szAltNameLen, NULL);
  if(gnuRet < 0) break; else if(gnuRet == GNUTLS_SAN_DNSNAME) { pThis->pEngine->dbgprint("librelp: subject alt dnsName: '%s'\n", szAltName);
    iAllNames += snprintf(allNames+iAllNames, sizeof(allNames)-iAllNames,
                          "DNSname: %s; ", szAltName);
    relpTcpChkOnePeerName(pThis, szAltName, &bFoundPositiveMatch);
    /* do NOT break, because there may be multiple dNSName's! */
  }
  ++iAltName;
}

It is caused by a subtle gotcha in the behavior of `snprintf`. You have to read this excerpt of the man page for `snprintf` quite carefully to spot the problem:

The functions `snprintf()` and `vsnprintf()` do not write more than *size* bytes (including the terminating null byte (‘\0’)). If the output was truncated due to this limit then the return value is the number of characters (excluding the terminating null byte) which would have been written to the final string if enough space had been available. Thus, a return value of *size* or more means that the output was truncated.

The Red Hat Security Blog has a great post about this, written by Florian Weimer in March 2014. The crucial point is that `snprintf` returns the length of the string that it *would have* written if the destination buffer was big enough, not the length of the string that it actually wrote. So the correct way to use `snprintf` is something like this:


int n = snprintf(buf, sizeof(buf), "%s", str);
if (n < 0 or n >= sizeof(buf)) {
printf("Buffer too small");
return -1;
}

Alternatively, you could use it like this to allocate a buffer that is exactly the right size:


int n = snprintf(0, 0, "%s", str);
n++; // Add space for null terminator.
if (n <= 0) {
printf("snprintf error\n");
return -1;
}
buf = malloc(n);
if (!buf) {
printf("out of memory\n");
return -1;
}
snprintf(buf, n, "%s", str);

This second use-case is probably the reason why `snprintf` is designed the way it is. However, given that the type of its size parameter is `size_t`, it still seems like a bizarre design choice that its return type is `int`, rather than `size_t`.

Interestingly, some code bases use their own implementations of `snprintf`, which don’t necessarily behave in the same way. For example, there’s one in curl and another in sqlite. Other code bases use wrapper functions to work around inconsistencies in the implementation of `snprintf` on different platforms, like this one in flac.

How the ‘snprintf’ gotcha can cause a buffer overflow

The cruel irony of the `snprintf` gotcha is that it can cause the exact buffer overflow that the programmer was diligently trying to avoid by using `snprintf`, rather than `sprintf`. This is exactly what happened on line 1205 of `tcp.c`:


iAllNames += snprintf(allNames+iAllNames, sizeof(allNames)-iAllNames,
"DNSname: %s; ", szAltName);

Suppose `szAltName` is a sufficiently long string to cause a buffer overflow. Then `snprintf` does not write beyond end of the buffer. However, due to the surprising behavior described above, it still returns the length of the string that it *would have* written if the buffer was big enough. This means that after the `+=` operation, `iAllNames > sizeof(allNames)`. On the next iteration of the loop, `sizeof(allNames)-iAllNames` overflows negatively and `snprintf` is called with an extremely large size parameter. This means that subsequent iterations of the loop write beyond the end of the buffer.

As a side note, a dynamic test utilizing Undefined Behavior Sanitizer (UBSAN) would have detected that problem. Unfortunately there is no such test case inside the rsyslog test suite. That’s the usual problem with dynamic testing; it always depends on the test cases. A strength of static analysis is that such situations can be detected even if they are overlooked by developers and QA engineers. A more elaborate discussion of this topic can be found in Rainer’s blog post on the benefits of static analysis.

An interesting feature of this bug is that there is a gap in the buffer overflow. Suppose that there are 10 bytes left in the buffer and that the length of `szAltName` is 100 bytes. Then only the first 10 bytes of the string are written to the buffer. On the next iteration of the loop, the next string is written to a starting offset 90 bytes after the end of the buffer. An attacker can utilize this gap to only overwrite a very specific area of the stack. It also means that they can avoid overwriting the stack canary.

Finding and fixing the bug

The bug was found by Bas, after a conversation with Rainer on twitter. Rainer had just heard about lgtm.com and asked Bas if he could upload rsyslog. At the time, LGTM’s support for C/C++ projects was still in beta, so Bas needed to upload it manually. Bas uploaded it to a non-public LGTM instance first to make sure that the results looked reasonable and noticed this result. It was found by our Potentially overflowing call to snprintf query, which is designed to find this exact pattern. Bas consulted with Kev, who agreed that it looked like a genuine vulnerability. Kev then immediately contacted Rainer, so that he could fix the bug before it went public on lgtm.com.

Rainer fixed the bug one day after Kev reported it to him and released librelp version 1.2.15, which includes the fix, two days after that.

Using rsyslog with secure TLS communication

In the next section, we demonstrate a proof-of-concept exploit. But first, we need to explain how to set up rsyslog with TLS secured communication between RELP client and server, because the vulnerability is in the code that checks the certificate of the client. The instructions that we explain here are quite similar to this tutorial, except that we use openssl, rather than GnuTLS, to generate the certificates.

In its default configuration, rsyslog is just used for logging on a single machine. In a more advanced configuration, you might want to have multiple client machines sending log messages to a central server. If so, you can use TLS to securely send the log messages to the server. To enable TLS communication, every machine in the cluster needs a public/private key pair. Additionally, all of these keys need to be signed by a Certificate Authority (CA), which has its own public/private key pair. Every machine in the cluster needs to know the public key of the CA, so that they can verify each other’s keys: when two members of the cluster connect to each other, they send each other certificates signed by the CA to prove to each other that they are legitimate members of the cluster. In this tutorial, we generate public/private key pairs for the CA, client, and server. Then we start a server rsyslog and a client rsyslog in two separate docker containers and connect them to each other.

An important word of warning about this tutorial:

To make the tutorial easy to run, we have set it up so that all the certificates are generated by the `docker build` step. This means that all the private keys are stored in the same docker image. *Don’t do this in a real installation!* You should generate all the certificates separately so that each machine only knows its own private key.

To make the tutorial as simple as possible to try out, we have uploaded the necessary scripts to GitHub. You can download the scripts and build the docker image as follows:


git clone https://github.com/Semmle/SecurityExploits.git
cd SecurityExploits/rsyslog/CVE-2018-1000140_snprintf_librelp
docker build . -t rsyslog-demo

Besides installing all the necessary software packages, the Dockerfile also runs the following setup steps:

  1. Download the source code for rsyslog, librelp and related components and check out the version with the vulnerability.
  2. Build and install rsyslog from source, by calling this build script.
  3. Call this script to generate certificates for the client, server, and CA.
  4. Call this script to generate a malicious certificate. We demonstrate this in the next section.

Generating certificates with openssl is normally an interactive process, in which it asks you questions about your name, email address, organization, and so on. To make the process fully automatic, we have instead used configuration files. For example, `ca.config` contains the configuration for the certificate authority and `server.config` contains the configuration for the server. Hopefully it is reasonably clear which lines of the configuration files you need to change if you are setting up your own rsyslog installation.

Using docker, we can simulate the scenario where there are two machines, a client and a server, both running rsyslog. First we need to create a virtual network, so that the client can connect to the server:


docker network create -d bridge --subnet 172.25.0.0/16 rsyslog-demo-network

Now, in two separate terminals, we start two docker containers: one for the server and one for the client. In terminal 1, we start a container for the server like this:


docker run --network=rsyslog-demo-network --ip=172.25.0.10 -h rsyslog-server -i -t rsyslog-demo

In terminal 2, we start a container for the client like this:


docker run --network=rsyslog-demo-network --ip=172.25.0.20 -h rsyslog-client -i -t rsyslog-demo

The docker image contains rsyslog configuration files for the client and server. These configuration files reference the TLS certificates which were generated during the building of the docker image. We can now start the client and server. In terminal 1, we start the rsyslog server:


sudo rsyslogd -f benevolent/rsyslog-server.conf

(Note: the docker image is configured so that the `sudo` password is “x”.)

In terminal 2, we start the rsyslog client:


sudo rsyslogd -f benevolent/rsyslog-client.conf

There is now a secure TCP connection between the client and server.

Proof-of-concept exploit

The purpose of the vulnerable block of code is to search through the subject alternative names in the certificate, looking for a name that it recognizes. For example, we have specified in the configuration file for the rsyslog server that it should only allow a client with the name `client.wholesomecomputing.com`. Subject alternative names are usually used to enable one certificate to cover multiple subdomains. For example, the certificate for `example.com` might also include `www.example.com` and `help.example.com` as subject alternative names. In the context of rsyslog, you might use the subject alternative names to generate a single certificate that covers all the machines in the cluster (although this would be less secure than generating a separate certificate for each machine).

To trigger the vulnerability, we need a certificate with a large number of very long subject alternative names. The size of the stack buffer, `allNames`, is 32768 bytes. Alternative names that exceed 1024 bytes are rejected by `gnutls_x509_crt_get_subject_alt_name`, so the config file for our malicious certificate contains 33 alternative names, most of which are just over 1000 bytes long. The 32nd name is slightly shorter to adjust the gap, so that the 33rd name overwrites the return address on the stack.

To run the exploit PoC, start the two docker containers as before. In the client container, start rsyslog with the malicious configuration:


sudo rsyslogd -f malicious/rsyslog-client.conf

The rsyslog server runs in the background, so you need to use `ps` to see that it has crashed. Alternatively, you can attach gdb to it before you start the malicious client, to see the effect of the exploit in slow motion. If you want to do this, then you need to add a few extra command line flags when you start the docker container for the server:

docker run --network=rsyslog-demo-network --ip=172.25.0.10 -h rsyslog-server --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -i -t rsyslog-demo

These extra arguments disable the default security settings in docker which prevent gdb from attaching to another process. Without them, you are likely to encounter an error message like this. This solution was suggested here.

Timeline

New Web Site Online

Really no big news. But after roughly 10 years I managed to revamp my personal web site. This time, it’s destined to be slim and stable. The majority of content is on other sites, e. g. my syslog blog or github.

I still find it is useful to have kind of a personal home in virtual space. So here it is, and it is severely renovated. Let’s see when it gets the next brush-up…

Time for a better Version Numbering Scheme!

The traditional major.minor.patchlevel versioning scheme is no longer of real use:

  • users want new features when they are ready, not when a new major version is crafted
  • there is a major-version-number-increase fear in open source development, thus major version bumps sometimes come very random (see Linux for example)
  • distros fire the major-version-number-increase fear because they are much more hesitant to accept new packages with increased major version
In conclusion, the major version number has become cosmetic for many projects. Take rsyslog as an example: when we switched to rsyslog scheduled releases, we also switched to just incrementing the minor version component, which now actually increments with each release – but nothing else. We still use the 8.x.0 scheme, where x is more or less the real version number. We keep that old scheme for cosmetic reasons, aka “people are used to it”.
Some projects, most notably systemd, which probably invented that scheme, are more boldly: the have switched to a single ever-increasing integer as version (so you talk of e.g. version 221 vs 235 of systemd). This needs some adaption from the user side, but seems to be accepted by now.
And different other approach, known from Ubuntu, is to use “wine-versioning”: just use the date. So we have 14.04, 16.04, 17.04 representing year and month of release. This also works well, but is unusual yet for single projects.
 
What made me think about this problem? When Jan started his log anonymizer project, we thought about how it should be versioned. We concluded that a single ever-incrementing version number is the right path to take. For this project, and most probably for all future ones. Maybe even rsyslog will at some time make the switch to a single version digit…

A Proposal for Rsyslog State Variables

As was discussed in great lenghth on the rsyslog mailing list in October 2013, global variables as implemented in 7.5.4 and 7.5.5 were no valid solution and have been removed from rsyslog. At least in this post I do not want to summarize all of this – so for the details please read the mailing list archive.

Bottom line: we need some new facility to handle global state and this will be done via state variables. This posting contains a proposal which is meant as basis for discussion. At the time of this writing, nothing has been finalized and the ultimate solution may be totally different. I just keep it as blog posting so that, if necessary and useful, I can update the spec towards the final solution.

Base Assumptions
There are a couple of things that we assume for this design:

  • writing state variables will be a  very infrequent operation during config execution
  • the total number of state variables inside a ruleset is very low
  • reading occurs more often, but still is not a high number (except for read-only ones)

Syntax
State variables look like the previous global variables (e.g. “$/var” or “$/p!var”), but have different semantics based on the special restrictions given below.

Restrictions
Restrictions provide the correctness predicate under which state vars are to be evaluated. Due to the way the rsyslog engine works (SIMD-like execution, multiple threads), we need to place such restrictions in order to gain good performance and to avoid a prohibitively amount of development work to be done. Note that we could remove some of these restriction, but that would require a lot of development effort and require considerable sponsorship.

  1. state variables are read-only after they have been accessed
    This restriction applies on a source statement level. For example,
    set $/v = $/v + 1;
    is considered as one access, because both the read and the write is done on the same source statement (but it is not atomic, see restriction #3). However,
    if $/v == 10 then
        set $/v = 0;
    is not considered as one access, because there are two statements involved (the if-condition checker and the assignment).
    This rule requires some understanding of the underlying RainerScript grammar and as such may be a bit hard to interpret. As a general rule of thumb, one can set that using a state variable on the left-hand side of a set statement is the only safe time when this is to be considered “one access”.
  2. state variables do not support subtree access
    This means you can only access individual variables, not full trees. For example, accessing $/p!var is possible, but accessing $/p will lead to a runtime error.
  3. state variables are racy between diffrent threads except if special update functions are use
    State variable operations are not done atomic by default. Do not mistake the one-time source level access with atomicity:
    set $/v = $/v + 1;
    is one access, but it is NOT done atomically. For example, if two threads are running and $/v has the value 1 before this statement, the outcome after executing both threads can be either 2 or 3. If this is unacceptable, special functions which guarantee atomic updates are required (those are spec’ed under “new functions” below).
  4. Read-only state variables can only be set once during rsyslog lifetimeThey exist as an optimization to support the common  usecase of setting some config values via state vars (like server or email addresses).

Basic Implementation Idea
State variable are implemented by both a global state as well as message-local shadow variables. The global state holds all state variables, whereas the shadow variables contain only those variables that were already accessed for the message in question. Shadow variables, if they exist, always take precedency in variable access and are automatically created on first access.

As a fine detail, note that we do not implement state vars via the “usual” JSON method, as this would both require more complex code and may cause performance problems. This is the reason for the “no-subtree” restriction.

Data Structure
State vars will be held in a hash table. They need some decoration that is not required for the other vars. Roughly, a state var will be described by

  • name
  • value
  • attributes
    • read-only
    • modifyable
    • updated (temporary work flag, may be solved differently)

Implementation
This is pseudo-code for the rule engine. The engine executes statement by statement.

for each statement:
   for each state var read:
      if is read-only:
         return var from global pool
     else:
         if is in shadow:
             return var from shadow pool
        else:
             read from global pool
             add to shadow pool, flag as modifiable
             return value
    for each state var written:
        if is read-only or already modified:
           emit error message, done
        if is not in shadow:
            read from global pool
            add to shadow pool, flag as modifiable
        if not modifieable:
           emit error message, done
        modify shadow value, flag as modified

at end of each statement:
   scan shadow var pool for updated ones:
       propagate update to global var space
       reset modified flag, set to non-modifiable
  for all shadow vars:
       reset modifiable flag

Note: the “scanning” can probably done much easier if we assume that the var can only be updated once per statement. But this may not be the case due to new (atomic) functions, so I just kept the generic idea, which may be refined in the actual implementation (probably a single pointer access may be sufficient to do the “scan”).

The lifetime of the shadow variable pool equals the message lifetime. It is destrcuted only when the message is destructed.

New Statements
In order to provide some optimizations, new statements are useful. They are not strictly required if the functionality is not to be implemented (initially).

  • setonce $/v = value;
    sets read-only variable; fails if var already exists.
  • eval expr;
    evaluates an expression without the need to assign the result value. This is useful to make atomic functions (see “New Functions”) more performant (as they don’t necessarily need to store their result).

New Functions
Again, these offer desirable new functionality, but can be left out without breaking the rest of the system. But they are necessary to avoid cross-thread races.

  • atomic_add(var, value)
    atomically adds “value” to “var”. Works with state variables only (as it makes no sense for any others).

This concludes the my current state of thinking. Comments, both here or on the rsyslog mailing list, are appreciated. This posting will be updated as need arises and no history kept (except where vitally important).

Special thanks to Pavel Levshin and David Lang for their contributions to this work!

[OT] friend needs help in music contest ;)

Hi folks, my blog’s title says it is about the many thing’s Rainer is interested in. If you look at the postings, that seems to mean “syslog”. Today is one of the rare occurences that I write about something else, and it is also a request for a little bit of help…

Friends of mine produce fine (non-mainstream) music. They have entered a local contest over here with a re-interpretation of a classical theme. I really enjoy the modern “feeling” of the theme. I would like to invite all of you to listen yourself – and vote for them if you like what you hear. “Pepper Dance“, as it is called, is available form this site. Unfortunately it is in German only, but navigation shouldn’t be too hard: the play button is pretty universal ;-). And if you’d like to vote for it, just follow the big “Jetzt Abstimmen” link below the photo.

Potential Blog Unreliability

Hi all, as you probably know, my blog’s design hasn’t changed in ages (yup, I’m a conservative guy). However, it finally is time to update things, so I’ll look at some new design (and maybe software) options. That means that the blog may be a bit under construction during the  next couple of days. Please pardon any problems associated with that — they will be temporary.

refactored imfile module now publically available

Hi all,

the refactoring and enhancement of imfile done by Nikolaidis Fotis has reached a first public-visible result. I have just created a new experimental git branch with his work:

http://git.adiscon.com/?p=rsyslog.git;a=shortlog;h=refs/heads/v5-stable-imfile-refactor

The code is probably not 100% finished, but it provides a preview plus the ability to suggest further improvements.

Please note that the code base has evolved since the refactoring begun. So in order to integrate into the main code line, we must see that we can adopt to the recent changes to imfile. Most importantly, you should try to merge this branch with v5-devel, which gives me a lot of issues when I try to do it. We will look at this once the new modules codebase has matured (and Nikolaidis gives me a “go” for that activity).
Feedback on the new module is appreciated (please direct it to the rsyslog mailing list).

rsyslog hit the one million message per second mark

Thanks to a user report, I just found out the the recent v5 versions of rsyslog have hit and exceeded the one million message per second (mps) mark! In my lab, with commodity hardware I reach around 500,000 mps. In practice this seems to be at least doubled :)

This strengthens even further that rsyslog is the syslogd of choice for demanding high-traffic environments :)

The full report is here:

http://kb.monitorware.com/000-000-character-messages-sec-with-rsyslog-t10740.html

Multi-Threading rsyslog’s TCP input

A form thread made me aware that there seems to be an issue with rsyslog performance if TLS is used. The past two weeks, I have worked on a paper which looks in-depth at rsyslog performance an I came across a paper [1] that promotes writing servers in that “traditional” multi-threaded way (with a single thread per connection). It addressed some of my concerns, and I thought it is worth actually trying out this approach (I outruled it for several years and never again looked at it). As a result, I created an experimental module imttcp, which works in this mode. I put this to test, especially as that would also lead to a much simpler programming paradigm. Unfortuantely, the performance results are devastive: while there is a very slight speedup with  a low connection number (close to the number of cores on the system), there is a dramatic negative speedup if running with many threads. Even at only 50 connections, rsyslog is dramatically slower (80 seconds for the same workload which was processed in 60 seconds with traditional imtcp or when running on a single connection). At 1,000 connections, the run was *extremely* slow. So this is definitely a dead-end. To be honest, Behren, condit and Brewer (the authors of [1]) claim that the problem lies in the current implementation of thread libraries. As one cure, they propose user-level threads. However, as far as I could find out, User-Level threads seem not to be much faster under Linux than Kernel-Level threads (which I used in my approach).

Even more convincing is, from the rsyslog PoV, that there are clear reasons why the highly threaded input must be slower:

  • batch sizes are smaller, leading to much more overhead
  • many more context switches are needed to switch between the various i/o handlers
  • more OS API calls are required because in this model we get more   frequent wakeups on new incoming data, so we have less data available to read at each instant
  • more lock contention because many more threads compete on the main queue mutex

All in all, this means that the approach is not the right one, at least not for rsyslog (it may work better if the input can be processed totally independent, but I have note evaluated this). So I will look into an enhanced event-based model with a small set of input workers pulling off data (I assume this is useful for e.g. TLS, as TLS transport is much more computebound than other inputs, and this computation becomes a limiting factor for the overall processing speed under some circumstances – see [2]).

As a side-note: for obvious reasons, I will not try to finish imttcp. However, I have decided to leave it included in the source tree, so that a) someone else can build on it, if he sees value in that b) I may use it for some other tests in the future.

[1] R. Von Behren, J. Condit, and E. Brewer. Why events are a bad idea
      (for high-concurrency servers). In Proceedings of the 9th conference on Hot
     Topics in Operating Systems-Volume 9, page 4. USENIX Association, 2003.

Sagan and rsyslog [Guest Posting]

Hi everyone,

I am happy to feature a guest post written by Champ Clark III, the author of Sagan, a real time, “snort like” event and system log sniffing tool. Champ will explain a bit about it and how he made it work with rsyslog.  I think it is a very interesting project and I am glad it now has full rsyslog support.

But enough of my words, enjoy the real thing ;)

Rainer


I admit it, I’m a recent convert to rsyslog. I’ve known about rsyslog for years, but have only recently started using rsyslog in production environments. The primary reason for looking into rsyslog is users of Sagan are requesting support for it. I’m very glad they pushed me in that direction. I knew how popular rsyslog was,
but the ‘hassles’ of changing our core logging facilities seemed like a pain.

I can tell you, it was easy and seamless. Also, after reading Rainer Gerhards’ excellent, “rsyslog: going up from 40K messages per second to 250K“, I knew that I liked this project.

So I bit the bullet, and started working with Sagan and rsyslog. I haven’t looked back since.

I work in the network & computer security field. I’ve known for years the importance of log management. One thing that I had noticed was a lack of open source log & packet levelcorrelation engines. This is essentially what Sagan does. One common comparison of Sagan is Cisco’s MARS. Sagan reads in your logs and attempts to correlate the information with the intrusion detection/prevention system’s packet level information.

At Softwink, Inc, my place of employment, we monitor security events for various clients. At the
packet-level inspection for ‘bad events’ (security related), we use Snort. Snort ‘watches’ the network connectionsand sends out an ‘alert’ when it sees nefarious traffic. We configure Snort to send the ‘alert’ to a MySQL database for further analysis. We can then monitoring these Snort sensors for ‘bad events/attacks’ in real time.

However, we found that we were missing the ‘bigger picture’ without logs. This is where rsyslog and Sagan come into play. Essentially, we take all machines and equipment on a network and forward it to a
centralized server. Rsyslog is the ‘receiver’, and sometimes the sender of these log messages. In many cases, we find that centralized secure logging is a requirement for clients. With rsyslog, we
have the ability to store log information into a MySQL database for archive purposes. We can then give the client access to the log information via Loganalyzer for easy, simple retrieval.

How does Sagan fit into this picture? For security analysis, we only want key, typically security related, events from the logs. Manually searching databases for ‘security related’ events is prone to error. It is easy to ‘miss’ key events. Sagan is the ‘eyes on the logs’ watching for security related events in real time. First, Sagan has to have access to the logs coming into the network. This is very simple with Rsyslog:

# rsyslog.conf file.
#
# As rsyslog receives logs from remote systems, we put them into a format
# that Sagan can understand:
#

$template
sagan,"%fromhost-ip%|%syslogfacility-text%|%syslogpriority-text%|%syslogseverity-text%|%syslogtag%|%timegenerated:1:10:date-rfc3339%|%timegenerated:12:19:date-rfc3339%|%programname%|%msg%n"

# We now take the logs, in the above format, and send them to a 'named pipe'
# or FIFO.

*.* |/var/run/sagan.fifo;sagan

Sagan can now ‘read’ the logs as they come into rsyslog from the /var/run/sagan.fifo (named pipe/FIFO) in real time. rsyslog actually performs double duty for us; logging to our MySQLdatabase for archival purposes and handing Sagan log information for analysis.

Over all, there is nothing really new about this concept. However, Sagan does something a bit different than other log analysis engines. When Sagan sees a ‘bad event’, Sagan will log that to your Snort IDS/IPS MySQL/PostgreSQL database. What does this mean? Packet level security events and log events reside in
the same database for correlation. There are several advantages; for one, we can now have a single, unified console for log and IDS/IPS events! Second, we can now take advantage of Snort front-end
software to view log events. For example, if you use BASE or Snorby to view packet level IDS/IPS events, you can use the same software to view log level Sagan events. Maybe your shop uses report generation
tools that query the Snort database to show ‘bad packet events’ in your network. Guess what. You can use those same reporting tools for your log information as well. I’ve posted some example screen shots of Snort & Sagan working together here. The idea is that we take advantage of the Snort community’s work on consoles.

Correlation with Sagan and Snort, at the engine level, works several different ways. First, Sagan can in some cases pull network information directly from the log message and use that for correlation in the SQL database. For example, let’s say an attacker is probing your network and is attempting to get information on the SMTP port. The attacker sends your SMTP server ‘expn root’. Your IDS/IPS engine will ‘detect’ this traffic and
store it. It’ll record the source IP, destination IP, packet dump, time stamp, etc. Sagan will do the same at the log level. Sagan will ‘extract’ as much of the information from the log message for further correlation with the packet level.

Recently, Rainer announced liblognorm (early liblognorm website). This is an exciting project. The idea is to “normalize” log information to a nice, standard usable format. I plan on putting as much support and effort as I can into this project, because it’s an important step. For Sagan, it means we will be able to better
correlate information. In my time to ponder about it since its recent announcement, I can see liblognorm being extremely useful for many different projects.

Sagan also shares another feature with Snort; it uses the same rule sets. Sagan rules sets are very much ‘Snort like’. Here is an example rule (this is a single line, broken just for readability):

alert tcp $EXTERNAL_NET any -> $HOME_NET 22 (msg:"[OPENSSH] Invalid or illegal user";
pcre: "/invalid user|illegal user/i"; classtype: attempted-user;
program: sshd; parse_ip_simple; parse_port_simple; threshold:type limit,
track by_src, count 5, seconds 300; reference:
url,wiki.softwink.com/bin/view/Main/5000022; sid:5000022; rev:4;)

If you’re already a Snort user, then the format and syntax should be very simple to understand. We use ‘pcre’ (regular expressions) to ‘look’ for a message from the program ‘sshd’ that contains the term ‘invalid user’ or ‘illegal user’ (case insensitive). We set the classifications, just as Snort does (for further correlation). We can ‘threshold’ the rule, so we don’t get flooded with events.

Sagan uses this format for a variety of reasons. For one, its a well know format in the security field. Second, we can now take advantage of Snort rule maintenance software! For example ‘oinkmaster’ or ‘pulled pork’. The idea is that with Sagan, you don’t need to ‘re-tool’ your network in order for it to work.

Using Sagan with your Snort based IDS/IPS system is just a feature of Sagan. Sagan can operate independently from Snort databases, and offers the normal bells/whistlers you’d expect in a SEIM (e-mailing alerts, etc).

To tie all this together, it means we can simply monitor packet level threats and log level events from a unified console. We can monitor just about everything in a network from the log level standpoint. We can monitor Cisco gear, Fortigate firewalls, Linux/*nix servers, wireless access points, etc.

Sagan is a relatively new project and still under development. Like rsyslog, Sagan is built from the ground up with performance in mind. Sagan is multi-threaded and written in C with the thought that it should be as efficient with memory and the CPU(s) as possible. Rsyslog seems to follow the same philosophy, yet another reason I made the switch.

The more information you know about a threat to your network/system, the better off you’ll be. That is what the mix of rsyslog and Sagan offers. Throw in IDS/IPS (Snort) monitoring, and you can get a complete view about ‘bad things’ happening in your environment.

For more information about Sagan, please see http://sagan.softwink.com.