libee status update

Yesterday I reached some important milestones:
  • defined an internal simple event representation format “int” to create test cases
     http://doc.libee.org/classee__int.html
  • wrote a decoder “int” -> CEE
  • wrote a syslog encoder CEE->syslog
    (with enough escapes to be testable, but without all details)
  • wrote a small program (int2syslog) that ties these pieces togetherand can be used to visualize test cases in syslog representation
 All is available from the libee git at http://git.adiscon.com/?p=libee.git;a=summary

introducing libestr

I tried to avoid this, but it looks like I need to start a new project. I need a simple counted-byte string “class” for the CEE effort. All of that is already present in rsyslog, but using it’s runtime library is overkill. I could have copied the string code, but I really don’t like to have multiple copies of the same code around. So I’ll now create a new small library just for that purpose. Well, while I am at it, I’ll probably also add basic hashtable support, as this can help keep libee small for cases when no XML-specific functionality is desired. And I don’t want to start yet another library for that (we already have lib inflation… ;)).

I named it libestr, bare of a better name. I thought a while about the name, but could not find anything really decent. “estr” means “essentials for string handling” and is probably descriptive enough. Quite honestly, I really like to gain some speed in coding again instead of just creating skeletons and thinking about names…

logging and the C NUL problem

Again, I ran into the “C NUL Problem”, that is the way C strings are terminated. Unfortunately, the creators of C have represented strings as variable arrays of char without an explicitely-stated size. Instead of a size property, a C string is terminated by an US-ASCII NUL character (”). This works well enough in most cases, but has one serious drawback: the NUL character is a reserved character that cannot be part of any C string. So, for example strlen(“AB”) equals one and not three as one would usually expect.

CERT has a good presentation of some of the more important problems associated with the standard C string handling functions. I do not intend to reproduce this here or elaborate on any further details except that we get into big trouble if NUL characters are used e.g. in logging data sets. We had this problem in the IETF syslog WG, where we permited NUL to be part of the syslog message text, but permitted a receiver to escape it. This is far from being an ideal solution, but we considered it good enough, especially as it permits to keep compatible with existing toolset libraries.

Now, in CEE, we face the same challenge: the problem is if the in-memory representation of event fields should permit NUL characters. The correct technical answer to this question is “yes, of course”, but unfortunately it has a serious drawback that can affect adoption: if NULs are permited, none of the string handling functions of the C runtime library can be used. This is, as said above, because the C runtime library is not able to handle NULs inside “standard” C strings. A potential solution would be to escape NULs when they enter the system. However, that would require an additional reserved character, to do the escaping. So in any case, we’ll end up with a string that is different from what the “usual” runtime library routines expect.

Of course, this problem is not new, and many solutions already have been proved. The obvious solution is to replace the standard C string handling functions with octet-counting functions that do not require any reserved characters.

A short, non- weighted list of string replacement string libraries is:

Note that some of them try to mimic standard C strings as part of their API. I consider this highly dangerous, because it provides a false sense of security. While the library now can handle strings with included NUL characters (like “AB”), all parts of the string after the first NUL will be discarded if passed to a “regular” C runtime library string function (like printf!). So IMO this is a mis-feature. A replacement library must explicitely try to avoid compatibility to the C runtime library in order to safe the user from subtle issues, many of them resulting in security problems (think: information hiding).

Unfortunately, I could not identify any globally-accepted string replacement library that is in widespread use.. Despite its deficits, C programmers’  tend to use the plain old string functions present in the standard C runtime library.

So we are back to the original issue:

If CEE supports NUL characters inside strings, the C standard string library can not be used, and there are also problems with a potentially large number of other toolsets. This can lead to low acceptance rate.

But if CEE forbids NUL characters, data must be carefully asserted when it enters the system. Most importantly, a string value like “AB” must NOT be accepted when it is put in via an API. Experience tells that implementors sometimes simply overlook such restrictions. So this mode opens up a number of very subtle bug (security) issues.

I am very undicided which route is best. Obviously, a sound technical solution is what we want. However, the best technical solution is irrelevant if nobody actually uses it. In that light, the second best solution might be better. Comments, anyone?

libee – first peek preview available

I have just published a first preview of the libee library API. This work is obviously far from being finished, but I think the preview provides at least some idea of how it will materialize.

To tell a bit about the status in general: I have completed a first round of evaluation of CEE objects and concepts, based on the draft standard. Note that the draft standard is far from being finished, and will probably undergo serious changes. As such, I have not literally taken object definitions from there. Instead, I have mostly pulled what I am currently interested in and also have added some additions here and there, as I need them to do what I intend to do.

My primary goal is to get some proof of concept working first, then re-evaluate everything and probably do a rewrite of some parts. For the same reason, performance is currently not on my radar (but of course I always think about it ;)).

I would also like to note that the libee git is also available — and a good way to follow libee development for those interested. Comments are always appreciated and especially useful in this early phase.

Sagan and rsyslog [Guest Posting]

Hi everyone,

I am happy to feature a guest post written by Champ Clark III, the author of Sagan, a real time, “snort like” event and system log sniffing tool. Champ will explain a bit about it and how he made it work with rsyslog.  I think it is a very interesting project and I am glad it now has full rsyslog support.

But enough of my words, enjoy the real thing ;)

Rainer


I admit it, I’m a recent convert to rsyslog. I’ve known about rsyslog for years, but have only recently started using rsyslog in production environments. The primary reason for looking into rsyslog is users of Sagan are requesting support for it. I’m very glad they pushed me in that direction. I knew how popular rsyslog was,
but the ‘hassles’ of changing our core logging facilities seemed like a pain.

I can tell you, it was easy and seamless. Also, after reading Rainer Gerhards’ excellent, “rsyslog: going up from 40K messages per second to 250K“, I knew that I liked this project.

So I bit the bullet, and started working with Sagan and rsyslog. I haven’t looked back since.

I work in the network & computer security field. I’ve known for years the importance of log management. One thing that I had noticed was a lack of open source log & packet levelcorrelation engines. This is essentially what Sagan does. One common comparison of Sagan is Cisco’s MARS. Sagan reads in your logs and attempts to correlate the information with the intrusion detection/prevention system’s packet level information.

At Softwink, Inc, my place of employment, we monitor security events for various clients. At the
packet-level inspection for ‘bad events’ (security related), we use Snort. Snort ‘watches’ the network connectionsand sends out an ‘alert’ when it sees nefarious traffic. We configure Snort to send the ‘alert’ to a MySQL database for further analysis. We can then monitoring these Snort sensors for ‘bad events/attacks’ in real time.

However, we found that we were missing the ‘bigger picture’ without logs. This is where rsyslog and Sagan come into play. Essentially, we take all machines and equipment on a network and forward it to a
centralized server. Rsyslog is the ‘receiver’, and sometimes the sender of these log messages. In many cases, we find that centralized secure logging is a requirement for clients. With rsyslog, we
have the ability to store log information into a MySQL database for archive purposes. We can then give the client access to the log information via Loganalyzer for easy, simple retrieval.

How does Sagan fit into this picture? For security analysis, we only want key, typically security related, events from the logs. Manually searching databases for ‘security related’ events is prone to error. It is easy to ‘miss’ key events. Sagan is the ‘eyes on the logs’ watching for security related events in real time. First, Sagan has to have access to the logs coming into the network. This is very simple with Rsyslog:

# rsyslog.conf file.
#
# As rsyslog receives logs from remote systems, we put them into a format
# that Sagan can understand:
#

$template
sagan,"%fromhost-ip%|%syslogfacility-text%|%syslogpriority-text%|%syslogseverity-text%|%syslogtag%|%timegenerated:1:10:date-rfc3339%|%timegenerated:12:19:date-rfc3339%|%programname%|%msg%n"

# We now take the logs, in the above format, and send them to a 'named pipe'
# or FIFO.

*.* |/var/run/sagan.fifo;sagan

Sagan can now ‘read’ the logs as they come into rsyslog from the /var/run/sagan.fifo (named pipe/FIFO) in real time. rsyslog actually performs double duty for us; logging to our MySQLdatabase for archival purposes and handing Sagan log information for analysis.

Over all, there is nothing really new about this concept. However, Sagan does something a bit different than other log analysis engines. When Sagan sees a ‘bad event’, Sagan will log that to your Snort IDS/IPS MySQL/PostgreSQL database. What does this mean? Packet level security events and log events reside in
the same database for correlation. There are several advantages; for one, we can now have a single, unified console for log and IDS/IPS events! Second, we can now take advantage of Snort front-end
software to view log events. For example, if you use BASE or Snorby to view packet level IDS/IPS events, you can use the same software to view log level Sagan events. Maybe your shop uses report generation
tools that query the Snort database to show ‘bad packet events’ in your network. Guess what. You can use those same reporting tools for your log information as well. I’ve posted some example screen shots of Snort & Sagan working together here. The idea is that we take advantage of the Snort community’s work on consoles.

Correlation with Sagan and Snort, at the engine level, works several different ways. First, Sagan can in some cases pull network information directly from the log message and use that for correlation in the SQL database. For example, let’s say an attacker is probing your network and is attempting to get information on the SMTP port. The attacker sends your SMTP server ‘expn root’. Your IDS/IPS engine will ‘detect’ this traffic and
store it. It’ll record the source IP, destination IP, packet dump, time stamp, etc. Sagan will do the same at the log level. Sagan will ‘extract’ as much of the information from the log message for further correlation with the packet level.

Recently, Rainer announced liblognorm (early liblognorm website). This is an exciting project. The idea is to “normalize” log information to a nice, standard usable format. I plan on putting as much support and effort as I can into this project, because it’s an important step. For Sagan, it means we will be able to better
correlate information. In my time to ponder about it since its recent announcement, I can see liblognorm being extremely useful for many different projects.

Sagan also shares another feature with Snort; it uses the same rule sets. Sagan rules sets are very much ‘Snort like’. Here is an example rule (this is a single line, broken just for readability):

alert tcp $EXTERNAL_NET any -> $HOME_NET 22 (msg:"[OPENSSH] Invalid or illegal user";
pcre: "/invalid user|illegal user/i"; classtype: attempted-user;
program: sshd; parse_ip_simple; parse_port_simple; threshold:type limit,
track by_src, count 5, seconds 300; reference:
url,wiki.softwink.com/bin/view/Main/5000022; sid:5000022; rev:4;)

If you’re already a Snort user, then the format and syntax should be very simple to understand. We use ‘pcre’ (regular expressions) to ‘look’ for a message from the program ‘sshd’ that contains the term ‘invalid user’ or ‘illegal user’ (case insensitive). We set the classifications, just as Snort does (for further correlation). We can ‘threshold’ the rule, so we don’t get flooded with events.

Sagan uses this format for a variety of reasons. For one, its a well know format in the security field. Second, we can now take advantage of Snort rule maintenance software! For example ‘oinkmaster’ or ‘pulled pork’. The idea is that with Sagan, you don’t need to ‘re-tool’ your network in order for it to work.

Using Sagan with your Snort based IDS/IPS system is just a feature of Sagan. Sagan can operate independently from Snort databases, and offers the normal bells/whistlers you’d expect in a SEIM (e-mailing alerts, etc).

To tie all this together, it means we can simply monitor packet level threats and log level events from a unified console. We can monitor just about everything in a network from the log level standpoint. We can monitor Cisco gear, Fortigate firewalls, Linux/*nix servers, wireless access points, etc.

Sagan is a relatively new project and still under development. Like rsyslog, Sagan is built from the ground up with performance in mind. Sagan is multi-threaded and written in C with the thought that it should be as efficient with memory and the CPU(s) as possible. Rsyslog seems to follow the same philosophy, yet another reason I made the switch.

The more information you know about a threat to your network/system, the better off you’ll be. That is what the mix of rsyslog and Sagan offers. Throw in IDS/IPS (Snort) monitoring, and you can get a complete view about ‘bad things’ happening in your environment.

For more information about Sagan, please see http://sagan.softwink.com.

CEE library will be named libee

After some discussions, we have finally decided to name the CEE part of the original liblognorm project “libee“. Note the missing “c” ;) We originally thought that libcee would be a smart name for a library implementing an API on top of the upcoming CEE standard. However, there were some complexities associated with that because “libcee” sounds much like *the* official library. While I hope I can finish a decent implementation of the full CEE standard, I think it is a bit bold to try to write the standard lib when the standard is not even finished.

So the compromise is to drop the “c”. The “ee” in libee means “Event Expression”, much as CEE means “Common Event Expression”. If CEE evolves as we all hope and if I manage to actually create a decent implementation of it, we may reconsider calling the library libcee. But that’s not important now and definitely something that will involve a lot of people.

In the mean time, I wanted to make sure you all understand what I mean when I talk about “libee”. I hope to populate its public git with at least a skeleton soon (liblognorm reaches a point where I need some CEE definitions, so this becomes more important).

liblognorm site comes online

While there is not yet much content, the liblognorm site has been put online today. Over time, this will become an important place to both learn about liblognorm AND share log samples. It will most probably also contain the area that you can use to download new log samples (much like you download virus patterns for a scanner). But for now, I just wanted to share the good news.

liblognorm will use passive Unicode mode (UTF-8)

I thought a while on how to support Unicode in liblognorm. The final decision is to use passive mode, which is a very popular option under Linux. A core driver behind this decision is the ability to safe lots of space (and thus also cache space and so processing time as well) as the majority of log content is written in US-ASCII. This is even the case in Asian countries, where large parts of the log message are usually ASCII but contain a few select fields in local language support (like names). Even if the message itself is in local language, there is a lot of punctuation and numbers in them, so I think the overall result will not use up notably more space than a UTF-16 implementation. I18N-wise, it must also be noted that UTF-16 is a very small (but important) subset of full unicode, so using UTF-8 gives us the ability to encode full 32-bit UCS-4 characters should there be need to do so.

The same decision will apply to the CEE library (whatever it will be named). This is also nicely in line with libxm2, which I intend to use for XML parsing.