log normalization: how to share rulebases?

Rulebases play a crucial role in log normalization. While the log normalizer itself needs to be of high quality and speed, it is the rulebase that really helps to detect which message the one in question is. I myself have so far concentrated on the code and not created any larger rulebase. Champ Clarck III has created many more for his use inside Sagan. But this means everything is in its infancy. What we really need is community involvement to create a large number of easy to access rulebases for almost all devices.

This brings up the question of how to manage and share such a repository. One method may be to place it on a web site, together with some submission tool. An alternate approach would be to put everything into a public git. This latter approach has some beauty, because git is universally available and well know. Even if a user does not know git, only a minimal set of commands is required to pull the rulebase. So maybe this is the way to go?

I would be very interested in suggestions on how we shall manage rulebases and spread the word. What do we need to support a great community? Whom can we talk to? If you have any ideas, concerns, questions or even an idle rant, be sure to let me know. At best, send mail to the lognorm mailing list, so we can broadcast this to other folks interested.

calling for log samples!

Now I join those mass of people who are asking for log samples. But I do for a good reason :) Also, I do not need a lot, a single log message works well for my needs. I need them to improve rsyslog so that the parser can even better handle exotic message formats. So the short story is if you have a syslog message, please provide it to me.

And here is the long story:

One of the strength of rsyslog is that it is very much focused on standards. That also means it tries to parse syslog messages according to the relevant RFCs. Unfortunately, syslog has been standardized only recently and so there is no real standard for what to expect inside the header. So rsyslog strength is also its weakness: if messages are ill-formed, results are often suboptimal.

I am working around this by doing smart guesswork inside the legacy syslog parser. However, every now and then some folks pop up with problems. And, more importantly, some others do not even ask. On my twitter account, I recently saw one such frustration. In that case, timestamps were duplicated. I guess that was caused by something unexpected inside the timestamp. However, I was not able to get down to the real problem, because I did not have access to the raw message. That’s an important point: I need the raw message content, not what happens to usually be in the logfile. The later is already parsed, processed and recombined, so it does not tell me what the actual message is. But I need the actual message to improve the parser.

What I would like to do is create a very broad test suite with a vast amount of real-life syslog formats. The message text itself is actually not so important to me at this stage. It is the header format. If I get this, I’d like to analyze the different ways in which the format is malformed and then try to find ways to implement it inside the parser. If I find out that I can not detect the right format in all cases automatically, I may find ways to configure the different formats. The end result, I hope, will be far more plug-and-play message detection, something that should be of great benefit for all users.

Please contribute your logs! I need logs from many different devices, with many different versions. But I need only a few lines from each one. For each individual contributor, there is not a lot of effort required. Even a single log line would be great (ten or so be even greater). Just please don’t mangle the logs and provide me with raw log messages. That’s probably the hardest part. One way to do it is to sniff them off the wire, for example with WireShark. Another way is to use rsyslog itself. All you need is a special template and an output file using it:

$template rawmsg,”%rawmsg%n”
*.* /path/to/raw-file.log

Add this to your rsyslog.conf, restart rsyslog, make the device emit a few lines and mail me the result to rgerhards@gmail.com. You may also simply post the log sample to the sample log thread on the rsyslog forum – whatever you prefer. After you have done that, you can remove the lines from rsyslog.conf again. Before you mail me, it is a good idea to check if there is any sensitive information inside the log file. Feel free to delete any lines you have, but I would appreciate if you do not modify line contents. Also, it would be useful for me if you let me know which device, vendor and version produced the log.

I hope that you can help me improve the rsyslog parser even more. Besides, it will probably be a very interesting experiment to see how different syslog messages really are.

Thanks in advance for all contributions. Please let them flow!

Rainer

logging and the C NUL problem

Again, I ran into the “C NUL Problem”, that is the way C strings are terminated. Unfortunately, the creators of C have represented strings as variable arrays of char without an explicitely-stated size. Instead of a size property, a C string is terminated by an US-ASCII NUL character (”). This works well enough in most cases, but has one serious drawback: the NUL character is a reserved character that cannot be part of any C string. So, for example strlen(“AB”) equals one and not three as one would usually expect.

CERT has a good presentation of some of the more important problems associated with the standard C string handling functions. I do not intend to reproduce this here or elaborate on any further details except that we get into big trouble if NUL characters are used e.g. in logging data sets. We had this problem in the IETF syslog WG, where we permited NUL to be part of the syslog message text, but permitted a receiver to escape it. This is far from being an ideal solution, but we considered it good enough, especially as it permits to keep compatible with existing toolset libraries.

Now, in CEE, we face the same challenge: the problem is if the in-memory representation of event fields should permit NUL characters. The correct technical answer to this question is “yes, of course”, but unfortunately it has a serious drawback that can affect adoption: if NULs are permited, none of the string handling functions of the C runtime library can be used. This is, as said above, because the C runtime library is not able to handle NULs inside “standard” C strings. A potential solution would be to escape NULs when they enter the system. However, that would require an additional reserved character, to do the escaping. So in any case, we’ll end up with a string that is different from what the “usual” runtime library routines expect.

Of course, this problem is not new, and many solutions already have been proved. The obvious solution is to replace the standard C string handling functions with octet-counting functions that do not require any reserved characters.

A short, non- weighted list of string replacement string libraries is:

Note that some of them try to mimic standard C strings as part of their API. I consider this highly dangerous, because it provides a false sense of security. While the library now can handle strings with included NUL characters (like “AB”), all parts of the string after the first NUL will be discarded if passed to a “regular” C runtime library string function (like printf!). So IMO this is a mis-feature. A replacement library must explicitely try to avoid compatibility to the C runtime library in order to safe the user from subtle issues, many of them resulting in security problems (think: information hiding).

Unfortunately, I could not identify any globally-accepted string replacement library that is in widespread use.. Despite its deficits, C programmers’  tend to use the plain old string functions present in the standard C runtime library.

So we are back to the original issue:

If CEE supports NUL characters inside strings, the C standard string library can not be used, and there are also problems with a potentially large number of other toolsets. This can lead to low acceptance rate.

But if CEE forbids NUL characters, data must be carefully asserted when it enters the system. Most importantly, a string value like “AB” must NOT be accepted when it is put in via an API. Experience tells that implementors sometimes simply overlook such restrictions. So this mode opens up a number of very subtle bug (security) issues.

I am very undicided which route is best. Obviously, a sound technical solution is what we want. However, the best technical solution is irrelevant if nobody actually uses it. In that light, the second best solution might be better. Comments, anyone?

syslog normalization

I am working on syslog normalization for quite some years now. A couple of days ago, David Lang talked to me about syslog-ng’s patterndb, an approach to classify log messages and extract properties from it.

I have looked at this approach, and it indeed is promising. One ingredient, though, is missing, that is a directory of standard properties (like bytes sent and received in traffic logs). I know this missing ingredient very well, because we also forgot it until recently.

The aim to normalize log data is far from being new. Actually, I think it is one of the main concerns in log analysis. Probably one of the first folks who thought seriously about it was Marcus Ranum, who coined the concept of “artificial ignorance”, meaning that we can remove those messages from a big pile of logs that we know to be uninteresting. But in order to do that correctly, you need to know how exactly they look. And this is where log normalization comes in. I have written an in-depth paper in 2004, title “On the nature of syslog data“. The version officially published claims “work in progress”, but it still has all the juicy details.

Internally, we implemented this approach in our MonitorWare products a little bit later. For example, it is used inside the “Post Process Action” in WinSyslog (Michael also wrote a nice article on how to parse log messages with this action). While this was a great addition (and is used with great success), I failed to get enough community momentum to build a larger database of log messages that could be used as a basis for large scale log normalization. One such – largely failed for syslog – approach is the event knowledge base.

However, I did not give up on the general idea and proposed it wherever appropriate. The last outcome of this approach is the soon-to-be-released Adiscon LogAnalyzer v3, which uses so-called message parsers to obtain useful information from log entries. Here, I hope we will be able to gain more community involvement. We already got two message parsers contributed. Granted, that’s not much, but the ability to have them is so far little known. With the release of v3, I hope we get more and more momentum.

The syslog-ng patterndb approach brings an interesting idea to this space: as far as I have heard (I generally do NOT look at competing code to prevent polluting my code with things that I should not use), they use radix trees to parse the log messages. That is a clever approach, as it provides a solution for much quicker parsing large amounts of parse templates. This makes the approach suitable for real-time normalization of an incoming stream of syslog data.

Adiscon LogAnalyzer, by contrast, uses a regex-based approach, but that primarily for simplicity in an effort to invite more contributions (WinSyslog has a far more sophisticated approach). In Adiscon LogAnalyzer we began to become serious with identifying what a property actually means. While we have a fixed set of properties, with fixed semantics, in both WinSyslog, MonitorWare Agent and rsyslog, this set is rather limited. The Windows product line supports ease of extension of the properties, but does not provide standard IDs for those properties.

In Adiscon LogAnalyzer, we have fixed IDs for a larger set of properties, now about 50 or so. Still, that set is very small. But we created it with the intention to be able to map various “semantic objects” from different log entries to a single identity. For example, most firewall logs will contain a source and destination IP address, but almost all firewalls will use different log message formats to do that. So we need to have different analyzers to support these native formats, for example in reports. In Adiscon LogAnalyzer, we can now have a message parser “normalize” these syslog entries and map the vendor-specific format to the generic “semantic object”. Thus the upper layers (like views and reports) then work on these normalized semantic objects and do not need to be adopted to each firewall. This needs only be done at the parser level.

Such a directory of semantics objects would be very useful in my humble opinion. We are currently working on making it publicly available, all this in the hope for a community to involve itself ;) If we manage to get a large enough number of log and/or parser contributions, we may potentially be able to make Adiscon LogAnalyzer an even better free tool for system administrators.

And as there is hope that this will finally succeed, I have begun to think about a potential implementation inside rsyslog. It doesn’t sound very hard, but still requires careful thinking. One thing I would like to see is a unified approach that covers at least rsyslog and Adiscon Loganalyzer, and hopefully the Windows tools as well.

Another very good thing is that there already is a standard for providing standard semantical objects: during the IETF syslog standardization effort, I pressed hard for so-called structured data elements. I managed to get them into the final RFC. These structured data elements are now the key for conveying the log information once it is normalized: the corresponding name-value pairs can easily be encoded with it.

I hope we will finally able to succeed on this road, because I think this would be of tremendous benefit for the syslog community.

is a third-level domain suspect to google?

If you follow this blog, you’ve probably already heard that we are doing a name change for phpLogCon: it will soon be known under the name Adiscon LogAnalyzer (with the Adiscon in front of the “real” name to ease potential legal issues).

Among others, that means we need to change the web site. Not surprisingly, no second-level domain with loganalyzer in it was available at the time we searched. Most of them, of course, been taken by domain spammers. So we settled for the loganalyzer.adiscon.com name. As I found out yesterday in Google Webmaster Tools, that this may cause some troubles. Google provides a “Change of Adress” tool that is meant to be used in situation.

However, I discovered that this tool does not work with third-level domains. All I see when I try to use that tool is the message “Setting is restricted to root level domains, only” (as a slight technical side-note, it should say “second level domain” as I don’t think it works for com, net, org, … only ;)).

While browsing the google help forum, I found that others seems to have similar problems. For example, people in the UK, where everything is a third-level domain (for example, .co.uk is what .com is for the international Internet).

Given this stance, I wonder if google punishes third-level domain sites in any other way. If so, our decision to move to the new site may not be a good one. I have posted a question in the Google help forums. I guess I will not get a definite response, but maybe one can read between the lines.

I will keep you posted, also on the overall progress of the name/site switch. We have now entered the “hot phase”, meaning that we actually intend to roll over to the new site within the next couple of days. Stay tuned for more news and more features.

converting NetApp Filer to syslog – and MS API Changes…

One of the things that is done with EventReporter and MonitorWare Agent is forwarding NetApp Filer event logs via syslog. There are essentially two ways how this can be done: either via backup event log files (*.evt), which NetApp writes in a Windows compatible format or (a newer approach) via an Event Log API Emulation inside the NetApp box. If you’d like to know details, you can find them in our NetApp EventLog to Syslog forwarding paper.

Unfortunately, changes in recent Windows versions cause some trouble with the way the forwarding works. Unfortunately, Microsoft seems to have changed the on-disk format of backup event log files. That’s OK and something that usually can happen. What I find strange is that Microsoft does no longer supply code inside Windows (and its APIs!) to read downlevel event logs. So, for example, on Windows 2008 it is no longer possible to read a backup event log from an older release. This includes the NetApp .evt files, as they are written in the older Windows format.

I do not understand the Microsoft decision. It would not have been hard to preserve backward compatibility – a header flag inside the file plus very few code inside the o
operating system would have been sufficient. But without that, we see trouble that the NetApp .evt files can not be accessed by Windows Event Viewer and, consequently, not yet by our eventlog to syslog tools. Thankfully, though, we support all modes NetApp provides, and so the work-around is to use the NetApp Event Log API emulation, which will also get the necessary information out to syslog. But, again, I do not understand how Microsoft can break backward compatibility in thus an unnecessary way.

Anyhow, things are as they are ;) So far, we are also looking at ways to be able to process the NetApp backup event log files even under these new constraints. And as you know, we are already full of ideas. Of course, I also recommended to opening a support ticket with Microsoft – I am too eager to learn the official response to this situation (and -maybe- a solution)? I’ve been told we’ll open the ticket today, so let’s see what comes out of all that…

syslog data modeling capabilities

As part of the IETF discussions on a common logging format for sip, I explained some sylsog concepts to the sip-clf working group.

Traditionally, syslog messages contain free-form text, only – aimed at human observers. Of course, today most of the logging information is automatically being processed and the free-form text creates ample problems in that regard.

The recent syslog RFC series has gone great length to improve the situation. Most importantly, it introduced a concept called “Structured Data”, which permits to express information in a well-structured way. Actually, it provides a dual layer approach, with a corase designator at the upper layer and name/value pairs at the lower layer.

However, the syslog RFC do NOT provide any data/information modeling capabilities that come with these structured data elements. Their syntax and semantics is to be defined in separate RFCs. So far, only a few examples exist. One of them is the base RFC5424, which describes some common properties that can be contained in any syslog message. Other than that, RFC5674, which describes a mapping to the Alarm MIB and ITU perceived severities and RFC5675, which describes a mapping to SNMP traps. All of them are rather small. The IHE community, to the best of my knowledge, is currently considering using syslog structured data as an information container, but has not yet reached any conclusion.

Clearly, it would be of advantage to have more advanced data modeling capabilities inside the syslog base RFCs, at least some basic syntax definitions. So why is that not present?

One needs to remember that the syslog standardization effort was a very hard one. There were many different views, “thanks” to the broad variety of legacy syslog, and it was extremely hard to reach consensus (thus it took some years to complete the work…). Next, one needs to remember that there is such an immense variety in message content and objects, that it is a much larger effort to try define some generic syntaxes and semantics (I don’t say it can not be done, but it is far from being easy). In order to get the basics done, the syslog WG deciced to not dig down into these dirty details but rather lay out the foundation so that we can build on it in the future.

I still think this is a good compromise. It would be good if we could complement this foundation with some already existing technology. SNMP MIB encoding is not the right way to go, because it follows a different paradigm (syslog is still meant to be primarily clear text). One interesting alternative which I saw, and now evaluate, is the ipfix data modeling approach. Ideally, we could reuse it inside structured data, saving us the work to define some syslog-specific model of doing so.

The most important task, however, is to think about, and specify, some common “information building blocks”. With these, I mean standard properties, like source and destination ID, mail message id, bytes sent and received and so on. These, together with some standard syntaxes, can greatly relieve problems we face while consolidating and analyzing logs. Obviously, this is an area that I will be looking into in the near future as well.

It may be worth noting that I wrote a paper about syslog parsing back in 2004. It was, and has remained, work in progress. However, Adiscon did implement the concept in MonitorWare Console, which unfortunately never got wider exposure. Thinking about it, that work would benefit greatly from the availability of standardized syslog data models.

new phplogcon site

Today, I received a first more or less complete link to what will become the new phplogcon site. The site is not yet live, but will provide some of the new features.

If you look at it, you’ll probably notice a couple of things. First of all, the name “phpLogCon” is no longer spelled out. The reason is that we considered a bit bulky and meaningless. “Loganalyzer” is exactly what the tools is about. But, of course, there are a myriad of (trademark) problems related to that name. So we try to avoid all confusion by calling it “Adiscon loganalyzer”, hoping that the company name as dominant part of the product name will rule out all problems. For that very reason, you’ll also see me to refer to Adiscon Loganalyzer in the future. If you wonder why I stress that “Adiscon” part, you now know why.

Secondly, you will notice the fresh design. While I am not a visual guy, I have to say that I like it very much. I think it removes much of the clutter and makes it easier to find the information you need quickly. We also have changed the content management system in the background. The new sites uses WordPress, which seems to be highly approprioate for what the site needs. Of course, the wiki and forum will remain as they are – they have proven to be quite well as they are.

If you look more closely, you will also note that Adiscon LogAnalyzer gets an important new component: a reporting module. I managed to convince my peers at Adiscon to move some of our MonitorWare Console closed source technology into Adiscon LogAnalyzer. My long-term vision is that reporting capabilities will much enhance the utility of this tool. In order for Adiscon to get something back, we will begin to develop some enhanced reports, which will be non-free for commercial users. However, the base product as well as some base reports, will always remain free!

I hope you consider this to be good news, just as I think! Thanks to everyone who made this possible.

Some thoughts on reliability…

When talking syslog, we often talk about audit or other important data. A frequent question I get is if syslog (and rsyslog in specific) can provide a reliable transport.

When this happens, I need to first ask what level of reliability is needed? There are several flavors of reliability and usually loss of message is acceptable at some level.

For example, let’s assume the process writes out log messages to a text file. Under (allmost?) all modern operating systems and by default, this means the OS accepts the information to write, acks it, does NOT persist it to storage and lets the application continue. The actual data block is usually written a short while later. Obviously, this is not reliable: you can lose log data if an unrecoverable i/o error happens or something else goes fatally wrong.

This can be solved by instructing the operating system to actually persist the information to durable store before returning back from the API. You have to pay a big performance toll for that. This is also a frequent question for syslog data, and many operators do NOT sync and accept a small message loss risk to save themselves from requiring a factor of 10 servers of what they now need.

But even if writes are synchronous, how does the application react? For example: what shall the application do if log data cannot be written? If one really needs reliable logging, the only choice is to shutdown the application when it can no longer log. I know of very few systems that actually do that, even though “reliability” is highly demanded. Here, the cost of shutting down the application may be so high (or even fatal), that the limited risk of log data loss is accepted.

There are a myriad of things when thinking about reliability. So I think it is important to define the level of reliability that is required by the solution and do that in detail. To the best of my knowledge, this is also important for operators who are required by law to do “reliable” logging. If they have a risk matrix, they can define where it is “impossible” (for technical or financial reasons) to achieve full reliability and as of my understanding this is information auditors are looking for.

So for all cases, I strongly recommend to think about which level of reliability is needed. But to provide an answer for the rsyslog case: it can provide very high reliability and will most probably fulfil all needs you may have. But there is a toll in both performance and system uptime (as said above) to go to “full” reliability.

The typical logging problem as viewed from syslog

I run into different syslog use cases from time to time. So I thought it is a good idea to express what I think the typical logging problem is. As I consider it the typical problem, syslog (and WinSyslog and rsyslog in specific) address most needs very well. What they spare is the analysis and correlation part, but other members of the family (like our log analyzer) and third parties care well for that.

So the typical logging problem, as seen from the syslog perspective, is:

  1. there exists events that need to be logged
  2. a single “higher-level” event E may consist of a
    number of fine-grained lower level events e_i
  3. each of the e_i’s may be on different
    systems / proxies
  4. each e_i consists of a subset of properties
    p_j from a set of all possible common properties P
  5. in order to gain higher-level knowledge, the
    high-level event E must be reconstructed from
    e_i’s obtained from *various* sources
  6. a transport mechanism must exist to move event
    e_i records from one system to another, e.g., to
    a central correlator
  7. systems from many different suppliers may be involved,
    resulting in different syntax and semantic of
    the higher-level objects
  8. there is potentially a massive amount of events
  9. events potentially need to be stored for
    an extended period of time
  10. quick review of at least the current event data
    (today, past week) is often desired
  11. there exists lots of noise data
  12. the data needs to be fed into backend processes,
    like billing systems