libee – first peek preview available

I have just published a first preview of the libee library API. This work is obviously far from being finished, but I think the preview provides at least some idea of how it will materialize.

To tell a bit about the status in general: I have completed a first round of evaluation of CEE objects and concepts, based on the draft standard. Note that the draft standard is far from being finished, and will probably undergo serious changes. As such, I have not literally taken object definitions from there. Instead, I have mostly pulled what I am currently interested in and also have added some additions here and there, as I need them to do what I intend to do.

My primary goal is to get some proof of concept working first, then re-evaluate everything and probably do a rewrite of some parts. For the same reason, performance is currently not on my radar (but of course I always think about it ;)).

I would also like to note that the libee git is also available — and a good way to follow libee development for those interested. Comments are always appreciated and especially useful in this early phase.

splitting up the normalization library

I have dug into the design of my upcoming event/log normalization library. As it will base on CEE, I intend to pull in CEE definitions for types defined there, like tags or field types. Also, I thought about what the library should output. An obvious choice for many use cases is an in-memory object model describing the normalized form of the event that was passed in. This is probably most convenient for applications that want to do further processing on the event.

However, it also seems useful to have the ability to serialize this data in the form of a text string. That string could be stored in a file for later reference, forensics or to feed some other tool capable of understanding the file format. And as the in-memory object model will be CEE based, and CEE defines such serialization formats, it seems obvious that the library should be able to generate serialization based on the CEE-defined and supported formats (note that does not necessarily means XML, it may be JSON or syslog structured data as well).

Looking at all this, the normalization library seems to consist of two largely independent (but co-operating) parts:

  • the parser engine itself, that part that is used to actually normalize the input string according to the provided sample base and CEE definitions
  • a CEE support library, which provides the plumbing for everything that is defined in CEE (like tags, field types and serialization formats)

Now consider that I intended to create the normalization feature as a separate library instead of a rsyslog plugin because I hope that other projects can reuse it. Looking at the above bullet points, it looks like it is also natural to split core parser from CEE functionality. Again, there seems to be a broader potential user base for generic CEE functionality than for normalization. For example, a CEE support library could also be used by projects that natively support CEE. It hopefully would safe them the hassle of coding all CEE functionality just to do some basic things. Think, for example, on some application that would “just” like to emit a well-formed CEE log record (a very common case, I guess). With a library, it could just generate (via the library) a proper in-memory representation of the event and then have the library process it. The library could then also check if it is syntactically correct and contains all the necessary fields to conform to a specific CEE profile.

The more I think about it, the more I think it is useful. So I’ll probably split the core normalization library from the CEE part. This is not much effort, but opens up additional uses. I’ll call the normalization part then liblognorm (or libeventnorm) and the CEE part libcee — or something along these lines. Under this light, liblognorm may actually be a better name, because the parser part is more concernd about logs and log files instead of generic events (which often come in other format).

Again, feedback is appreciated!