CEE library will be named libee

After some discussions, we have finally decided to name the CEE part of the original liblognorm project “libee“. Note the missing “c” ;) We originally thought that libcee would be a smart name for a library implementing an API on top of the upcoming CEE standard. However, there were some complexities associated with that because “libcee” sounds much like *the* official library. While I hope I can finish a decent implementation of the full CEE standard, I think it is a bit bold to try to write the standard lib when the standard is not even finished.

So the compromise is to drop the “c”. The “ee” in libee means “Event Expression”, much as CEE means “Common Event Expression”. If CEE evolves as we all hope and if I manage to actually create a decent implementation of it, we may reconsider calling the library libcee. But that’s not important now and definitely something that will involve a lot of people.

In the mean time, I wanted to make sure you all understand what I mean when I talk about “libee”. I hope to populate its public git with at least a skeleton soon (liblognorm reaches a point where I need some CEE definitions, so this becomes more important).

liblognorm will use passive Unicode mode (UTF-8)

I thought a while on how to support Unicode in liblognorm. The final decision is to use passive mode, which is a very popular option under Linux. A core driver behind this decision is the ability to safe lots of space (and thus also cache space and so processing time as well) as the majority of log content is written in US-ASCII. This is even the case in Asian countries, where large parts of the log message are usually ASCII but contain a few select fields in local language support (like names). Even if the message itself is in local language, there is a lot of punctuation and numbers in them, so I think the overall result will not use up notably more space than a UTF-16 implementation. I18N-wise, it must also be noted that UTF-16 is a very small (but important) subset of full unicode, so using UTF-8 gives us the ability to encode full 32-bit UCS-4 characters should there be need to do so.

The same decision will apply to the CEE library (whatever it will be named). This is also nicely in line with libxm2, which I intend to use for XML parsing.

Will my CEE library be named libcee?

After my thinking about splitting off a CEE library from the liblognorm project, Michael Biebl suggested that the CEE part of it be named “libcee”. This sounds very natural and decent. However, CEE is a trademark of the Mitre corporation, which helps establish the standard. I have asked the folks there if it is possible to use this name. Not surprising, the answer will take a little while.

I need to progress while waiting. So I have decided to name the library libcee for now, but rename it if Mitre decides this is not possible. That should not be much of a problem, as this decision is probably much quicker made than me writing the first releasable version of it ;)

splitting up the normalization library

I have dug into the design of my upcoming event/log normalization library. As it will base on CEE, I intend to pull in CEE definitions for types defined there, like tags or field types. Also, I thought about what the library should output. An obvious choice for many use cases is an in-memory object model describing the normalized form of the event that was passed in. This is probably most convenient for applications that want to do further processing on the event.

However, it also seems useful to have the ability to serialize this data in the form of a text string. That string could be stored in a file for later reference, forensics or to feed some other tool capable of understanding the file format. And as the in-memory object model will be CEE based, and CEE defines such serialization formats, it seems obvious that the library should be able to generate serialization based on the CEE-defined and supported formats (note that does not necessarily means XML, it may be JSON or syslog structured data as well).

Looking at all this, the normalization library seems to consist of two largely independent (but co-operating) parts:

  • the parser engine itself, that part that is used to actually normalize the input string according to the provided sample base and CEE definitions
  • a CEE support library, which provides the plumbing for everything that is defined in CEE (like tags, field types and serialization formats)

Now consider that I intended to create the normalization feature as a separate library instead of a rsyslog plugin because I hope that other projects can reuse it. Looking at the above bullet points, it looks like it is also natural to split core parser from CEE functionality. Again, there seems to be a broader potential user base for generic CEE functionality than for normalization. For example, a CEE support library could also be used by projects that natively support CEE. It hopefully would safe them the hassle of coding all CEE functionality just to do some basic things. Think, for example, on some application that would “just” like to emit a well-formed CEE log record (a very common case, I guess). With a library, it could just generate (via the library) a proper in-memory representation of the event and then have the library process it. The library could then also check if it is syntactically correct and contains all the necessary fields to conform to a specific CEE profile.

The more I think about it, the more I think it is useful. So I’ll probably split the core normalization library from the CEE part. This is not much effort, but opens up additional uses. I’ll call the normalization part then liblognorm (or libeventnorm) and the CEE part libcee — or something along these lines. Under this light, liblognorm may actually be a better name, because the parser part is more concernd about logs and log files instead of generic events (which often come in other format).

Again, feedback is appreciated!