new rsyslog config: a thank you to our users

I wanted to thank all those users who have commented during the past three (!) years on config format questions. I have consolidated all input and hope I have come up with a decent solution. Obviously, not everyone will like everything, but I hope I could find a good compromise.

So far, my blog (and the rsyslog site) has the best glimpse at the new format:

http://rainer.gerhards.net/2011/07/rsyslog-6-3-3-config-format-improvements.html

It is compatible with the old legacy format, supports simple control-flow structures (no loops, by intension) and builds heavily on name value pairs for things like actions, inputs, global settings, …

A real-world sample I used for parser development can be found at the rsyslog git web.

You’ll also find the grammar files inside that source directory in the git tree. It may be interesting for those used to flex/bison. My next step is to develop the necessary code for the name/value pair objects. That requires some more plumbing inside the core and changes to *all* loadable modules. Sounds like a lot of work, but I still hope to get this done before the summer break.

I have also started thinking about v7. It will contain a tree-based execution engine, which potentially offers even higher speed and far more options for configuration. This change is too big to make it into v6. Note that v6 will support “if .. then” and probably “if .. then .. else” but not nesting of these statements — because the rule engine does not support that. The main goal for v7 is to support nesting, including the (considerable) relevant engine changes.

I hope the new format is useful and does not upset too many. Sorry for the silence on the final selection. Past experience told me that there were too many totally conflicting views of what  the format should look like. I was deeply concerned that a broader detail discussion would have derailed this effort again. So I used known arguments and my best judgment to create the final format. Please all be assured that your arguments were deeply considered and extremely useful in getting this done.

For example, a recent mailing list discussion brought up very good argument why we actually needed to support old- and new-style config for include files. It turned out that actually the best way to solve that problem was to actually extend legacy format rather than completely replace it. This has the added advantage that textbooks, courses and a myriad of Internet-HowTos do not need to be rewritten. Besides that, I think that constructs like

mail.info /var/log/maillog

are really hard to beat in simplicity and clearness, so I think it is valuable to have them as part of the config language.

Thanks again to everyone for helping make this happen.

Rainer

rsyslog 6.3.3 config format improvements

In rsyslog 6.3.3, the config processor has finally changed. The old legacy processor (and with it the early RainerScript implementation) is thrown out and has been replaced by the so-called RainerScript processor (why that crazy name?). This is an extremely important step for rsyslog, as it now has the foundation for a much better and intuitive rsyslog.conf format. However, most of that can not be seen in 6.3.3, as it requires more work, especially in the plugin arena. Still, there are a couple of smaller improvements available.

Most importantly, the performance of script based filters has been considerably enhanced. Preliminary testing shows a three times speedup (we’ll do more benchmarking at a later stage; there is also still lots of room for optimization ;-)).

The ugliness of continuation lines has been removed. They may still be used, and this may make a lot of sense with some actions, but you are usually no longer forced to use continuation lines. Take this config snippet from a leading distro:


if ( 
     /* kernel up to warning except of firewall  */ 
     ($syslogfacility-text == 'kern')      and      
     ($syslogseverity <= 4 /* warning */ ) and not  
     ($msg contains 'IN=' and $msg contains 'OUT=') 
 ) or ( 
     /* up to errors except of facility authpriv */ 
     ($syslogseverity <= 3 /* errors  */ ) and not  
     ($syslogfacility-text == 'authpriv')           
 ) 
then /dev/tty10
& |/dev/xconsole

This can now be written as follows:


if ( 
     /* kernel up to warning except of firewall  */
     ($syslogfacility-text == 'kern')      and     
     ($syslogseverity <= 4 /* warning */ ) and not 
     ($msg contains 'IN=' and $msg contains 'OUT=')
 ) or (
     /* up to errors except of facility authpriv */
     ($syslogseverity <= 3 /* errors  */ ) and not 
     ($syslogfacility-text == 'authpriv')          
 ) 
then /dev/tty10
& |/dev/xconsole

Of course, this is not a real big advantage, but can be very useful during day-to-day operations. Forgetting the continuation marker is easy and has happend quite often, causing many more problems than necessary.

Also, the somewhat unintuitive use of “&” to chain actions together can now (optionally) be replaced by so-called blocks. For example,


authpriv.err /dev/tty10
&            |/dev/xconsole

can now be written as


authpriv.err { /dev/tty10
               |/dev/xconsole
             }

This looks much more familiar and thus intuitive to many users. Of course, both the old style as well as the new style is supported.

Finally, the need to use single quote characters (‘) over the usual double quotes (“) in script based filters was often a source of confusion. You may now use both, so ‘string’ and “string” works both. However, strings in double quotes will support parameter replacement in later versions of rsyslog. That is “Message is $msg” will evaluate to exactly this string in 6.3.3, but $msg will be resolved to the actual message content some time in the future. So be careful if you use double quotes.

Of course, none of these changes are the important ones so many users are waiting for, most importantly an intuitively-usable scoping for actions and inputs. These will be coming up shortly. I need to write some more engine code *and* need to enhance plugins to support that. I’ll probably start with actions as first. Note that the RainerScript processor already parses some of these constructs, but the rest of the engine simply ignores them. In order to get you an idea of how it will look, see this hypothetical example:


if $msg contains "error" then {
    action(type="omfwd" protocol="tcp" target="10.0.0.1:514"
           action.retryCount="-1"
           queue.type="linkedList" queue.fileName="fwdRule" queue.maxDiskSpace="1g"
           queue.saveOnShutdown="on"
          )
    action(type="omfile" target="/var/log/somelog.log")
    action(type="omuser" target="all" action.onceInterval="30")
}

I hope the example is intuitive enough to grasp it’s meaning ;) In current format, you need to write


$ActionQueueFileName fwdRule1
$ActionQueueMaxDiskSpace 1g 
$ActionQueueSaveOnShutdown on 
$ActionQueueType LinkedList
$ActionResumeRetryCount -1
if $msg contains 'error' then @@10.0.0.1:514
& /var/log/somelog.log
$ActionExecOnlyOnceEveryInterval 30
& :omusrmsg:*

At least to me, the upcoming new way looks much nicer ;)

In regard to the distro-example given above, I’ll try to simplify it towards this form:


if ( /* kernel up to warning except of firewall  */
     hasPRI("kern.warn") and not 
     ($msg contains 'IN=' and $msg contains 'OUT=')
    ) or hasPRI("authpriv.err")
    then { /dev/tty10
          |/dev/xconsole
         }

But that’s the second step after introducing the new action statements.

Please note that the final format selection was very carefully based on many discussions both on the mailing list and inside the forum as well as needs to preserve backwards compatibility. For example, on Debian, packages drop rsyslog-specific configs into the /etc/rsyslog.d directory and expect them to be understood. In order to break things here, we needed to remain compatible with the legacy format and extend it. Only thanks to the good user feedback we could finally come up with a solution that the majority of users hopefully will find good.

With that said, I’ll now see that I create the actual release. For obvious reasons, 6.3.3 will be a bit shaky as far as the config is concerned. Most probably it will also not run the full testbench successfully (due to some very esoteric tests that are broken by actual functionality changes). However, you can be sure that the engine works well as long as it passed the config stage, because there were almost no changes during runtime (well… script filter expression evaluation has been rewritten from scratch).

rsyslog: important step to new config format

I have just merged the master-newconf git branch into rsyslog‘s master branch. With that, the new config parser becomes part of the main line. This is a very important step, as it lays the foundation to an enhanced, easier to use config format. The current version has only few enhancements, but provides the necessary plumbing to do some real nice work within the next couple of weeks. Not only as a side-effect, the performance of script-based filters has been notably increased.

I expect a release with the current state within this week. Mostly cleanup and doc work remains to be done.

Why omusrmsg is evil – and how it is fixed…

Traditional syslog files simply use the user name (or “*” for all) to send messages to users. For example, this selector will forward all mail error messages to the poor mail admin named “madmin”:

mail.err madmin

This syntax is (somewhat) intuitive, but causes severe issues when it comes to extending the configuration language. Let’s assume the mail admin is named Ian Faber and has the user name “if”. So the syslog selector would be

mail.err if

This is ok with traditional config files, but creates a problem if the language is extended. For example, rsyslog has an “if expression then” construct. Question now: how to differentiate between the user name “if” and the “if” construct? Rsyslog uses context information in order to do this. At the start of a line “if” must be the “if” construct, because “if” as a user name would require a filter in front of it. This works pretty well, but creates some complexity during config file parsing. It may also be counter-intuitive to many users. If the language is further extended (as I am doing in v6), it creates considerable extra complexity.

To resolve that ambiguity, I have upgraded omusrmsg, which handles this kind of actions, to support the regular rsyslog syntax for action configuration. You now write:

mail.err :omusrmsg:if


The extra “:omusrmsg:” tells rsyslog explicitely that an action starts and so clearly flags what the “if” stand for. This is a very vital update, and so I am extending it into all versions starting with v4. I am right now working on these changes and will release all versions ASAP. I’ll create another post when this is done. It is highly recommended to use the new syntax exclusively. The older syntax will go away in a while.

full blown dns cache for rsyslog

Up until and including version 5 rsyslog does actually not implement a real DNS cache. Instead, it uses some sort-of caching methods. They seem to work surprisingly well, as almost no real pain was reported by users in regard to this system. The big exception is UDP traffic, if combined with template options that require host resolution and a larger number of different hosts sending messages.

Starting with 6.3.1, I have now implemented a real, full-blown cache system which will resolve the issues with that use case. The initial implementation is not perfect, but I thought it would be best to gain some feedback from the community first before deciding on the final implementation. Most importantly, it currently does not expire entries (this was considered not necessary in many previous discussions we had on the mailing list). Also the current linear list data structure and locking method used is not optimal. However, it is very simple and easy to maintain. So if there is no need for more advanced (aka “complex”) code, it probably is not bad to stay simple.

I hope to get some feedback from the community, and most importantly feedback from folks who actually use the new capability to their benefit. In those cases where it matters, the speedup can be “immense”.

new rsyslog config system materializes…

The past weeks I have worked pretty hard on the new rsyslog config system. The legacy system was quite different from what you expect in modern software. Most importantly, the legacy system applied directives as the config file was read, which made it extremely hard to restructure the config file format. That also prevented features like privilege drop from working fully correct.

I have now basically changed the config system so that there is a clear difference between the config load phase and applying the config. Most importantly, this means privilege drop now works correctly in all cases (but I bet some users taking advantage of oddities of the old system will probably complain soon ;)). Other than that, there are no user-visible enhancements at the moment. However, the internal plumbing has changed dramatically and enables future change. Most importantly, this finally creates a path to a new config language, as we now have a clear interface as part of the in-memory representation of the config, which is config language agnostic.

With this initial release, there may still be some things inside the core that can be optimized. Right now, the system aims at the capability to have multiple config objects loaded (but not active) at the same time. However, there are some data instances where this is not cleanly followed in order to reuse some code. This is not a problem, because the rest of the rsyslog engine does not support dynamic config reload (and thus multiple configs at runtime) at all.

Also it must be noted that the current code is quite experimental. So there is some risk involved in running the initial 6.3.0 version. However, all dramatic changes were made to the config system. That means if the system initializes correctly, it will most probably run without any issues. The risk window is constrained to the initial startup, what should be quite controllable. Users that use privilege drop are advised to check that their configurations work as expected. The previous system did some initialization with full privileges. This is no longer the case, except for modules that actually require full privileges (e.g. imtcp to bind privileged ports). Most importantly, files are now created with dropped privileges right from the beginning. I expect that some (unclean) configurations will run into trouble with that. The good news about that is that the would run into trouble with older releases as well, but only after a HUP. Now things break immediately, what makes them much easier to diagnose.

So what’s next in regard to the config? It depends a bit on the overall workload. I will probably try to have a look at the config language next, which is another non-trivial task. Also past discussions tell me that it is extremely hard to find a format that satisfies all needs. I have already reviewed the last elaborated discussion (June and July 2010 – search for “conf” on these pages) and begun to reconsider some of the options. But this is probably a topic for a separate blog posting…

supported rsyslog versions

Every now and then I am asked which versions the rsyslog project officially supports. I track the current version on the rsyslog status page. To me as a surprise, I was recently asked what that really means. And I have to admit that the page actually does not spell this out in detail. So I thought it may be a good idea to write some small comments on this.


First, what does “officially supported” mean? I think the best definition is that this is the software we expect users to run and we will most probably not look into problems in older versions. At least I will usually not craft patches for older versions. Note that things are a bit different for customers with rsyslog support contracts: here Adiscon as rsyslog’s prime sponsor provides support for all stable versions ever created (as long as this is technically possible, e.g., there are some things which simply can not be fixed in very old versions).


So which versions are officially supported? It is first important to note that for each major version (v4, v5, …) there exist three branches: stable, beta and development. In practice, there usually exists only a beta and development version for the most recent major version and maybe the one before (but we try to avoid that). All other active versions are stable. An older but still useful plot of this concept can be seen in my blog post on the rsyslog family tree. The idea is that new functionality goes into the devel branch, is then slowly migrated to beta and moved to stable state once it gets sufficiently mature (read my blog post on “How Software get Stable” for more on this philosophy).


It is important to note, however, that only the current versions of each branch are officially supported (but note the extension for support contracts I stated above). Today exist far more than 100 versions of rsyslog. It is absolutely impossible to maintain all of them. So we focus on what we expect a careful user should run. For the ultra-conservative folks, we keep older versions, like v4, inside the supported range, but we can (and will) not support any interim version. For example, the some early 4.6.x versions were bugged by some really nasty problems in the TLS arena. This has been fixed in later versions, so why should we still support the known-to-be-faulty versions? This sounds like a big waste of time. Note the difference between, e.g. 4.6.0 and 4.6.5 is just bug fixes (!). So if you follow the “How Software get Stable” spirit, there is absolutely no point in running software with more bugs than necessary and so it sounds like a really bad idea to run those versions. We understand that corporate guys may still have valid reasons to use outdated versions, but that’s covered by the support contracts (think about it this way: this generates a lot of technically unnecessary work, so someone needs to pay for that…).

The story is of course slightly different for version shifts from e.g. 4.4.x to 4.6.x. Note that even number denote stable versions (odd ones are either devel or beta, depending on how they are declared). Here, some new functionality is introduced, and thus there is additional risk for regressions. We try to mitigate this potential by our beta phase, which usually lasts three month. In my opinion, a beta is very close to a stable version. So for this time period, we support something like two stables in order to make sure users can migrate to the new version without risk. Of course, this doesn’t outrule the chance that some regression is left uncaught. But, some risk is always involved in life…

Note that we do not generally provide patches for older stable versions once the next one is out. Even when we patch older versions for support contract customers, we usually patch exactly the problem they see, and not more (what usually is exactly within their minimal-change strategy). So to re-use our 4.x TLS example: the nasty TLS bug was fixed in 4.6.5, which was declared the current stable. So the patch is not available in 4.6.0 to 4.6.4. This also means that no 4.2.x or 4.4.x version will ever get it, because these versions were quite outdated (and unsupported) when the problem was fixed.

If you are still not convinced, just think if this: what would happen if I applied all patches to 4.6.0 that went into 4.6.5? Yes, exactly — it would be the exact same code. This is because 4.6.5 is version 4.6.0 plus all patches ever created for that version ;)

To sum up in one sentence: the project officially supports the head versions of all supported branches (as of the rsyslog status page).

rsyslog config reload – random thoughts

This blog post is more or less a think tank, maybe even an utility to clear my mind. Please note that I am not talking about anything that is right now present in rsyslog. I am not even saying that it will be in the near future. But I’d like to think a bit about alternatives on the route to there.

Let’s assume rsyslog shall have two abilities:

  • use different config languages
  • dynamically reload a config without a full restart (thus applying a delta between new and old config)

In any case, the usual approach is to have an object representing a full configuration. This is an in-memory object. Usually, it is created during parsing the configuration file(s). During that parsing, nothing of the new config is actually carried out, just the in-memory presentation build. In that model, it would also be possible to have several fully populated config objects in core at the same time. The important thing to note is that none of them actually affects the current system – they are just loaded and ready to use.

Usually, in such a design, there is one thing that is called the “running conf”. This configuration is the one the system actually uses for processing.

So how is a new configuration activated? In a first step, the config parsers create an in-memory object. Once this is done, that object is a candidate config, one that could be activated. To actually activate it, the candidate config is loaded as running config. During that process, all the settings are applied and services are started. Please note that a dynamic config reload can be done by first creating a delta between the candidate config and the current running config. This delta can then be used to keep the currently existing config running, but just modify it so that it is equivalent to the candidate config. This process can be less intrusive than shutting down the running config and restarting it based on the candidate config. As an example, a rsyslog system may have several hundered incoming TCP connections open. If the delta is just the addition of a new output file, there is no need to shut down these TCP connections as part of the (delta-driven) candidate config activation. Whereas, if no delta were to be used, all connections would need to be shut down and re-established after restart. There is obviously benefit in delta-based configuration apply. However, it should be noted that there are many subtle issues associated with creating and applying the delta.

Thinking about such a loaded/candidate/running config system for rsyslog, there is one overall issue that complicates things: loadable modules! Each module not only provides extra functionality, it also provides a set of configuration settings to modify its functionality. As such, we have a problem with config parsing: in order to fully parse a config file, we actually need to load the module as part of config processing. Or more precisely, we need to load its configuration file processor. However, it does not make sense to split each module into a config file processor and the actual module, at least I think so. Splitting them up would make things over-complex IMHO. However, we must demand that a module, in its config processing code, must not do anything other then creating configuration objects. Most importantly, it must not start any service or act out any non-config related activity. If it would do so, it would possibly affect the current running configuration. Also, config processing does not necessarily mean that the config will actually be activated, so the module must not assume that its processing will ever be called. As a side-note, this is one of the issues with the current legacy configuration system (as seen in all versions prior to v6 and early v6 versions).

Rsyslog traditionally keeps a list of loaded modules. Modules are added to that list when they are loaded inside the configuration system. During system startup, that very same list is also used to activate services inside the module. So that single list serves both as

  • a registry of loaded modules (e.g. to know what is already available and what needs to be unloaded)
  • a registry of modules required to for configuration

Both functions are tied together into a single list because rsyslog currently has only the concept of a single configuration, and not a candidate/running (multi-)config system. For the latter, it is necessary to differentiate between the two cases:

As we need to load modules during config parsing, we still need a single global list that keeps track of modules already present inside the system. Please note that with a multi-config system, a module that is first “loaded” inside a currently being parsed configuration file may actually already be loaded inside the system. In that case, a duplicate load must be avoided and the already existing module be used. The global, config-independent list is required to support this functionality.

On the other hand, such a global list can no longer be used to activate services for a specific config. This is easy to see when we have a config A which uses e.g. a TCP listener and we have a config B which does not. If B is activated, the TCP listener shall obviously not be activated. As such we need a dedicated, config-specific list of modules that are part of the current configuration. Let’s call this one the “config module list” and the other one the “loaded module list”.

The loaded module list is than just use to keep track of which modules are loaded. Also, it will/can be used to locate a module, so that global functions (like the config parser) can be found and carried out. Note that I call the config parser global, not config-specific. The reason is that the config parser does not take a config instance as input, but rather has no input (other than the config language) but has the config instance as output. As such, it emits config specific data, but does not require it for processing. So it is global.

The config module list in contrast must hold all config specific data elements for the module (most importantly the module specific config instance itself). The config module list is to be used for all config-specific actions. For example, it will be used to activate a module’s services when the candidate config becomes a running config (maybe via a delta-apply process).

Note that on-demand module unload can be done via reference counting, which is already implemented in rsyslog. When a module is put onto a config module list, the count is incremented. If it is removed from such a list (usually because the in-memory config is destroyed), the count is decremented. An unload happens if the reference count reaches zero. If the module would be required by later processing another config, that would trigger a reload just as if the module had never before been loaded.

Note that a clearly defined and implemented split in global vs. configuration specific functionality is of vital importance for a multi-config system. This probably has some subtle issues as well. Right out of my head, I can think of the problem of some potentially global configuration settings (a term that seems to contradict itself in this context – just think a bit about it…). For example, we have the module search path, which tells us from where to load modules. With different configs, we can potentially have different module search pathes. That, in turn, can load to modules with the same name being loaded from different locations. That means we could potentially have different functionality, including different sets of config parameters (!) in the system at the same time. This could lead to some hard to diagnose issues. So it looks necessary to have the ability to load the same module via different pathes concurrently, and apply only the “right” module to the config in question. Looking at the current code base, implementing this would be even harder than just splitting out the global/config-specific lists. Maybe this would be something that shall be added at a later stage, *if* at all we take the path down that road. Also, there may be other issues along the way that I do not currently envision…

message classification with liblognorm sample code

I have just enhanced liblognorm‘s normalizer tool to support the -t option. If it is given, only messages with the specified tag will be output. Currently, only a single tag can be specified. The main purpose of this change is to provide some example code on how to use the message classification API, so that other developers can include it into their solutions more easily.

In essence, the whole logic is contained in normalizer.c, line 122 and 123. The application needs to keep the “wanted” tags inside an es_str_t type. Then, it needs to call the ee_getEventField() API to find out if the normalizer (better said: its rule) associated the tag with a given message. That’s it…

Please note that we may implement a more powerful API in the future — if this makes sense. If you think API additions would be useful, please suggest them together with a description of the benefits.

log classification with liblognorm

Today, I have added support for so-called “tags” to liblognorm (and it’s base library libee). This new capabilities permits very easy classification of syslog message and log records in general. So you can not only extract data from your various log source, you can also classify events, for example, as being a “login”, a “logout” or a firewall “denied access”. This makes it very easy to look at specific subsets of messages and process them in ways specific to the information being conveyed.

To see how it works, let’s first define what a tag is: A tag is a simple alphanumeric string that identifies a specific type of object, action, status, etc. For example, we can have object tags for firewalls and servers. For simplicity, let’s call them “firewall” and “server”. Then, we can have action tags like “login”, “logout” and “connectionOpen”. Status tags could include “success” or “fail”, among others. The idea of tags is based on early CEE concepts. I will try to keep consistent with whatever CEE heads to. Tags form a flat space, there is no inherent relationship between then (but this may be added later on top of the current implementation). Think of tags like the tag cloud in a blogging system. Tags can be defined for any reason and need (though obviously we must strive to get to a standard set, something I hope CEE will provide in the not too distant future). A single event can be associated with as many tags as required.

Assigning tags to messages is simple. A rule contains both the sample of the message (including the extracted fields) as well as -now- the tags. Have a look at this sample, taken from liblognorm 0.2.0:

rule=:sshd[%pid:number%]: Invalid user %user:word% from %src-ip:ipv4%

Here, we have a rule that shows an invalid ssh login request. The various field are used to extract information into a well-defined structure. Have you ever wondered why every rule starts with a colon? Now, here is the answer: the colon separates the tag part from the actual sample part. Starting with liblognorm 0.3.0, you can create a rule like this:

rule=ssh,user,login,fail:sshd[%pid:number%]: Invalid user %user:word% from %src-ip:ipv4%

Note the “ssh,user,login,fail” part in front of the colon. These are the four tags the user has decided to assign to this event. What now happens is that the normalizer does not only extract the information from the message if it finds a match, but it also adds the tags as metadata. Once normalization is done, one can not only query the individual fields, but also query if a specific tag is associated with this event. For example, to find all ssh-related events (provided the rules are built that way), you can normalize a large log and select only that subset of the normalized log that contains the tag “ssh”.

Note that versions of liblognorm 0.2.0 simply ignore the tag part, so old versions of the library are capable of working with new rule bases.

This is pretty cool and has ample potential. Just think about creating firewall reports: if you have different firewalls, you only need to have different rule bases to normalize these events all into the same format. Even more now, you can process the logs based on the classification assigned during the normalization process. For example, a “failed connection request” report may ignore everything that is not tagged as “connection, fail”.

That probably sounds pretty good to you, but how to actually use it? Right now, the core functionality is available inside the libraries (more precisely in the git version, I will do an official release very soon but wanted to spread word). That means developers have the necessary API to integrate with their programs. End user tools do not yet exist (what is not too surprisingly for a library). Integration of the new functionality is very easy. Classification is available without need to change anything in existing applications. A single new simple API ee_EventHasTag() has been added, which needs to be called to see if an event is associated with the given tag. [side-note: the current API is NOT guaranteed to be stable, even though I try not to break things without need]

In hope that developers will play with the new functionality, so that it will be available in end-user tools soon as well. I myself plan to enhance the normalizer tool very soon to support selecting subsets based on tags (this can also serve as an example for other developers). Also, I plan to add classification support to rsyslog very, very soon. So stay tuned to what’s coming up — it’s exciting ;)