how to use rsyslog’s ruleset and call statements

Rsyslog 7.2+ introduced a couple of cool config enhancements, among them a new way to specify rulesets and to call into a ruleset (a much better replacement for omruleset). Unfortunatley, the doc is currently extremely sparse. That will change soon, but in the mean time I thought I provide at least some clues here via the blog.

In essence, the ruleset statement permits to specify a ruleset. It’s written as

ruleset(name="rulesetname") { statements here }

Rulesets can be bound to inputs, as usual, and any ruleset can call into another ruleset via the call statement:

call rulesetname

Note that in call, the rulesetname is just plainly specified. We created it that way as we thought this looks most intuitively to people used to any kind of scripting or programming language.

A somewhat larger sample (bascially taken from the rsyslog mailing list, thanks to Brian Knox) is:

module(load="imptcp" keepalive="on")
# use imptcp just as example for bind ruleset below
ruleset(name="rs1") {

        *.* /var/log/test1.log
}
ruleset(name="rs2") {
        *.* /var/log/test2.log
        call rs1
}
input(type="imptcp" port="13514" ruleset="rs2")

All statements NOT specified inside a ruleset become part of the default ruleset. Legacy-style $Ruleset statements are still supported, but cannot be used inside a ruleset() statement (that would create a big mess and is totally unnecessary).

new v7/v6 stable, v5 now legacy

Today is a very important day for me – and rsyslog. After a very intense work period, the first v7 version is finally ready for the “stable” label. It contains a lot of new and important functionality. IMHO v7 is the most important release since the introduction of v4, and the next 7.3+ versions will bring even more very important enhancements. We ironed out quite some bugs during the beta phase. Interestingly, some of the bug fixes stem back to as far as v5.10, which recently received quite some attention.

As v7-beta is based on v6-beta, we can also release a new v6-stable, because v6 is a proper subset and all relevant patches have been applied to v6 as well. So they receive kind of “transitive readiness” via v7. It is important to note that we developed concepts in v6, but usually published only the merged-up version in v7 (in order to save development effort). As such, the release count for v6 (6.5.1) doesn’t look like it is a version ready for the “stable” label, but in fact the release count is artificially low due to the way we worked on the v6/v7 tandem.

Let me stress again that you usually want to go for v7 to see the latest and greatest in logging. Version 6 is basically a v5 engine, somewhat performance enhanced and with experimental support for structured logging/lumberjack logging. If you want to use structured logging in production, v7 is highly recommended. For example v6 does NOT support disk-queues in combination with structured logging, whereas v7 of course does.

With all these good news, v5 is now legacy and will no longer be officially supported by the project. Full support of course is still available as part of Adiscon’s rsyslog support packages. In regard to its wide use in distributions, we will still collaborate with the distro maintainers to solve problematic bugs (as we always did in the past for officially non-supported versions). However, this still means we need to care much less about v5, and that in turn means we can free many development ressources tied to maintaining old stuff and merging all those changes to newer versions. This has been very time-consuming in the past and I am very relieved that we currently have a plain v6 stable, v7 stable and an active development branch taking place in v7, only. I hope this will remain the case (except for some occasional v7-betas) at least for the next couple of month. It would definitely boost my productivity.

Todays releases are the culmination of a group effort. I would like to express special thanks to Miloslav Trmač, Milan Bartos, David Lang and Michael Biebl for their good suggestions, motivation and code contributions. I also thank the many other contributors (just have a look at the ChangeLog or git history!). The project would have been unable to reach this important milestone without all your contributions, including the well-done bug reports! Thanks a lot, folks!

With that said, I hope everyone enjoys the new stable versions. Also, look forward to an exciting future, which already has begun with the 7.3 series, which contains some good performance improvements. We continue to work on many cool new features!

new “stable” policy for rsyslog?

For many years, I have strongly opted for the devel-beta-release cycle for rsyslog, where a transition from beta to release only happens after two to three month in order to be sure that a stable version really was stable. This worked well, because early adopters of the beta branches helped to iron out any remaining bugs. Unfortunately, I see this the early adoption rate decline for around two years now (actually starting with v5). The reason is probably obvious: rsyslog has become so feature-rich that users usually do not see any need to work with the beta versions but rather wait for stable. The bottom line is that when everybody waits for stable, the beta won’t stabilize at all ;-).

Unfortunately, I have seen this, too, happen in practice: even though we had very lengthy beta periods, problem reports just came in after the beta has been declared to be stable. At that time, many folks seem to start using the system, and as nobody did any testing before, things then begin to break. I guess there is noone to blame, that’s just how things work. However, one needs to know that I invest quite some time into the devel-beta-stable cycle. Time, that is most probably better spent at other things, if looked at in the light of what I just described.

As such, I will most probably change our “stable” policy for rsyslog soon: the tag “stable” will no longer mean that the product has actually stabilized, but that we are ready to support it as a stable version (probably very important to those folks with support contracts). While this may sound harsh, it has a lot of good: first of all, this is what actually happens for 2+ years already, just with a 3-month delay. So we spare these months, what means new features will become more readily available, and bugs are probably quicker found AND fixed (because if a bug reports is closer to implementation, it is usually easier to figure out what is wrong). With that said, rsyslog 7.2 will probably the last stable that went through the usual cycle.

If you have a great idea how we can actually get more beta deployments, I’ll reconsider that decision.

Note that we will still have devel-beta-stable branches, but beta will much quicker turn into stable than before.

Main Advantages of rsyslog v7 vs. v5

A lot of work has been done since the days of rsyslog v5. In this  post, I will provide the top 5 advantages of the v7 engine over the previous version. Please note that I do not talk about v6, as it is very close to the v5 engine, just with the improved config language. V6 also provides some experimental support for structured logging (project lumberjack), which has been fully matured in v7 (I strongly recommend to use v7 if you are serious about structured logging).

So, here are the main points:

  • greatly improved configuration language – the new language is much more intuitive than the legacy format. It will also prevent some typical mistakes simply be not permitting these invalid constructs. Note that legacy format is still fully supported (and you can of course do the same mistakes as before if you use legacy format).
  • greatly improved execution engine – with nested if/then/else constructs as well as the capability to modify variables during processing.
  • full support for structured logging and project lumberjack / CEE – this includes everything from being able to create, interpret and handle JSON-based structured log messages, including the ability to normalize legacy text log messages.
  • more plugins – like support for MongoDB, HDFS, and ElasticSearch as well as for the kernel’s new structured logging system.
  • higher performance – many optimizations all over the code, like 5 to 10 times faster execution time for script-based filters, enhanced multithreaded TCP input plugin, DNS cache and many more.

Of course, there are many more improvements. This list contains just the most important ones. For full details, check the file ChangeLog.

rsyslog string array comparisons

In order to aid some string comparisons, I have just introduced constant string arrays in rsyslog. Look at this construct:

if $msg contains [“event12”, “event15″,”event19”] then
    /home/rgerhards/logfile

This will match if $msg contains any of the three strings given inside the array, so it works like an implicit “or”. I have seen this in a couple of customer configs and I am confident that this will ease rsyslog configuration. In later releases, it will also provide room for performance optimizations.

This feature will be part of 7.1.7.

[edit] Starting with rsyslog 7.3.6, string array comparisons are notably faster, especially if the array is large. At most, there are log(n) comparisons required (where n is the array size).

rsyslog’s new input() statement: a quick look

I have just completed the new input() config statement. This concludes the major part of the new config format for v6 (v7 will also support an enhanced ruleset() statement). I wanted to give you some quick ideas of how the new format looks in practice. So I have created a small test rsyslog.conf with the old-style directives commented out and followed by the new style ones. I guess it is pretty self-explanatory. Here it is:

#$ModLoad imfile
#$inputfilepollinterval 1
module(load=”imfile” pollingInterval=”1″)

#input(type=”imuxsock” )
module(load=”imuxsock” syssock.use=”off”)
input(type=”imuxsock” socket=”/home/rgerhards/testsock”)

$ModLoad imfile
module(load=”imfile”)
#$InputFileName /tmp/inputfile
#$InputFileTag tag1:
#$InputFileStateFile inputfile-state
#$InputRunFileMonitor
input(type=”imfile” file=”/tmp/inputfile” tag=”tag1:” statefile=”inputfile-state”)

module(load=”imptcp”)
#$InputPTCPServerRun 13514
input(type=”imptcp” port=”13514″)

module(load=”imtcp” keepalive=”on”)
#$InputTCPServerSupportOctetCountedFraming off
#$InputTCPServerInputName tcpname
#$InputTCPServerRun 13515
input(type=”imtcp” port=”13515″ name=”tcpname” supportOctetCountedFraming=”off”)

#$UDPServerRun 13514
#$UDPServerRun 13515
input(type=”imudp” port=”13514″)
input(type=”imudp” port=”13515″)

rsyslog plugin for structured Linux kernel logs

Thanks to Milan Bartos, there now is imkmsg, which supports structured kernel logging. This module will become available as part of rsyslog 7.1.5 later this week. The module was created with the help of Red Hat as part of the lumberjack effort for a better and structured logging infrastructure.

Note that the rsyslog v7 engine supports numerous enhancements for simple processing of lumberjack strucutred logs. So imkmsg is a natural fit for that version. As Milan told me, there is still room for improvement inside the module and some nits may exist and be fixed within the next weeks. However, we would appreciate feedback from early adoptors.

setting variables in rsyslog v7

Starting with rsyslog 7.1.3, variables can be manipulated inside rsyslog.conf. Note that the native message properties (like $msg) cannot be modified, but the CEE/lumberjack-type properties (those $!name variables) can. As variable modification is primarily an aid for template generation and modification, we do not consider this a restriction, as native message properties can simply be copied to the CEE tree if this is needed.

Note that CEE/lumberjack properties, as implemented in rsyslog,  can be hierarchical and levels are delimited by the bang sign (based on lumberjack recommendations). So “!uid” is the uid field in the CEE root, whereas “!usr!uid” is the uid field inside the usr container. Nesting can be as deep as desired. Currently, all parts of the CEE tree can be accessed. In later versions, this may require the setting of a global option. If that will happen depends on the feedback we receive.

So: how does it work?

To set a variable, simply use
set varname = expression;
Where expression can be an arbitrary complex expression, just like in an “if” statement. Note the semicolon at the end: this is unfortunately necessary and a diversion from the other config statements. However, this is the price we need to pay to remain upward compatible with the old style config format.

Concrete examples:

set $!usr!level2!var1 = “test”; 
set $!usr!level2!var1 = $msg; # update variable with native MSG field
set $!usr!level2!var2 = 1+1; # set !usr!level2!var2 = 2
set $!usr!level2 = $fromhost; # invalid

The last example is invalid and will not be executed because it tries to replace a complete container (!usr!level2) with a single value. This is rejected and not executed. Note that this problem is not detected during config read time but rather during execution time (in less trivial samples, it can not be reliable detected without execution).

Note that string concatenation is currently NOT supported, but it will be soon in the next releases. Also, full JSON containers cannot yet be copied. If this is tried to, the resulting variable will receive a string representation, which most probably is not what you wanted (and you will get a different result in future versions).

There is also an accompanying “unset” statement to remove a variable that is no longer needed. This is primarily meant to restructure a CEE container. It’s syntax simply is:

unset varname;

Again, note the semicolon at the end. A concrete example is

unset !usr!level2!var1;

which removes a single element. But full containers can also be removed:

unset !usr!level2;

Note that all variables are assigned to the message currently being processed. There currently is no way to set global variables (but this may become available based on feedback I see).

BSD-Style blocks will go away in rsyslog v7

Rsyslog supports BSD-style blocks since ages. They were a pretty handy tool to group actions together that should act only on remote hosts or log messages from specific programs. However, the v7 config system with its full nesting capabilities provides a much better – and easy to use – way to specify this. If both systems are mixed, the problem is that BSD-style blocks can be used to violate the nesting structure (examples below). Also, support for them adds a lot to rule engine code complexity. And finally, they are also very seldom used, few users even know they exist.

As a result, I have decided to drop support for BSD-style blocks in rsyslog v7 and above. A poll on the mailing list a week ago did not make anybody speak up against that change. So I assume none is hurt. This is especially the case as the conversion of BSD-style blocks to nested config is a very easy one.

Let’s look at this example:

!prog1
*.* /var/log/prog1.log
*.* /var/log/prog1again.log
!prog2
*.* /var/log/prog2.log
*.* /var/log/prog2again.log
This code can very simply be replaced by:
 if $programname == ‘prog1’ then {
   /var/log/prog1.log
   /var/log/prog1again.log
}
if $programname == ‘prog2’ then {
   /var/log/prog2.log
   /var/log/prog2again.log
}
And if you prefer the more powerful action statments (probably not so much benefit for this use case…), you can write:
 if $programname == ‘prog1’ then {
   action(type=”omfile” file=”/var/log/prog1.log”)
   action(type=”omfile” file=”/var/log/prog1again.log”)
}
if $programname == ‘prog2’ then {
   action(type=”omfile” file=”/var/log/prog2.log”)
   action(type=”omfile” file=”/var/log/prog2again.log”)
}
I expect that usually these easy cases happen. HOWEVER, if I had kept support for BSD-style blocks, one could configure things like this:
!prog1
if $msg contains ‘test’ then  {
                action(type=”omfile” file=”/var/log/somefile”)
                !prog2

mail.* :mmjsonparse:

                               & action(type=”omfile” file=”/var/log/somefile2″)
               !prog3

 

                               & ~
                !prog4
                if $msg contains ‘test2’ then
                               /var/log/logfile
                else
                               /var/log/logfile2
}
Can you make out the actual nesting structure of this config? When, for example, programname needs to be “prog3” and what happens then? IMHO this combination can cause considerable user confusion and frustration. As such, I made a sharp cut and removed it.
My apologies for those that need to do the manual conversion. I am sure the time is well-invested in the long term.

creating an error log file for elasticsearch data errors

With the work I have just completed in the beta-jsonrt branch of rsyslog, omelasticsearch now supports logging records with data errors to a local log file. This, for example, handles cases where the data type does not match an expected case. These type of errors are not recoverable, so omelasticsearch does not busy the rsyslog core with handling them – especially as it knows precisely which records caused the error. Instead, as of now (will be “officially” available as part of  7.3.2) the user can configure a local error log file, where all requests with errors are written to. That file will receive information about the request and elasticsearch reply. Log records are formatted in JSON and as such are easy to handle. Note that the request will be reported as the post data, a string, NOT a full JSON representation. This is done for two reasons: first of all, the post data for a bulk mode request (as defined by ES) is NOT proper JSON, so we would need to mangle with that data – something we don’t like, as it may hide the real request problem (better use “the real thing”). Secondly (and closely related), there may actually be errors in that string, so that we may not even be able to parse it. As such, it makes a lot more sense to supply the actual string. The reply part is JSON representation as we received it from ES (we assume that it could be parsed – else something would be awfully wrong). Note that in bulk mode, the error record may also contain records that have been processed correctly (ES status OK). Again, we include them so that the whole request/reply process can be evaluated. If data needs to be re-processed, users scripts must ignore such OK status records.

The core idea behind that representation is that it should be as simple as possible to create a custom cleanup script capable of processing the error information – while still providing full feedback on the error cause.

Once the error file is opened, it is never closed until rsyslog is terminated OR a HUP is sent. So proper processing of the error file in a custom script is to first rename the file and then send SIGHUP to rsyslog. This will cause it to release the file and re-open it *when needed*. Note that the file is only created if actually errors occurred. Among others, this is done so that the presence of this file can be used as simple trigger to decide if error handling must be started or not (should be easy to integrate with custom monitoring solutions).

The error file functionality is activated by specifying the “errorfile” action parameter. If it is not given, errors are simply ignored (as there is nothing we can do to handle them in any case).

A quick sample on how this can be done is here:

*.*     action(type=”omelasticsearch”
searchIndex=”testindex”
searchType=”mytype” bulkmode=”on”
errorFile=”/var/log/rsyslog/ES-error.log”)

Error file contents will look like this:
{
“request”:    {
“url”:    “http://localhost:9200/_bulk?replication=async&timeout=1m&”,
“postdata”:    “{“index”:{“_index”: “testindex”,”_type”:”mytype”}}n{“message”:” test 1″,”fromhost”:”hostname”,”facility”:”kern”,”priority”:”notice”,”timereported”:”2012-07-30T12:38:38.737354+02:00″,”timegenerated”:”2012-07-30T12:38:38.737354+02:00″}n”
},
“reply”:    {
“took”:    2,
“items”:    [{
“create”:    {
“_index”:    “testindex”,
“_type”:    “mytype”,
“_id”:    “g8XJzt-TRd6WOu3pRyzrcg”,
“error”:    “MapperParsingException[Failed to parse [timegenerated]]; nested: NumberFormatException[For input string: “2012-07-30T12:38:38.737354+02:00″]; ”
}
}]
}
}