simplifying rsyslog JSON generation

With RESTful APIs such as ElasticSearch’s, you need to generate JSON strings. Rsyslog will soon offer a very easy way to do this. The current method is not hard either, but often looks a bit clumsy. The new way of doing things will most probably be part of the 8.33 release.

You now can define a template as follows:

template(name="outfmt" type="list" option.jsonf="on") {
    property(outname="@timestamp"
             name="timereported"
             dateFormat="rfc3339" format="jsonf")
    property(outname="host"
             name="hostname" format="jsonf")
    property(outname="severity"
             name="syslogseverity-text" caseConversion="upper" format="jsonf")
    property(outname="facility"
             name="syslogfacility-text" format="jsonf")
    property(outname="syslog-tag"
             name="syslogtag" format="jsonf")
    property(outname="source"
             name="app-name" format="jsonf")
    property(outname="message"
             name="msg" format="jsonf")
}

This will generate JSON. Here is a pretty-printed version of the generated output:

{
    "@timestamp": "2018-03-01T01:00:00+00:00",
    "host": "172.20.245.8",
    "severity": "DEBUG",
    "facility": "local4",
    "syslog-tag": "app[1666]",
    "source": "app",
    "message": " this is my syslog message"
}

Note: the actual output will be compact on a single “line”, as this is most useful with RESTful APIs.
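To put the template to use, reference it from an output action like any other template. A minimal sketch (the server address and index name are illustrative, not part of the feature itself):

module(load="omelasticsearch")

*.* action(type="omelasticsearch"
           server="localhost"         # illustrative ES host
           searchIndex="testindex"    # illustrative index name
           template="outfmt")         # the jsonf template defined above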

Future versions of rsyslog may see additional simplifications in generating the JSON. For example, I am currently thinking about removing the need to give format="jsonf" for each property.
The functionality described here is being added via this pull request.

creating an error log file for elasticsearch data errors

With the work I have just completed in the beta-jsonrt branch of rsyslog, omelasticsearch now supports logging records with data errors to a local log file. This handles, for example, cases where a data type does not match what is expected. These types of errors are not recoverable, so omelasticsearch does not busy the rsyslog core with handling them – especially as it knows precisely which records caused the error. Instead, as of now (it will be “officially” available as part of 7.3.2), the user can configure a local error log file, to which all requests with errors are written. That file receives information about the request and the ElasticSearch reply. Log records are formatted in JSON and as such are easy to handle.

Note that the request is reported as the post data (a string), NOT a full JSON representation. This is done for two reasons: first of all, the post data for a bulk mode request (as defined by ES) is NOT proper JSON, so we would need to mangle that data – something we don’t like, as it may hide the real request problem (better to use “the real thing”). Secondly (and closely related), there may actually be errors in that string, so we may not even be able to parse it. As such, it makes a lot more sense to supply the actual string. The reply part is the JSON representation as we received it from ES (we assume it could be parsed; else something would be awfully wrong). Note that in bulk mode, the error record may also contain records that were processed correctly (ES status OK). Again, we include them so that the whole request/reply exchange can be evaluated. If data needs to be re-processed, user scripts must ignore such OK status records.

The core idea behind that representation is that it should be as simple as possible to create a custom cleanup script capable of processing the error information – while still providing full feedback on the error cause.

Once the error file is opened, it is never closed until rsyslog is terminated OR a HUP is sent. So the proper way to process the error file from a custom script is to first rename the file and then send SIGHUP to rsyslog. This causes rsyslog to release the file and re-open it *when needed*. Note that the file is only created if errors actually occurred. Among other things, this is done so that the presence of the file can serve as a simple trigger for deciding whether error handling must be started (this should be easy to integrate with custom monitoring solutions).

The error file functionality is activated by specifying the “errorfile” action parameter. If it is not given, errors are simply ignored (as there is nothing we can do to handle them in any case).

A quick sample of how this can be done:

*.*     action(type="omelasticsearch"
               searchIndex="testindex"
               searchType="mytype" bulkmode="on"
               errorFile="/var/log/rsyslog/ES-error.log")

Error file contents will look like this:
{
    "request": {
        "url": "http://localhost:9200/_bulk?replication=async&timeout=1m&",
        "postdata": "{\"index\":{\"_index\": \"testindex\",\"_type\":\"mytype\"}}\n{\"message\":\" test 1\",\"fromhost\":\"hostname\",\"facility\":\"kern\",\"priority\":\"notice\",\"timereported\":\"2012-07-30T12:38:38.737354+02:00\",\"timegenerated\":\"2012-07-30T12:38:38.737354+02:00\"}\n"
    },
    "reply": {
        "took": 2,
        "items": [{
            "create": {
                "_index": "testindex",
                "_type": "mytype",
                "_id": "g8XJzt-TRd6WOu3pRyzrcg",
                "error": "MapperParsingException[Failed to parse [timegenerated]]; nested: NumberFormatException[For input string: \"2012-07-30T12:38:38.737354+02:00\"]; "
            }
        }]
    }
}

Using ElasticSearch Bulk Mode with rsyslog

Rsyslog’s omelasticsearch plugin now supports bulk mode. With bulk mode, message processing is much faster, especially if large loads are to be processed.

Bulk mode works with rsyslog’s batching capabilities, so it is probably a good idea to refresh some of the batching concepts. The core idea is that while we would like to process many messages at once, we do NOT want to hold back processing “just” because there are too few messages. So with batching, you set an upper limit on the batch size (the number of messages inside a batch). Let’s say the batch size is set to 32. When a new batch is to be processed, the queue worker tries to pull 32 messages off the queue. If 32 or more are present, all 32 are taken from the queue. But now let’s assume only 10 messages are present inside the queue. In that case, the queue worker does not try to guess when the next 22 messages will arrive and wait for them. Instead, it just pulls the 10 already-present messages off the queue, and these form the batch. When new messages arrive, they will be part of the next batch.

Now let’s look at the startup of a busy system. Lots of messages come in. Let’s assume they are submitted one by one (many sources submit multiple messages at once, but let’s focus on those that do not). The first message is submitted and the queue worker is activated. Assuming this happens immediately and before any other message is submitted (actually unlikely!), it will initially create a batch of exactly one message and process that. In the meantime, more messages arrive and the queue fills. So when the first batch is completed, there are ample messages inside the queue, and the queue worker will pull the next set of 32 messages off the queue and form a new batch out of them. This continues as long as there are sufficient messages. Note that in practice the first batch will usually be larger than one, and often the max batch size, thanks to various inner workings I would not like to elaborate on in this article. Large batch sizes of more than 1024 messages are not bad at all and may even help improve performance. When looking at a system running with such large batches, you will likely see partial batches being created, simply because the queue does not contain more messages. This is not an indicator of a problem; it shows that everything is working perfectly!

The max batch size can be configured via

$ActionQueueDequeueBatchSize
and
$MainMsgQueueDequeueBatchSize

Note that the default sizes are very conservative (read: low), so you probably want to adjust them to some higher value. The best value depends on your workload, but 256 is probably a good starting point. If the action queue runs asynchronously (e.g. linkedlist mode, everything non-direct), the action queue batch size specifies the upper limit for the elasticsearch bulk submission.
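For example, to raise the limit to 256 for the main message queue and for the queue of the following action, using the legacy directives named above (a minimal sketch; 256 is just the starting point discussed):

$MainMsgQueueDequeueBatchSize 256
$ActionQueueDequeueBatchSize 256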

To activate bulk mode, use

*.*     action(type="omelasticsearch"
           … other params …
           bulkmode="on")

The default is the more conservative “off”. Note that the action can of course be used with any type of filter, not just the catch-all “*.*”. This is only used as a sample.
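Putting the pieces together, a bulk-enabled action with its own asynchronous queue could look roughly like the following sketch. The index name and size are illustrative, and the queue.* parameters assume a rsyslog version that supports setting queue parameters inside action():

*.*     action(type="omelasticsearch"
           searchIndex="testindex"          # illustrative index name
           bulkmode="on"                    # enable bulk submission
           queue.type="linkedlist"          # run the action queue asynchronously
           queue.dequeuebatchsize="256")    # upper limit for the ES bulk request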

rsyslog templates & json

Today I added a simpler method for specifying JSON-encoded fields inside rsyslog templates. It still looks a bit ugly, but if you look closely, you’ll quickly notice that it no longer needs “quoting magic” and thus is far easier to work with.

Previously, you had to define a template like this:

$template tpl, "{\"message\":\"%msg:::json%\",\"fromhost\":\"%HOSTNAME:::json%\",\"facility\":\"%syslogfacility-text%\",\"priority\":\"%syslogpriority-text%\",\"timereported\":\"%timereported:::date-rfc3339%\",\"timegenerated\":\"%timegenerated:::date-rfc3339%\"}"

The template given above is the default for ElasticSearch. With the new code, this can be replaced by:

$template tpl,"{%msg:::jsonf:message%,%HOSTNAME:::jsonf:fromhost%,%syslogfacility-text:::jsonf:facility%,%syslogpriority-text:::jsonf:priority%,%timereported:::date-rfc3339,jsonf%,%timegenerated:::date-rfc3339,jsonf%}"

It’s a bit shorter, but most importantly the JSON field is now generated by the property itself. This is triggered by the “jsonf” (json field) option. If it is given, a

"fieldname":"value"

pair is automatically generated. The field name is either the fourth parameter (see “message” in the msg property in the example above) or, if not given, defaults to the property name. If I hadn’t insisted on specific field names, I could have written the sample as follows:

$template tpl,"{%msg:::jsonf%,%HOSTNAME:::jsonf%,%syslogfacility-text:::jsonf%,%syslogpriority-text:::jsonf%,%timereported:::date-rfc3339,jsonf%,%timegenerated:::date-rfc3339,jsonf%}"

Note that the commas to separate the various JSON fields must still be included inside the template as literal text.
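Putting it together, the second template (with default field names) would produce a record along these lines; the values are purely illustrative:

{"msg":" test message","hostname":"myhost","syslogfacility-text":"user","syslogpriority-text":"info","timereported":"2012-04-01T12:00:00+02:00","timegenerated":"2012-04-01T12:00:00+02:00"}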

This new system works pretty well within the current template system. The config format will probably become a bit more intuitive when it is moved over to the new config language, but the exact hows of that are still to be decided. In fact, I think this string-based way of specifying templates is not so bad. In any case, I am also considering a template option which would take a plain list of fields and generate proper JSON out of that. But that’s work for another day (or so ;)).

This is to be released with 6.3.9 and already available via the git master branch.

rsyslog & elasticsearch: async replication and timeout

Today I have added the capability to specify asynchronous replication and timeout settings in omelasticsearch. Code-wise it’s a small change: it just passes the required parameters to ElasticSearch via the proper REST URL parameters. By default, both parameters are not set, which means the default timeout and synchronous replication are used.

To set parameters, use

*.*     action(type="omelasticsearch"
           … other params …
           asyncrepl="on" timeout="1m")

If you leave “asyncrepl” out or set it to “off”, synchronous replication is used. For greatest flexibility, the value of the “timeout” parameter is not validated. While this enables you to use anything ElasticSearch supports, invalid values cannot be detected by omelasticsearch and thus will cause all inserts to fail (or misbehave). Note that some basic integrity checking is done, but we do not go to great lengths here. So use with care.
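For reference, these settings simply become URL parameters of the REST request. The request URL shown in the error-file sample earlier in this document illustrates the effect:

http://localhost:9200/_bulk?replication=async&timeout=1m&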

rsyslog & ElasticSearch: using dynamic index and type

Today, I have added support for dynamically taking the search index and type from the message itself. It works much like dynafiles do. If dynamic mode is selected (via config parameters), the index name and type given in the configuration are template names instead of literal ones. That means the rsyslog core will generate the actual name at runtime, based on actual message content. These dynamically generated values will then be used inside the request.

Here is a quick howto: you need to define the templates first, like this:

$template srchidx,"%hostname%"
$template srchtype,"%programname%"

In this example, the hostname (from the message) is used as name for the search index, and the programname (usually part of the syslog tag) is used as searchtype. The full power of templates, including all properties, can be used, so this is highly flexible.

To actually use these dynamic values, omelasticsearch must be called like this (line breaks do not matter):

*.*     action(type="omelasticsearch"
           searchIndex="srchidx" dynSearchIndex="on"
           searchType="srchtype" dynSearchType="on")

The “dynSearch…” settings tell the engine whether a name is dynamic or not. If set to “on”, as above, a template name is given and the actual name will be generated dynamically. If set to “off”, the literal value will be used (“off” is the default, so the parameters need not be specified if “off” is desired). For example:

*.*     action(type="omelasticsearch"
           searchIndex="srchidx"   searchType="srchtype")

will use the index “srchidx” and type “srchtype” literally. No template-based name generation will happen in this case. Please note that it is perfectly fine for only one of the two parameters to be dynamic; it is not “all or nothing”.
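For instance, the following sketch makes only the index dynamic, reusing the “srchidx” template from above while keeping a literal type (“mytype” is illustrative):

*.*     action(type="omelasticsearch"
           searchIndex="srchidx" dynSearchIndex="on"   # index name from template
           searchType="mytype")                        # type used literally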

I hope this is a useful addition to omelasticsearch. Several folks have indicated that this method can increase throughput in a couple of use cases, for example by splitting indexes based on host names, host functions, and the like.

Using different templates with rsyslog’s ElasticSearch plugin

Recently, an experimental ElasticSearch plugin has been added to rsyslog, omelasticsearch. Like all other output plugins, it comes with a canned template, which specifies a default “schema”. However, the template engine provides capabilities to use a completely different set of fields. In this blog post, I’ll briefly describe how this is done.

Note: all work is based on the current (April 10th, 2012) implementation of both omelasticsearch and the rsyslog core. Future implementation changes are expected in an effort to make things more intuitive. This may even break what I describe here. So if you come to this blog post at a later time, it is probably best to check whether things have changed by “now” (especially if the procedure does not work ;)).

The current implementation ties into the template system. The default template looks as follows (all on one line, broken for readability):

$template JSONDefault, "{\"message\":\"%msg:::json%\",\"fromhost\":\"%HOSTNAME:::json%\",\"facility\":\"%syslogfacility-text%\",\"priority\":\"%syslogpriority-text%\",\"timereported\":\"%timereported:::date-rfc3339%\",\"timegenerated\":\"%timegenerated:::date-rfc3339%\"}"

The ‘\"’ sequence is needed to represent a quote character inside the template. To format JSON, this is pretty ugly, but that’s the way the template processor currently works (and that is one reason why it is under review). As you can see, the JSON is actually “hand-crafted”, with the “json” option specifying that the property text needs to be properly escaped to be well-formed JSON. If that option is specified, the template processor does the necessary escaping. Note that not all properties carry the “json” option. This is purely for performance reasons: time stamps, for example, never include characters that need to be escaped, so the “json” option is not used for them (but could be, e.g. “date-rfc3339,json”).

So now let’s define a different set of fields to be used. Let’s say we just want to have the date from the syslog message and the MSG part of the message itself. That would be as follows:

$template miniSchema, "{\"message\":\"%msg:::json%\",\"timereported\":\"%timereported:::date-rfc3339,json%\"}"

Note: I requested JSON escaping for the timestamp in this example just to prove the point – don’t do that in a real deployment, as it is nonsense that just costs CPU cycles ;)

Now I need to use the template within the omelasticsearch action. This is done as follows:

*.*     action(type="omelasticsearch" template="miniSchema")

Note: the “all” (“*.*”) filter of course can be replaced with a different type of filter.

That’s all that needs to be done. Please note that you can define several templates and use them in several different elasticsearch output actions, just as with any other action type. Also keep in mind that the property replacer permits access to a wide range of message properties. Most importantly, normalized properties and CEE-enhanced syslog properties can be accessed via the CEE family of property names (essentially “$!cee-name” style). Just be sure to include the “json” option on any property that may contain unsafe characters (which means almost all of the fields). This is not done automatically by the current engine, and invalid characters can lead to strange problems, even aborts of ElasticSearch itself!
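As an illustration, a template pulling such a normalized property could look like the following sketch; “$!usr” is a hypothetical CEE field used only for this example, and note the “json” option applied to it:

$template ceeSchema, "{\"username\":\"%$!usr:::json%\",\"message\":\"%msg:::json%\"}"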

In the future, a more intuitive syntax is planned for JSON template definitions. Nevertheless, the current code permits full customization but requires taking care of the details.