creating an error log file for elasticsearch data errors

With the work I have just completed in the beta-jsonrt branch of rsyslog, omelasticsearch now supports logging records with data errors to a local log file. This handles, for example, cases where a field's data type does not match what the index expects. These types of errors are not recoverable, so omelasticsearch does not busy the rsyslog core with handling them, especially as it knows precisely which records caused the error. Instead, as of now (this will be "officially" available as part of 7.3.2), the user can configure a local error log file to which all requests with errors are written. That file receives information about both the request and the Elasticsearch reply. Log records are formatted in JSON and as such are easy to process.

Note that the request is reported as the post data, a string, NOT a full JSON representation. This is done for two reasons: first, the post data for a bulk-mode request (as defined by ES) is NOT proper JSON, so we would need to mangle that data, something we would rather avoid, as it may hide the real request problem (better to use "the real thing"). Second (and closely related), there may actually be errors in that string, so we may not even be able to parse it. As such, it makes much more sense to supply the actual string. The reply part is the JSON representation as we received it from ES (we assume it can be parsed; otherwise something would be awfully wrong). Note that in bulk mode, the error record may also contain items that were processed correctly (ES status OK). Again, we include them so that the whole request/reply exchange can be evaluated. If data needs to be re-processed, user scripts must ignore such OK-status items.

The core idea behind that representation is that it should be as simple as possible to create a custom cleanup script capable of processing the error information – while still providing full feedback on the error cause.

Once the error file is opened, it is never closed until rsyslog is terminated OR a HUP is sent. So the proper way to process the error file in a custom script is to first rename the file and then send SIGHUP to rsyslog. This causes rsyslog to release the file and re-open it *when needed*. Note that the file is only created if errors actually occurred. Among other things, this is done so that the presence of the file can be used as a simple trigger to decide whether error handling must be started (this should be easy to integrate with custom monitoring solutions).
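The rename-then-HUP sequence can be sketched as follows. This is only an illustration: the function name, the error-file path, and the pidfile location are assumptions, not anything shipped with rsyslog; adjust them to your deployment.

```python
import os
import signal
import time


def rotate_error_file(path, pidfile="/var/run/rsyslogd.pid"):
    """Rename the error file out of the way, then HUP rsyslog so it
    releases the old file handle and re-creates the file when needed.

    Both default paths are assumptions, not rsyslog defaults.
    """
    if not os.path.exists(path):
        # no error file means no errors occurred -> nothing to process
        return None
    newpath = "%s.%d" % (path, int(time.time()))
    os.rename(path, newpath)  # rsyslog still holds the handle to the renamed file
    if os.path.exists(pidfile):
        with open(pidfile) as f:
            pid = int(f.read().strip())
        os.kill(pid, signal.SIGHUP)  # makes rsyslog release the handle
    return newpath
```

A monitoring job can simply test for the presence of the error file, call this to rotate it, and then run the actual cleanup on the renamed copy.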

The error file functionality is activated by specifying the "errorfile" action parameter. If it is not given, errors are simply ignored (as there is nothing else we can do to handle them).

A quick sample of how this can be done:

*.*     action(type="omelasticsearch"
               searchType="mytype" bulkmode="on"
               errorfile="/var/log/omelasticsearch.log")

The error file contents will look like this:
{
  "request": {
    "url": "http://localhost:9200/_bulk?replication=async&timeout=1m&",
    "postdata": "{\"index\":{\"_index\": \"testindex\",\"_type\":\"mytype\"}}\n{\"message\":\" test 1\",\"fromhost\":\"hostname\",\"facility\":\"kern\",\"priority\":\"notice\",\"timereported\":\"2012-07-30T12:38:38.737354+02:00\",\"timegenerated\":\"2012-07-30T12:38:38.737354+02:00\"}\n"
  },
  "reply": {
    "took": 2,
    "items": [{
      "create": {
        "_index": "testindex",
        "_type": "mytype",
        "_id": "g8XJzt-TRd6WOu3pRyzrcg",
        "error": "MapperParsingException[Failed to parse [timegenerated]]; nested: NumberFormatException[For input string: \"2012-07-30T12:38:38.737354+02:00\"]; "
      }
    }]
  }
}
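As a starting point for a custom cleanup script, the following sketch extracts the bulk items that actually failed, skipping any OK-status items as described above. It assumes the error record has already been parsed into a dict (e.g. with `json.load`) and that the bulk post data interleaves one metadata line and one document line per item, as in the ES bulk API; the function name is made up for illustration.

```python
def failed_requests(record):
    """From one parsed error record, return (metadata line, document line,
    error message) for each bulk item whose reply reports an error.

    Items that ES accepted (no "error" field) are skipped, since only the
    failed documents need re-processing.
    """
    lines = record["request"]["postdata"].rstrip("\n").split("\n")
    # each bulk item consists of a metadata line followed by a document line
    docs = [(lines[i], lines[i + 1]) for i in range(0, len(lines), 2)]
    failed = []
    for item, (meta, doc) in zip(record["reply"]["items"], docs):
        result = next(iter(item.values()))  # e.g. the "create" object
        if "error" in result:               # OK-status items have no "error"
            failed.append((meta, doc, result["error"]))
    return failed
```

The returned document lines can then be corrected and re-submitted to ES, while the error messages tell you why each one was rejected.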