rsyslog: why disk-assisted queues keep a file open

From time to time, someone asks why rsyslog disk-assisted queues keep one file open until shutdown. So it probably is time to elaborate a bit about it.

Let’s start with explaining what can be seen: if a disk-assisted queue is configured, rsyslog will normally use the in-memory queue. It will open a disk queue only if there is good reason to do so (because it severely hurts performance). The prime reason to go to disk is when the in memory queue’s configured size has been exhausted. Once this happens, rsyslog begins to create spool files, numbered consequtively. This should normally happen in cases where e.g. a remote target is temporarily not reachable or a database engine is responding extremely slow.  Once the situation has been cleared, rsyslog clears out the spool files. However, one spool file always is kept until you shut rsyslog down. Note well that when the disk queue is idle, all messages are processed even though the the physical spool file still contains some already-processed data (impstats will show you the exact details).

This is expected behaviour, and there is good reason for it. This is what happens technically:

A DA queue is actually “two queues in one”: It is a regular in-memory queue, which has a (second) helper disk queue in case it neeeds it. As long as operations run smoothly, the disk queue is never used. When the system starts up, only the in-memory queue is started. Startup of the disk queue is deferred until it is actually needed (as in most cases it will never be needed).

When the in-memory queue runs out of space, it starts that Disk queue, which than allocates its queue file. In order to reclaim space, not a single file is written but a series of files, where old files are deleted when they are processed and new files are created on an as-needed basis. Initially, there is only one file, which is read and written. And if the queue is empty, this single file still exists, because it is the representation of a disk queue (like the in-memory mapping tables for memory queues always exist, which just cannot be seen by the user).

So what happens in the above case is that the disk queue is created, put into action (the part where it writes and deletes files) and is then becoming idle and empty. At that stage, it will keep its single queue file, which holds the queue’s necessary mappings.

One may now ask “why not shut down the disk queue if no longer needed”? The short answer is that we anticpiate that it will be re-used and thus we do not make the effort to shut it down and restart when the need again arises. Let me elaborate: experience tells that when a system needs the disk queue once, it is highly likely to need it again in the future. The reason for disk queues to kick into action are often cyclic, like schedule restarts of remote systems or database engines (as part of a backup process, for example). So we assume if we used it once, we will most probably need it again and so keep it ready. This also helps reduce potential message loss during the switchover process to disk – in extreme cases this can happen if there is high traffic load and slim in-memory queues (remember that starting up a disk queue needs comparativley long).

The longer answer is that in the past rsyslog tried to shut down disk queues to “save” ressources. Experience told us that this “saving” often resulted in resource wasting. Also, the need to synchronize disk queue shutdown with the main queue was highly complex (just think about cases where we shut it down at the same time the in-memory queue actually begins to need it again!). This caused quite some overhead even when the disk queue was not used (again, this is the case most of the time – if properly sized). An indication of this effect probably is that we could remove roughly 20% of rsyslog’s queue code when we removed the “disk queue shutdown” feature!

Bottom line: keeping the disk queue active once it has been activated is highly desirable and as such seeing a queue file around until shutdown is absolutely correct. Many users will even never see this spool file, because there are no exceptional circumstances that force the queue to go to disk. So they may be surprised if it happens every now and then.

Side-Note: if disk queue files are created without a traget going offline, one should consider increasing the in-memory queue size, as disk queues are obviouly much less efficient than memory queues.

rsyslog on AIX: who wants to help porting?

Thanks to the recent IBM contribution of a partial rsyslog 5.8.6 port to AIX, we have come across that platform again. Unfortunately, some license issues prevent me from merging the IBM contribution to the current rsyslog release (7.4+). I tried to work with IBM resolving these issues, but it just occurred to me that actually doing the port ourselves is probably easier than wrangling with license issues. I could set some time aside to make changes (as a by-activity, aka “may be sluggish at times”), but I neither have time, machines nor know-how. David Lang has already said has has some machines to try compile on, but we still have know-how issues.

So question now: would there some more folks be interested in rsyslog on AIX? If so, could you join a porting effort and provide us with advise? The most pressing thing currently is setting up an environment where we can compile at all. Possibly, once this is done, we may run into issues that require some more in-depth programming know-how, but that’s to be seen (and must must not necessarily happen).

Anyone interested in helping with this please either comment to this blog posting or make yourself heard on the rsyslog mailing list or send me a private mail.

PS: all code derived from this work needs to be placed under ASL 2.0 – just so that you know.

rsyslog under ASL 2.0: why I can’t simply do that

The ASL 2.0 topic boiled up again due to a much-appreciated IBM contribution to make rsyslog 5.8.6 work on AIX. Unfortunately, this contribution was done under GPLv3+. I tried to work with IBM to have it released under ASL 2.0, but their legal department is of the opinion that this is not possible. This resulted in some restrictions, which can be found in the git branches’ README file. Most importantly, it’s a dead-end branch branch which cannot be merged up to new versions.

As an option, IBM said if I would release rsyslog 5.8.6 under ASL 2.0, they could release their patch under ASL 2.0 as well. Unfortunately, I cannot do this by just declaring so.

You need to keep in mind that I do not own the complete copyright! Actually, there are a couple of hundreds contributors that I can find in git history … and there are even more in the original sysklogd, which I can’t even identify all. As such, it is simple impossible for me to change the license “on my own”.
To reach the current state, I did quite intense research on who contributed what some time (two years maybe?) ago. I concentrated on “easy” files (new ones without sysklogd code) and contacted all contributors if they agree on ASL 2.0. Those files that I got agreement on are now under ASL 2.0. For the “hard” files, I even did some research which of them were still from the original syslogd, which was released under BSD. I replaced these files /code sequences with the BSD version (which is identical), and so could put that under ASL 2.0. But still there is a notable body of code left that needs to be under GPLv3+. Some can probably be changed with more author contacting (especially Red Hat contributions, where I have a general “go” for most things), but some code definitely needs to be rewritten. I guess the additional research takes at least some weeks, with the rewrite taking maybe another month or so.

Bottom line: it’s far from being easy and there is no pressing need. But I don’t want to fallback on the effort just because of the IBM contribution. I would need to rewrite it in any case, so there is no point in merging mainstream.

systemd journal endlessly sends data to imjournal

When Fedora updated to rsyslog 7.4.0 in Fedora 19, they changed the default way how they obtain local log messages. In all previous releases, imuxsock, the traditional Unix/Linux input was used. That also works in conjunction with journal, at least as far as traditional syslog is involved. This is because journal provides these messages via the log socket it passes to rsyslog. Unfortunately, journal hides information if structured content is available. In order to obtain that, Red Hat has contributed a new module, imjournal, which reads the journal database via native journal APIs and as such gets hold of all data. With 7.4.0, the Fedora maintainers switched the default input in the desire to get full logging information into the system.

Unfortunately, rapidly many users begun to report serious problems with duplicate messages exhausting their system resources. As can be seen in this systemd journal database corruption Red Hat bug tracker, the root cause of this problem is that the journal database is corrupted and the native journal API thus endlessly returns the same messages ever and ever again.

The authors of imjournal did not use rsyslog’s ratelimiting capability in this module. Actually, there was good reason not to do it, as journal itself provides some rate-limiting, so doing it twice doesn’t sound like a good idea.

If you dig deeper into the journal bug, you’ll notice that this is not a new problem. It occurs with journal tools as well, and does so obviously for quite a while. In my opinion, it now surfaced as a big issue simply because imjournal was the first application ever to make serious use of the journal database.

After I received reports on this problem, I acted immediately. First and foremost, I put a warning message into imjournal telling about the risk on relying on the journal database. I can obviously not solve that problem, but then I mitigated it by adding rate-limiting to the module. The current git version by default does not permit more than 20,000 messages within a 10 minute interval, so even with a broken journal, no more than 120,000 messages per hour should be emitted. That’s as far as I can go handling this situation. Note that there is already a fix for the current problem in journal, it is also included in the bug tracker I mentioned above. However, in the past couple of weeks I talked to quite some folks on rsyslog/journal integration, and journal database corruption was frequently mentioned. Nobody told me about loops, but as I guard against future problems, I will keep the ratelimiting default active in imjournal.

Note that the rsyslog project “officially” always suggested to use imjournal only in cases where there is hard need for it (this was a result of talking to the end users). Thankfully I published a presentation on rsyslog and systemd journal integration including this fact just last week, so I can even proof that recommendation ;)

How will we go forward? If you need a quick patch, you should rebuild rsyslog from the git v7-stable branch. For a single patch that addresses the ratelimiting, you can use this one. The full patch set will be included in 7.4.1. We are working with Red Hat to release it ASAP, hopefully tomorrow.

Another viable solution is to simply disable imjournal in rsyslog.conf and replace it with the former imuxsock configuration. Please note that this problem does not affect any version of rsyslog installed from Adiscon’s repositories. This is because we never use imjournal by default.

I hope this clarifies the root cause of this problem, how to mitigate it and how the rsyslog team intends to prevent further harm from potential systemd journal database corruption.

What is a rsyslog signature provider?

Some folks asked what a (rsyslog) signature provider is. In essence, it is just a piece of software written to a specific interface.

There are little functional requirements for signature providers. Most obviously, we expect that it will provide some useful way to sign logs, what means it should be able to provide an integrity proof of either a complete log file or even some log records some time after the logs are recorded.

There is no “official signature provider spec” available. As usual in open source, the provider interface is well defined inside its actual source header file. Take a look for example at rsyslog’s git to see the definition.

If you look at that code, the interface is fairly simple. The primary entry points a programmer needs to define are

  • OnFileOpen
  • OnRecordWrite
  • OnFileClose

As the names imply, they are called whenever a log file is opened, written to, and closed. Again, what the provider does at these events is up to the provider.

The initial provider that ships directly with rsyslog is using a secure remote signing service provided by Guardtime.  There are various reasons why I selected this provider, and I elaborated at length on them in my blog posting on why I selected Guardtime as first rsyslog signature provider. In a nutshell, it is a method that does not rely on local secrets and as such can not be circumvented (again: details in my other posting). Obviously, Guardtime offers paid services, but they also operate free services for everyone to use (a key point when I evaluated that technology).

If you have privacy concerns about the Guardtime provider, you may want to read my blog posting on just that question.

Note that signature provider implementations can use the above entry points in any way they like. For example, the may ignore file open and close and do all work in record write. So this small interface is highly flexible. If you look at the full interface description, you’ll also see some more events which primarily provide housekeeping functionality. If you intend to write your own signature provider, I suggest to start with a copy of the guardtime provider and change it according to your needs. That way, you have a great example of how the housekeeping calls are handled.

new rsyslog devel branch: 7.5

After the successful and important release of 7.4 stable, we are working hard on further improving rsyslog. Today, we release rsyslog 7.5.0, which opens up the new development branch.

With that branch, we’ll further focus on security. Right with the first release, we provide a much-often demanded feature: TLS support for the Reliable Event Logging Protocol (RELP). Even better, we also support the compression feature that is included in GnuTLS, so you can use im/omrelp not only with TLS, but also turn on compression if you like so.

There is already a guide for TLS-secured RELP with rsyslog available. It was written for the experimental 7.3.16 release, which never was officially announced. So the guide contains some (now-unnecessary) build steps, but also a full example of client and server configurations.

Note that the current TLS support supports anonymous authentication, only (via Diffie-Hellman key exchange). This is due to the current librelp implementation. However, librelp is scheduled to become more feature-rich and rsyslog will support this, once it is available. In general, you can expect some more enhancements (and more fine-grained config control for those in the need) for rsyslog’s RELP subsystem.

There are more things to come, especially in the security context. So stay tuned to what the next version of rsyslog will provide!

rsyslog: How can I see which field contains which data item?

A topic that comes up on the rsyslog mailing list or support forum very often is that folks do not know exactly which values are contained on which fields (or properties, like they are called in rsyslog, e.g. TAG, MSG and so on).

So I thought I write a quick block post on how to do that. I admit, I do this mostly to save me some time typing and having it at hand for reference in the future.

This is such a common case, that rsyslog contains a template which will output all fields. The basic idea is to use that template and write all message into a special file. A user can than look up that file, find the message in question, and see exactly which field/property is populated with which data.

It’s very easy to do. All you need to place this at the top of your rsyslog.conf:


*.* /var/log/debugfmt;RSYSLOG_DebugFormat

and restart rsyslog. After that, all messages will be written to /var/log/debugfmt. Note that each message is written to multiple lines, with each of its properties. This is a very rough example of how this may look:


Debug line with all properties:
FROMHOST: 'linux', fromhost-ip: '127.0.0.1', HOSTNAME: 'linux', PRI: 46,
syslogtag 'rsyslogd:', programname: 'rsyslogd', APP-NAME: 'rsyslogd', PROCID: '-', MSGID: '-',
TIMESTAMP: 'Jun 10 18:56:18', STRUCTURED-DATA: '-',
msg: ' [origin software="rsyslogd" swVersion="7.4.0" x-pid="4800" x-info="http://www.rsyslog.com"] start'
escaped msg: ' [origin software="rsyslogd" swVersion="7.4.0" x-pid="4800" x-info="http://www.rsyslog.com"] start'
inputname: rsyslogd rawmsg: ' [origin software="rsyslogd" swVersion="7.4.0" x-pid="4800" x-info="http://www.rsyslog.com"] start'

Note that I lied a bit: it’s not actually all properties, but the most important ones (among others, the JSON properties are missing, as this is a large amoutn of data – but I may add it in later versions).

If you know what to know what, for example, APP-NAME is, you can simply look it up (here it is “rsyslogd”).

Note that I have deliberately given this example in legacy configuration language, so that it can be used by all users of rsyslog. You can of course also make use of the more advanced versions.

new rsyslog 7.4 stable branch released

We just released rsyslog 7.4.0, a new stable release which replaces the 7.2 branch. After nine month of hard work, there are many exciting enhancements compared to 7.2, and I thought I give you a quick rundown of the more important new features. Note that while I list some “minor features” at the end of this posting, the list is not complete. I left out those things that are primarily of interest for smaller groups of users. So if you look for a specific feature not mentioned here, it may pay to look at the ChangeLog or post a question to the rsyslog mailing list.

With this release, the rsyslog project officially supports version 7.4. Support for 7.2 will gradually be phased out. If support for older versions is required, utilizing Adiscon’s professional services for rsyslog is recommended.

Note that I only list main headlines for each of the features. Follow links (where provided) to gain more in-depth information.

Security Package

 

Improved Rate-Limiters

  • introduction of Linux-like rate-limiting for all inputs
  • “Last message repeated n times” now done on a per-input basis; makes it much more useful AND increases processing speed.

Systemd Journal support

  • omjournal writes messages to the journal
  • imjournal obtains messages including journal-specific meta data items from the journal

Performance Improvements

  • Disk Queue Performance greatly improved
  • DNS cache greatly improved and optimized
  • output file writer (omfile) now fully supports background writing
  • script optimizer now changes frequently-used long-running filters into equivalent quick ones when possible (this even affects some distros default configs and is a great performance saver)

Minor Features

  • various plugins now support the new RainerScript based config language
  • omlibdbi improvements, among them support for transactions
  • ommysql now supports transactions
  • improved omfile zip writing speed
  • performance optimizations
  • omelasticsearch improvements (bulk processing mode, local error file)
  • omudpspoof now supports packets larger than 1472 by using fragmentation
  • omrelp now supports session timeout
  • contributed module (not project-supported) omrabbitmq was added

rsyslog journal support

We expect that rsyslog and the systemd journal will be found together in quite some szenarios (if you are curios on what exactly we mean, check the “rsyslog vs. journal?” posting).

As such, it makes a lot of sense to think about providing integration facilities. Thanks to rsyslog’s modular architecture, it wasn’t very hard to provide the necessary building blocks. In the 7.3 experimental branch, two new modules (omjournal and imjournal) have been developed. They provide the capability to write to the journal as well as pull data out of it. Usually, the latter is not really necessary, as journald still provides log messages to the system log socket. But unfortunately, journal developers have decided only to pass on a subset of the logging information. They exclude the structured data content. However, such data is only available if their own logging API is being used by applications, and this is currently not really the case. So right now using just the regular system log socket input should be sufficient in almost all cases. Howerver, should structured data become more prominent in the journal, using imjournal gains rsyslog access to it.

For some more background information on the integration, you can also watch a quick presentation that I recorded:

rsyslog logfile encryption

Starting with version 7.3.11, rsyslog provides native support for log file encryption. In order to keep things highly flexible, both an encryption provider interface as well as an provider have been designed and implemented.

The provider interface enables to use different encryption method and tools. All that is need to be done is to write the actual provider to that interface. The encryption provider (called a “crypto provider”) then implements everything necessary for the actual encryption. Note that decryption is not part of the rsyslog core. However, a new user tool – rscryutil – is provided by the project to handle this task. This tool is currently being considered to be part of the crypto provider. Consequently, there is no specific interface for it. The reasoning behind that decision is that there is very little provider-independet plumbing in this tool, so abstracting things looks a bit like over-engineering (we may change that view in the future based on how things evolve).

The initial crypto provider is based on libgcrypt, which looks like an excellent choice for (almost?) all use cases. Note that we support symmetric cryptography, only, inside the crypto provider. This is simply due to the fact that public/private key cryptography is too slow to be used for mass encryption (and this is what we do with log files!). Keep in your mind that even TLS uses symmetric cryptography for the actual session data (for the same reason), and uses public/private key cryptography only for the exchange of the symmetric key. In any case, folks using that functionality in high-secure environments are advised to check the exact security requirements. Periodic re-keying of the encrypted log files may be necessary.

No on to “how does it work?“: The encryption functionality, at the action level, is enabled by specifying a crypto provider, for example as follows:

action(type=”omfile” file=”/var/log/logfile”
       cry.provider=”gcry” 
       )

This tells the action to load the crypto provider. What then happens, is up to the crypto provider. For obvious reasons, just loading it is not sufficient, you need at least to provide a crypto key. How this is done depends on the crypto provider. It is assumed that all crypto providers user the “cry.” config parameter name space. With the gcry provider, a full action may look like this:

action(type=”omfile” file=”/var/log/logfile”
       cry.provider=”gcry” cry.key=”testtesttesttest”
       )

This is the bare minimal set of parameters for gcry – it’ll use defaults for the algorithm and use the key directly as specified from the configuration file.

Specifying the key directly in the configuration is usually a bad idea: in most setups, many or even all users can read the config file, so the key is far from being secret. We even thought of not permitting this option, as it is so insecure. We left it in, however, as it is a great help in initial testing.

For production use, there are two other modes: one is to read the key from a side-file (which needs to be sufficiently secured!) or obtain it via a special program that can obtain the key via any method it likes. The latter is meant to be used for higher security use cases. We assume that the (user-written) program can do all those “interesting things” that you can do to manage crypto keys in a secure manner. Access to the keyprogram, of course, needs to be properly secured in that case.

I hope this gives you a glimpse at how rsyslog log file encryption works. For details on the crypto provider parameters, see the official rsyslog grcy crypto provider documentation.