rsyslog on AIX: who wants to help porting?

Thanks to the recent IBM contribution of a partial rsyslog 5.8.6 port to AIX, we have come across that platform again. Unfortunately, some license issues prevent me from merging the IBM contribution to the current rsyslog release (7.4+). I tried to work with IBM resolving these issues, but it just occurred to me that actually doing the port ourselves is probably easier than wrangling with license issues. I could set some time aside to make changes (as a by-activity, aka “may be sluggish at times”), but I neither have time, machines nor know-how. David Lang has already said has has some machines to try compile on, but we still have know-how issues.

So question now: would there some more folks be interested in rsyslog on AIX? If so, could you join a porting effort and provide us with advise? The most pressing thing currently is setting up an environment where we can compile at all. Possibly, once this is done, we may run into issues that require some more in-depth programming know-how, but that’s to be seen (and must must not necessarily happen).

Anyone interested in helping with this please either comment to this blog posting or make yourself heard on the rsyslog mailing list or send me a private mail.

PS: all code derived from this work needs to be placed under ASL 2.0 – just so that you know.

rsyslog under ASL 2.0: why I can’t simply do that

The ASL 2.0 topic boiled up again due to a much-appreciated IBM contribution to make rsyslog 5.8.6 work on AIX. Unfortunately, this contribution was done under GPLv3+. I tried to work with IBM to have it released under ASL 2.0, but their legal department is of the opinion that this is not possible. This resulted in some restrictions, which can be found in the git branches’ README file. Most importantly, it’s a dead-end branch branch which cannot be merged up to new versions.

As an option, IBM said if I would release rsyslog 5.8.6 under ASL 2.0, they could release their patch under ASL 2.0 as well. Unfortunately, I cannot do this by just declaring so.

You need to keep in mind that I do not own the complete copyright! Actually, there are a couple of hundreds contributors that I can find in git history … and there are even more in the original sysklogd, which I can’t even identify all. As such, it is simple impossible for me to change the license “on my own”.
To reach the current state, I did quite intense research on who contributed what some time (two years maybe?) ago. I concentrated on “easy” files (new ones without sysklogd code) and contacted all contributors if they agree on ASL 2.0. Those files that I got agreement on are now under ASL 2.0. For the “hard” files, I even did some research which of them were still from the original syslogd, which was released under BSD. I replaced these files /code sequences with the BSD version (which is identical), and so could put that under ASL 2.0. But still there is a notable body of code left that needs to be under GPLv3+. Some can probably be changed with more author contacting (especially Red Hat contributions, where I have a general “go” for most things), but some code definitely needs to be rewritten. I guess the additional research takes at least some weeks, with the rewrite taking maybe another month or so.

Bottom line: it’s far from being easy and there is no pressing need. But I don’t want to fallback on the effort just because of the IBM contribution. I would need to rewrite it in any case, so there is no point in merging mainstream.

Joining the Guardtime Technical Advisory Board

I just wanted to let everyone know that I will be joining the Guardtime technical advisory board. The board’s prime mission is to make sure that Guardtime implements technology that users need.

While implementing the rsyslog log signature interface and it’s first signature provider, I worked very closely and productive with the Guardtime folks. Over time, we exchanged more and more ideas and the whole thing culminated when I visited their office in Tallinn, Estonia where we had even better discussions and hacking sessions.

Guardtime seems to have enjoyed this work as well, and so I was recently asked if I would like to join their to-be-formed technical advisory board (TAB). At first, I was a bit hesitant because I wondered if there is any collision of interest between being on the Guardtime  TAB and driving rsyslog forward. So I talked a bit with the Guardtime folks clarifying my position. As it turns out, the whole point of the TAB is to assemble experts from different fields and let them discuss and provide information on what end-users really need. The rationale is very obvious: Guardtime as a company can only succeed if they provide what the market really needs. There is no point in trying the market to pick up some product or technology that has no real value. Too many companies tried that in the past and failed miserably. So, in that sense, I can act as the logging user’s agent inside the TAB and help push Guardtime in a direction that really is of value for logging. As I already think their current technology is a really good fit for what we need in logging, this is an exciting opportunity.

I also admit that it is very appealing that I will get ready access to Guardtime’s team of qualified cryptographers (including chief scientist Atho Buldas, who helped building the Estonian national PKI), which is a really big win in regard to secure cryptography in rsyslog. I never before had the chance to work with such top-notch crypto people. In fact, they already provided feedback on the non-Guardtime encryption bits in rsyslog and I am sure that working closer with them will help us getting very solid cryptography into rsyslog. If in doubt, just look at the rsyslog git tree and commit history about when all this new crypto stuff got in :-)

As a side-note, I am also talking with some other party about a custom module using a HMAC based tamper proof system for log records. While this is not as strong as the Gurdtime signature provider, there are obviously some use cases for it as well. If that project materializes, it will of course be merged into the main rsyslog git tree. So, for the concerned, rsyslog will definitely not get a Guardtime bias. After all, my main interest is driving rsyslog forward and, again, you can do that against what “the market” demands.

One of the things I discussed with the Guardtime cryptographers while I was in Tallinn was about how we could extend log integrity proof to a full network relay case, where relays even drop messages. This is a very hard and yet unsolved problem (for details see my paper on log signing in the relay case). While we got no ready solution from the initial discussion, we had some very promising ideas. This for sure is nothing we’ll see within the next quarter or so, but I am happy I now have the opportunity to discuss this and other hard problems with the knowledgeable (and very friendly, btw ;)) folks from Guardtime.

So for me getting on the TAB I think is a win-win-win situation for rsyslog, it’s user base and of course also Guardtime. Hope a lot of good stuff will come out of it.

systemd journal endlessly sends data to imjournal

When Fedora updated to rsyslog 7.4.0 in Fedora 19, they changed the default way how they obtain local log messages. In all previous releases, imuxsock, the traditional Unix/Linux input was used. That also works in conjunction with journal, at least as far as traditional syslog is involved. This is because journal provides these messages via the log socket it passes to rsyslog. Unfortunately, journal hides information if structured content is available. In order to obtain that, Red Hat has contributed a new module, imjournal, which reads the journal database via native journal APIs and as such gets hold of all data. With 7.4.0, the Fedora maintainers switched the default input in the desire to get full logging information into the system.

Unfortunately, rapidly many users begun to report serious problems with duplicate messages exhausting their system resources. As can be seen in this systemd journal database corruption Red Hat bug tracker, the root cause of this problem is that the journal database is corrupted and the native journal API thus endlessly returns the same messages ever and ever again.

The authors of imjournal did not use rsyslog’s ratelimiting capability in this module. Actually, there was good reason not to do it, as journal itself provides some rate-limiting, so doing it twice doesn’t sound like a good idea.

If you dig deeper into the journal bug, you’ll notice that this is not a new problem. It occurs with journal tools as well, and does so obviously for quite a while. In my opinion, it now surfaced as a big issue simply because imjournal was the first application ever to make serious use of the journal database.

After I received reports on this problem, I acted immediately. First and foremost, I put a warning message into imjournal telling about the risk on relying on the journal database. I can obviously not solve that problem, but then I mitigated it by adding rate-limiting to the module. The current git version by default does not permit more than 20,000 messages within a 10 minute interval, so even with a broken journal, no more than 120,000 messages per hour should be emitted. That’s as far as I can go handling this situation. Note that there is already a fix for the current problem in journal, it is also included in the bug tracker I mentioned above. However, in the past couple of weeks I talked to quite some folks on rsyslog/journal integration, and journal database corruption was frequently mentioned. Nobody told me about loops, but as I guard against future problems, I will keep the ratelimiting default active in imjournal.

Note that the rsyslog project “officially” always suggested to use imjournal only in cases where there is hard need for it (this was a result of talking to the end users). Thankfully I published a presentation on rsyslog and systemd journal integration including this fact just last week, so I can even proof that recommendation ;)

How will we go forward? If you need a quick patch, you should rebuild rsyslog from the git v7-stable branch. For a single patch that addresses the ratelimiting, you can use this one. The full patch set will be included in 7.4.1. We are working with Red Hat to release it ASAP, hopefully tomorrow.

Another viable solution is to simply disable imjournal in rsyslog.conf and replace it with the former imuxsock configuration. Please note that this problem does not affect any version of rsyslog installed from Adiscon’s repositories. This is because we never use imjournal by default.

I hope this clarifies the root cause of this problem, how to mitigate it and how the rsyslog team intends to prevent further harm from potential systemd journal database corruption.

rsyslog performance: main and action queue workers

Rsyslog has both “main” message queues and action queues. [Actually, “main message queues” are queues created for a ruleset, “main message” is an old-time term that was preserved even though it is no longer accurate.]

By default, both queues are set to one worker maximum. The reason is that this is sufficient for many systems and it can not lead to message reordering. If multiple workers are concurrently active, messages will obviously be reordered, as the order now, among others, depends on thread scheduling order.

So for now let’s assume that you want to utilize a multi-core machine. Then you most probably want to increase the maximum number of main message queue workers. The reason is that main queue workers process all filters for all rules inside the rule set, as well as full action processing for all actions that are not run on an asynchronous (action) queue. In typical setups, this offers ample of opportunity for concurrency. Intense tests on the v5 engine have shown near linear scalability to up to 8 cores, with still good improvements for higher number of cores (but increasing overhead). Later engines do most probably even better, but have not been rigorously performance tested (doing it right is a big effort in itself).

Action queues have a much limited concurrency potential because they do only a subset of the work (no filtering, for example, just generating the strings and executing the actual plugin action). The output module interface traditionally permits only one thread at a time to be present within the actual doAction() call of the plugin. This is to simply writing output plugins, but would be needed in any case as most can not properly handle real concurrent operations (think for example about writing to sequential files or a TCP stream…). For most plugins, the doAction() part is what takes most processing time. HOWEVER, multiple threads on action queues can build string concurrently, which can be a time consuming operation (especially when regexpes are involved). But even then it is hard to envision that more than two action queue worker threads can make much sense.

So the bottom line is that you need to increase the main queue worker threads to get much better performance. If you need to go further, you can fine-tune action queue worker threads, but that’s just that: fine-tuning.

Note that putting “fast” actions (like omfile) on an async action queue just to be able to specify action threads is the wrong thing to do in almost all situations: there is some inherent overhead with scheduling the action queue, and that overhead will most probably eat up any additional performance gain you get from the action queue (even more so, I’d expect that usually it will slow things down).

Action queues are meant to be used with slow (database, network) and/or unreliable outputs.

For all other types of actions, even long-running, increasing the main queue worker thread makes much more sense, because this is where most concurrency is possible. So for “fast” action, use direct action queues (aka “no queue”) and increase the main thread workers.

Finally a special note on low-concurrency rulesets. Such rulesets have limited inherent concurrency. A typical example is a ruleset that consists of a single action. For obvious reasons, the number of things that can be done concurrently is very limited. If it is a fast action, and there is little effort involved in producing the strings (most importantly no regex), it is very hard to gain extra concurreny, especially as a high overhead is involved with such fine-grained concurrency. In some cases, the output plugin may come to help. For example, omfile can do background writes, which will definitely help in such situations.

You are in a somewhat better shape if the string builder is CPU intense, e.g. because it contains many and complex regexes. Remember that strings can be build in parallel to executing an action (if run on multiple threads). So in this case, it makes sense to increase the max number of threads. It makes even more sense to increase the default batch size. That is because strings for the whole batch are build, and then the action plugin is called. So the larger the batch, the large these two partitions of work are. For a busy system, a batch size of 10,000 messages does not sound unreasonable in such cases. When it comes to which worker threads to increase, again increase the main queue workers, not the action ones. It must be iterated that this gives the rsyslog core more concurrency to work with (larger chunks of CPU bound activity) and avoids the extra overhead (though relatively small) of an async action queue.

I hope this clarifies worker thread settings a bit more.

recent librelp development

I thought I share some news on what I have been busy with and intend to be in the future. In the past days, I have added more config options to librelp, which now supports GnuTLS compression methods as well as provides the ability to set the Diffie-Hellman key strength (number of bits) and – for experts – to set the GnuTLS priorities, which select the cipher methods and other important aspects of TLS handling.

This is done now and I also added rsyslog facilities to use these new features. Some of this stuff is not yet released, but will soon be.

The next big step is preventing man-in-the-middle attacks. I will most probably use SSH-type fingerprint authentication, so that no full PKI is necessary to make this work. I guess implementing this feature set will probably take a couple of days and will keep you posted on how things progress.

What is a rsyslog signature provider?

Some folks asked what a (rsyslog) signature provider is. In essence, it is just a piece of software written to a specific interface.

There are little functional requirements for signature providers. Most obviously, we expect that it will provide some useful way to sign logs, what means it should be able to provide an integrity proof of either a complete log file or even some log records some time after the logs are recorded.

There is no “official signature provider spec” available. As usual in open source, the provider interface is well defined inside its actual source header file. Take a look for example at rsyslog’s git to see the definition.

If you look at that code, the interface is fairly simple. The primary entry points a programmer needs to define are

  • OnFileOpen
  • OnRecordWrite
  • OnFileClose

As the names imply, they are called whenever a log file is opened, written to, and closed. Again, what the provider does at these events is up to the provider.

The initial provider that ships directly with rsyslog is using a secure remote signing service provided by Guardtime.  There are various reasons why I selected this provider, and I elaborated at length on them in my blog posting on why I selected Guardtime as first rsyslog signature provider. In a nutshell, it is a method that does not rely on local secrets and as such can not be circumvented (again: details in my other posting). Obviously, Guardtime offers paid services, but they also operate free services for everyone to use (a key point when I evaluated that technology).

If you have privacy concerns about the Guardtime provider, you may want to read my blog posting on just that question.

Note that signature provider implementations can use the above entry points in any way they like. For example, the may ignore file open and close and do all work in record write. So this small interface is highly flexible. If you look at the full interface description, you’ll also see some more events which primarily provide housekeeping functionality. If you intend to write your own signature provider, I suggest to start with a copy of the guardtime provider and change it according to your needs. That way, you have a great example of how the housekeeping calls are handled.

new rsyslog devel branch: 7.5

After the successful and important release of 7.4 stable, we are working hard on further improving rsyslog. Today, we release rsyslog 7.5.0, which opens up the new development branch.

With that branch, we’ll further focus on security. Right with the first release, we provide a much-often demanded feature: TLS support for the Reliable Event Logging Protocol (RELP). Even better, we also support the compression feature that is included in GnuTLS, so you can use im/omrelp not only with TLS, but also turn on compression if you like so.

There is already a guide for TLS-secured RELP with rsyslog available. It was written for the experimental 7.3.16 release, which never was officially announced. So the guide contains some (now-unnecessary) build steps, but also a full example of client and server configurations.

Note that the current TLS support supports anonymous authentication, only (via Diffie-Hellman key exchange). This is due to the current librelp implementation. However, librelp is scheduled to become more feature-rich and rsyslog will support this, once it is available. In general, you can expect some more enhancements (and more fine-grained config control for those in the need) for rsyslog’s RELP subsystem.

There are more things to come, especially in the security context. So stay tuned to what the next version of rsyslog will provide!

rsyslog: How can I see which field contains which data item?

A topic that comes up on the rsyslog mailing list or support forum very often is that folks do not know exactly which values are contained on which fields (or properties, like they are called in rsyslog, e.g. TAG, MSG and so on).

So I thought I write a quick block post on how to do that. I admit, I do this mostly to save me some time typing and having it at hand for reference in the future.

This is such a common case, that rsyslog contains a template which will output all fields. The basic idea is to use that template and write all message into a special file. A user can than look up that file, find the message in question, and see exactly which field/property is populated with which data.

It’s very easy to do. All you need to place this at the top of your rsyslog.conf:

*.* /var/log/debugfmt;RSYSLOG_DebugFormat

and restart rsyslog. After that, all messages will be written to /var/log/debugfmt. Note that each message is written to multiple lines, with each of its properties. This is a very rough example of how this may look:

Debug line with all properties:
FROMHOST: 'linux', fromhost-ip: '', HOSTNAME: 'linux', PRI: 46,
syslogtag 'rsyslogd:', programname: 'rsyslogd', APP-NAME: 'rsyslogd', PROCID: '-', MSGID: '-',
TIMESTAMP: 'Jun 10 18:56:18', STRUCTURED-DATA: '-',
msg: ' [origin software="rsyslogd" swVersion="7.4.0" x-pid="4800" x-info=""] start'
escaped msg: ' [origin software="rsyslogd" swVersion="7.4.0" x-pid="4800" x-info=""] start'
inputname: rsyslogd rawmsg: ' [origin software="rsyslogd" swVersion="7.4.0" x-pid="4800" x-info=""] start'

Note that I lied a bit: it’s not actually all properties, but the most important ones (among others, the JSON properties are missing, as this is a large amoutn of data – but I may add it in later versions).

If you know what to know what, for example, APP-NAME is, you can simply look it up (here it is “rsyslogd”).

Note that I have deliberately given this example in legacy configuration language, so that it can be used by all users of rsyslog. You can of course also make use of the more advanced versions.

PRISM – and Google and Facebook Intelligence

While almost everyone (including me) is talking about the PRISM system, it may be worth stepping a little back and looking at one of the underlying problems.

Looking closely, PRISM drew it’s data from just a couple of large sources, including Google and Facebook. But, again, it’s very few sources. While I guess they combine this data with intelligence they have gathered elsewhere, the fact remains that only two handful of sites seem to store quite a bit of information about everyone one the Internet.

So the question probably is what do privately owned corporations like Google and Facebook do with our data? Privacy advocates are never tired to warn about privacy concerns in regard to the Internet giant’s services. In fact, the big guys probably have much more insight into their user’s habits than intelligence agencies knew in the past about high-profile people.

And it’s not only their users: think about all those Facebook “Like” buttons, Google “+1” buttons or even Google Analytics, which is used on an extremely large number of sites. Not to mention blogger (that I am using right now ;)) or youtube, or… If if you are not logged in to your G+ or FB account – or even don’t have one – they gather information about you based on your IP address or geographical location. With IPv6, this problem will become even worse, as it is much easier to continuously identify someone surfing the web, especially if the ISP isn’t looking too much into privacy-enhancements.

And it’s undeniable those giants not only have but make use of the wealth of personal profile information: just look at how targeted ads are nowadays and how customized search results are, based on what the companies know about each and every one of us. Actually, this is what is the actual (astronomical) market value of these companies: not their service per sé but rather their intelligence capabilities over the users base – and how they make use of it.

The best example in my opinion is probably Google Now: it is absolutely astonishing and fascinating how well Google predicts what is of interest for me. Of course it is also pretty frightening, if you just think a couple of seconds about it. Plus, users even give away their roaming profile to use this service which “simple is to good to turn off for privacy concerns”. I have to admit I use it, too, at last every now and then – knowing that only continuous use makes it really valuable to the user – and Google.

Of course, there are good reasons to use services like Facebook and Google – we all do it routinely. In some situations it’s even unavoidable to use them. But all of that is probably a story for a different blog posting.

Let me conclude that PRISM should also be a wakeup call about how much power about our personal lifes we put into the hands of a very small set of large commercial companies. After all, this is what the government is after, trying to even better understand out actions by combining various data sources. In that light, does one really wonder why governments are not so eager about new and better privacy laws and enforcing them?

Bottom line? There is not only Web 2.0, there is also Privacy 2.0. While the first got great features, the latter got pretty feature-less, at least from the personal privacy point of view.