Introducing the rsyslog config builder tool

Wouldn’t it be great if we had an interactive tool that let even novices build complex rsyslog configurations, without any need to understand the inner workings or even the terminology? Indeed, that would not only be great, but in our opinion it would also take a lot of pressure off rsyslog’s documentation.

In light of this, we started to work on a tool called the “rsyslog configuration builder”. An initial preview goes live today and we invite everyone to play with it. The initial version is hopefully already useful for many cases. However, the primary intent is to gather community feedback, reactions and further suggestions.

The initial version has a restricted set of supported inputs and outputs, as well as other constructs. It works with rsyslog v7.6 and above. The tool can be used anonymously and configurations are kept during the session, with the session timeout being a couple of hours. That should be a fair amount of time to build your config. For the future, we plan to permit saving the config when logged in to the site. That way, you can work on a single configuration over multiple days.

We have many more enhancements in mind, but first of all we would like to get your feedback. You can provide feedback any way you like, but we would be extremely happy if you either post to the rsyslog mailing list or create an issue in the rsyslog website’s github project.

Moving rsyslog stable to v8…

I am happy to report that I have finally finished the 8.2.0 rsyslog release and it is on its way to announcement, package build and so on. While v8 had basically been finished since before last Christmas, a couple of small nits held up the release. This is probably a lesson that we need to accept some nits instead of holding a release for so long.

With that said, there is still a nit: it is undecided how the new doc system shall be distributed. In 8.2.0, it will be a tarball inside the main tarball, something that already (and rightfully) drew some criticism. However, this time I have decided to keep on with the release rather than block it again. After all, it’s easy to fix this in 8.2.1 if we settle the issue quickly.

With v8 stable released, project policy is to officially stop support for v7. In any case, we’ll keep a close eye on 7.6 and will provide assistance over the next couple of weeks. After all, v8 is a considerable change, and some of the more exotic contributed output modules are not even available with it. So there is a good case for keeping support for v7.6, at least until we really see there is no technical reason for keeping it.

I hope that v8 will be well received … and look forward to hearing both success and bug reports.

If you are interested in what the big changes are, please have a look at this slightly older blog post describing what’s new in the rsyslog v8 engine.

liblogging-stdlog – code reviewers sought

I am looking for some code reviewers.

I have worked hard on liblogging-stdlog, which aims at becoming the new enhanced syslog() API call. The library is thread- and signal-safe and offers support for multiple log drivers, just like log4j does.

A more elaborate description is here: https://github.com/rsyslog/liblogging

 

As the lib is becoming ready for prime time, I would really appreciate it if some folks could have a look at the code and check for problems and/or offer suggestions regarding the API.

It is only the code inside ./stdlog (roughly 1400 lines of code, including header files, empty lines and comments): https://github.com/rsyslog/liblogging/tree/master/stdlog

The man page is available here: https://github.com/rsyslog/liblogging/blob/master/stdlog/stdlog.rst

All feedback is very welcome!

Thanks,

Rainer

the rsyslog v8 engine – what’s new?

I have written a small presentation on what has changed in the rsyslog v8 engine. It takes a developer’s perspective, but is most probably also of interest for administrators who would like to understand why the v8 engine scales out much better for slow outputs like ElasticSearch or databases.

For developers, it also contains the basic know-how needed to successfully (and without pain!) upgrade a pre-v8 output plugin to v8.

rsyslog Ubuntu Packages: calling for contributors

Adiscon has been providing Ubuntu packages for recent rsyslog versions for quite a while. We would now like to go one step further. I have created a git repository on github for all the package build source files and Andre (who creates the packages) will populate it shortly. First of all, this will enable all folks interested in building their own packages to do so based on what we use.

But secondly, and more importantly, we hope to attract contributors for creating even better packages. One of my personal goals would be to make this project the core of an “rsyslog official” PPA in the Ubuntu ecosystem. From what I see on the mailing list, forums and so on, Ubuntu is becoming an increasingly important platform for logging (maybe due to their quick support for things like Elasticsearch as well as their current decision not to use systemd journal?). Whatever the reason is, Ubuntu seems to be becoming one of the premier logging platforms and we would like to make the rsyslog experience on it as good as possible.

This requires packages that are very easy to access and well-maintained from an Ubuntu PoV. So if you like rsyslog and know your way around building packages and PPAs on Ubuntu, please consider joining this effort. Also, all feedback is very welcome.

Note that we will most probably start similar efforts for the other Adiscon-supported platforms shortly. But right now Ubuntu is our prime focus, given the visible increase in its userbase.

How I maintain multiple rsyslog versions

Rsyslog is an enterprise-class project. Among other things, this means we need to provide different versions for quite a while (enterprises don’t like to update every six months, and for good reasons).

As such, we support multiple major versions. As of this writing, the versions used in practice are v5 and v7, with v8 coming up shortly. There are also v0,…,v4 versions out there, and what I write applies to them equally. If there is development going on for a version, there is a vX-devel branch. This is for those who want the new features, obviously at the expense of a somewhat higher chance of instability. And then there is vX-stable, which is the stable branch. Once something is named vX-stable, it NEVER receives new features, just bug fixes. There is one exception to the naming rules: the most current development version is held inside the git master branch, because that’s where people expect it. So, currently, there is no v8-devel branch, but it’s named “master” instead.

Maintaining multiple versions sounds scary and cumbersome. Indeed, this fear came up when talking about maintaining multiple doc versions inside the rsyslog-doc project. Thanks to git, this is not the case. It’s actually dead easy to do, once you get a little used to the workflow.

The key thing is to make changes in the oldest branch where they belong (with “oldest” being the oldest one that the project really wants to support). So if there is a bug that’s present in v7-stable, v7-devel and master, do NOT fix it in master. Check out v7-stable and fix it there. The same applies to doc updates. Once you have fixed it, all you need to do is to merge the changes up. Especially for smaller changes, git does most of the hard plumbing. Occasionally, there may be some merge conflicts, but these can usually be solved quickly. There are some rare exceptions, usually as the result of big refactoring. For doc, merge conflicts are almost always trivial to solve (at least they have been in the past). It’s advisable to merge up early. The longer you wait, the more work you need to do if a merge conflict occurs (because you probably don’t remember well enough what you did). The drawback is that the git history becomes a little cluttered with merge entries, but that’s how it is (nothing in life is free ;)).

So the workflow is as follows (I use v7-stable as the “oldest” version in this workflow sample; that’s also the typical case):

  1. git checkout v7-stable
  2. update whatever you need to update
  3. git commit
  4. git checkout v7-devel
  5. git pull . v7-stable
  6. git checkout master
  7. git pull . v7-devel
  8. DONE
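The steps above can be tried out safely in a scratch repository. The following is a minimal sketch, not the project's actual tooling: the function name `merge_up_demo`, the file names and the commit messages are made up for illustration, and it uses `git merge`, which has the same effect here as `git pull .` for a branch in the same repository.

```shell
#!/bin/sh
# merge_up_demo: hypothetical helper that replays the merge-up workflow
# in a throwaway repository and prints the repository path.
merge_up_demo() (
    set -eu
    repo=$(mktemp -d)
    cd "$repo"
    git -c init.defaultBranch=master init -q
    git config user.name "Demo User"
    git config user.email "demo@example.invalid"

    # Setup: three branches, each containing the older one plus its own commit.
    echo "first line" > doc.txt
    git add doc.txt
    git commit -q -m "initial commit"
    git branch v7-stable
    git checkout -q -b v7-devel
    echo "devel-only feature" > devel.txt
    git add devel.txt
    git commit -q -m "v7-devel: new feature"
    git checkout -q master
    git merge -q --no-edit v7-devel
    echo "v8 work" > v8.txt
    git add v8.txt
    git commit -q -m "master: v8 work"

    # Steps 1-3: fix the problem in the oldest supported branch.
    git checkout -q v7-stable
    echo "bug fix" >> doc.txt
    git commit -q -am "fix bug in doc.txt"

    # Steps 4-5: merge the fix up into v7-devel.
    git checkout -q v7-devel
    git merge -q --no-edit v7-stable

    # Steps 6-7: merge v7-devel up into master.
    git checkout -q master
    git merge -q --no-edit v7-devel

    echo "$repo"
)

repo=$(merge_up_demo)
# The two merge commits stand out in the history, as described.
git -C "$repo" log --oneline --graph master
```

The fix is committed exactly once, on v7-stable, and reaches the newer branches only through merges.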

In essence, you make the change once and take 30 seconds to merge it up. Merge conflicts seldom happen and are quite unlikely for doc changes. Those that happen are usually just additions to related areas, and so it takes another 30 seconds to fix them. All in all very painless.

If you check the git logs, you’ll often find occurrences of the workflow above (just look for the merges, they really stand out).

Now there remains the question of multiple versions, not branches. For example, we have v7.4.0, 7.4.1, 7.4.2,… How do we handle updates to them? Well, easy: first of all, once 7.4.x is out, 7.4.(x-1) is NEVER changed again (why should it be? That’s what the next version is for). So we actually do NOT do any updates to already released versions (again, think about the resulting mess). So it just boils down to being able to fetch the exact same version later. And this is extremely easy with git: just use a tag, that’s what tags are meant for.

So whenever I do a release, the last thing I do after it is built and announced is to run “git tag” with the version name, e.g. “git tag v7.4.7”. That way, I can later do “git checkout v7.4.7” and have exactly the version I am looking for. Note, though, that you need to do “git push --tags” in order to push the tags to your published repository.
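As a rough sketch of that tagging step, the following shell function runs it end to end against scratch repositories. The name `tag_release_demo`, the bare “published” repository, the `ChangeLog` contents and the remote name `origin` are assumptions for the demo; in real use only the tag, push and checkout commands matter.

```shell
#!/bin/sh
# tag_release_demo VERSION: hypothetical helper that tags a release in a
# scratch repository, publishes the tag, and later checks that exact
# version out again. Prints the path of the working repository.
tag_release_demo() (
    set -eu
    version=$1
    work=$(mktemp -d)

    # A bare repository stands in for the published repository.
    git -c init.defaultBranch=master init -q --bare "$work/published.git"
    git -c init.defaultBranch=master init -q "$work/dev"
    cd "$work/dev"
    git config user.name "Demo User"
    git config user.email "demo@example.invalid"
    git remote add origin "$work/published.git"

    echo "release $version" > ChangeLog
    git add ChangeLog
    git commit -q -m "prepare $version"
    git push -q origin master

    # The release is out: tag it and publish the tag.
    git tag "$version"
    git push -q --tags origin

    # Development continues after the release ...
    echo "later work" >> ChangeLog
    git commit -q -am "work after $version"

    # ... but the released state can be retrieved at any time.
    git checkout -q "$version"
    echo "$work/dev"
)

dev=$(tag_release_demo v7.4.7)
cat "$dev/ChangeLog"
```

After the final checkout the working tree holds the released state only; the later commit remains on master.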

Conclusion: using git branches and release tags makes it extremely easy to maintain a multitude of versions. Without git, this would be an almost undoable task (we had a really hard time in those CVS days…).

Why is the rsyslog project running its own infrastructure?

Currently, there is a very valuable discussion going on on the rsyslog mailing list on how we can attract more contributors and how moving things to github can help with this. I was writing a longer reply, and then it occurred to me that it probably is better to blog about this topic, as it may be of future interest to have the current thinking (relatively) easily accessible.
The core question being asked is “Would it make more sense to leave all that sort of info in one place instead of pulling people from the *official* rsyslog repo on *github* over to *rsyslog.com*?” Then, it had some examples of how logstash uses github README files for that [full text here in the rsyslog mailing list archives].
These are my thoughts:
 
I don’t object to placing a bit more into readme and related files. However, while this is one of the *official repos*, it’s not the official *project site*, so I think it makes sense to ask people to go to the project site for anything that’s non-trivial. Looking at the logstash things, to me it pretty much looks like they do the same thing. I have absolutely no problem putting the same information into README-like files inside the repository, as long as the authoritative pages reside on the official project web (and I began to do so yesterday).
In essence, the question boils down to “why don’t I want to be locked in to github?” It’s (bad) experience. When rsyslog started we used sourceforge.net intensively. At that time, it was as popular, and as much “the place to be”, as github is today. Then, they got some crazy biz ideas, got technical and other problems and … it really turned out to be a mess and bad for the project.

Also, we used Freshmeat for most of our public announcements around the same time (a bit later, and partly together with sf.net). All went extremely well, until all of a sudden they had the bright idea of a “redesign” that made the site unusable. Again, a bad hit for us.

 
I really don’t want to see this again. If I use github exclusively, I have no chance to redirect things if they go crazy. I have a hard time moving on to the next hot spot when it is born (because all the Google juice is with github).
 
I admit that I would like to have the buzz of “I tell you this is an active project, I can judge by the number of issues opened and closed (and so on)”. But at the same time, I remember that this type of entanglement always turned against us after some time.
For example, I tend to put the bug number (actually a link to it) in both commit comments and the change log. If I now link to sf.net, github, etc. and they “go away” (one way or the other), all of these links become invalid and I don’t have a chance to fix this (again, I’ve actually experienced that, so it’s not pure theory). And such things happen. For a non-rsyslog example, think about the bitkeeper debacle, where all of a sudden they thought it was time to charge kernel developers for so-far free services.

Again, I would really like to have some of the cool things. With git, it’s relatively safe, as I can move the repo around quickly, and as long as the rsyslog.com site contains the main pointer to where the current official git is, the risk is very limited. But other than that, my experience is that the short-term benefits come at the risk of severe long-term problems.

I am open to really good arguments why I am wrong. One good argument would probably be an OSS complete hosting service that is in place for 15+ yrs without any interruption in user base and breaking URLs – and that is still a hot spot. I guess sf.net mostly qualifies in that category, but as I said, we had our own story with them…

I frankly admit that I am very conservative in this regard. After all, I even post things like this on my personal blog, and not on a site directly owned by Adiscon. But, you may say, you use blogger, so don’t you take a risk here? Well, you probably have noticed that I use blogger under a domain that I have full control over. So whenever they go crazy, I can move on to some other place (not totally effortlessly, but I am in control of those things).

rsyslog on github

In some pretty long discussions, it turned out that most (if not all) users were not aware that rsyslog has had an official repository on github for over six months now. So it probably is a good idea to tell the world.

While it probably is obvious, I would also like to say that I accept pull requests via github. Actually, this was always the case, even when I had no repository on github. If someone sends me a pull request, I fetch from wherever that someone’s git is located, merge it into my git and push the final result. So there is nothing special about a repository that’s on github.
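Taking a contribution from an arbitrary repository needs nothing beyond plain git. Below is a hedged sketch using two scratch repositories; `fetch_and_merge_demo`, the `contributor` clone and the `fix-typo` branch are invented for the demo, and a local path stands in for wherever the contributor’s git is actually hosted.

```shell
#!/bin/sh
# fetch_and_merge_demo: hypothetical helper showing how a maintainer can
# fetch a branch directly from a contributor's repository and merge it.
# Prints the path of the maintainer's repository.
fetch_and_merge_demo() (
    set -eu
    work=$(mktemp -d)

    # The maintainer's repository.
    git -c init.defaultBranch=master init -q "$work/mine"
    cd "$work/mine"
    git config user.name "Maintainer"
    git config user.email "maintainer@example.invalid"
    echo "helo world" > README
    git add README
    git commit -q -m "initial commit"

    # The contributor clones it and fixes something on a branch of their own.
    git clone -q "$work/mine" "$work/contributor"
    cd "$work/contributor"
    git config user.name "Contributor"
    git config user.email "contributor@example.invalid"
    git checkout -q -b fix-typo
    echo "hello world" > README
    git commit -q -am "fix typo in README"

    # The maintainer fetches straight from the contributor's location
    # (any URL or path git can reach) and merges what was fetched.
    cd "$work/mine"
    git fetch -q "$work/contributor" fix-typo
    git merge -q --no-edit FETCH_HEAD

    echo "$work/mine"
)

mine=$(fetch_and_merge_demo)
git -C "$mine" log --oneline
```

No remote has to be configured for a one-off contribution; the location is passed to `git fetch` directly.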

Please note that we currently continue to provide rsyslog via the Adiscon git. With git, it’s irrelevant where a repository is hosted, since git is not a server-based system. While I like github’s interface, I do not like to lock rsyslog into github. We almost got locked into sourceforge.net in the early days of rsyslog development, and when they had their really bad time I was extremely happy that we had resisted doing much more on SF than just the CVS repository. Github currently for sure is the place to be, but I like the ability to move away when the next cool thing pops up. As such, I have set up a new page that describes where the rsyslog repositories can be found currently. This page will be updated as the need arises.

Please note that official repositories are peers, so all are equal (minus maybe a one-minute delta during push operations). Pick whichever you like most.

On liblognorm1

Liblognorm is a fast, samples-based normalization library. Its brand-new version 1.0.0 will be released today. It is a major improvement over previous versions, but unfortunately we needed to change the API. So some notes are due.

Liblognorm has been evolving for several years and was initially meant to be used primarily with the Mitre CEE effort. Consequently, the initial version of liblognorm (0.x) uses the libee CEE support library in its API.

As time evolved, the initial CEE schema underwent considerable change. Even worse, Mitre lost funding for CEE. While the CEE ideas survived as part of Red Hat-driven “Project Lumberjack”, the data structures became greatly simplified and JSON based. That effectively made libee obsolete (and also in parts libestr, which was specifically written to support CEE’s initial requirement of embedded NUL chars in strings).

Recently, Pavel Levshin converted liblognorm to native JSON, which helped improve performance and simplicity for many client applications. Unfortunately, this change broke interface compatibility (and there was no way to avoid that, obviously…).

The current library is the result of that effort. Application developers are encouraged to switch to this version, as it provides the benefit of a simpler API. This version is now being tracked by the git master branch.

However, if you need to stick to the old API, there is a git branch liblognorm0, which contains the previous version of the library. This branch is also maintained for important bug fixes, so it is safe to use.

We recommend that packagers create packages both for liblognorm0 and liblognorm1. Note that the development packages of the two versions cannot coexist on the same system, as the pkg-config system would get into trouble. Adiscon’s own packages follow this scheme.

Note that rsyslog will soon begin to enjoy the benefits of liblognorm1. This results in a notable performance improvement for mmnormalize. Support will initially become available in v8.

rsyslog impstats analyzer reloaded

My co-worker Andre had a little time and extended the rsyslog impstats analyzer to support generating graphs. IMHO this gives you fantastic insight into how the system operates. While I know that some folks already push this data to their internal health monitoring system, the beauty of the online rsyslog impstats analyzer is that you do not need to install anything — a log file with stats is all you need to get you going. Let’s look at a quick sample. This is a page returned by the analyzer’s check phase:

If you look closely, you notice that there are active links to the problem areas. Let’s follow the one to action 3 queue:
Here, we see the problem in action: the queue initially behaves well, but relatively soon keeps its size at close to 1k messages. At the same time, the enqueue rate (green line) is much higher. Consequently, the discard rate (blue line) is getting pretty high. The delta between the discard and enqueue lines is what is actually processed: obviously far too few messages to keep up.
BTW: this chart is from a real-world case. One problem here was that the queue’s discard mark was set too low (close to 1k), so that the queue never could fill up over the 1k mark even though it had a much larger max size. When we fixed this, we saw that the queue consumer (a script) could actually not keep up with the message volume (not shown here). So this hint from the graph was also pointing to a real problem (but you need to fix one problem after another and then look at new stats).
Note that graphics can also be generated for non-problem counters: you can select them from a menu at the top of the pages (see first screenshot). The web app supports cumulated stats and can create delta values out of them. It also offers the ability to use logarithmic y axis scaling, which is useful in some cases. The app does not handle imudp traffic well. The reason is that imudp reports both ipv4 and ipv6 listeners with the same counter name, and so we don’t have any chance to differentiate between them. An update for imudp is planned to address this.
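Turning cumulated counters into per-interval deltas is simple arithmetic: subtract the previous sample from the current one. A minimal sketch in shell with awk follows; the function name and the one-value-per-line input are assumptions for the example and do not reflect how the analyzer itself is implemented or what the impstats log format looks like.

```shell
#!/bin/sh
# cumulative_to_delta: hypothetical helper. Reads one cumulated counter
# value per line on stdin and prints the change between consecutive
# samples. The first sample has no predecessor, so it yields no output.
cumulative_to_delta() {
    awk 'NR > 1 { print $1 - prev } { prev = $1 }'
}

# Four made-up samples of a cumulated counter.
printf '%s\n' 100 250 250 900 | cumulative_to_delta
```

For the four samples this prints 150, 0 and 650, one per line. A counter that is reset, for example after a restart, would show up as a negative delta; the sketch does not handle that case.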
More enhancements for the statistics analyzer are planned. We are actively looking for feedback.