Automating Coverity Scan with a complex TravisCI build matrix

This is how you can automate Coverity Scan using Travis CI – especially if you have a complex build matrix:

  • create an additional matrix entry you will exclusively use for submission to Coverity
  • make sure you use your regular git master branch for the scans (so you can be sure you scan the real thing!)
  • schedule a Travis CI cron job (daily, if permitted by project size and Coverity submission allowance)
  • In that cron job, on the dedicated Coverity matrix entry:
  • cache Coverity’s analysis tools on your own web site, download them from there during Travis CI VM preparation (Coverity doesn’t like too-frequent downloads)
  • prepare your project for compilation as usual (autoreconf, configure, …) – ensure that you build all source units, as you want a full scan
  • run the cov-build tool according to Coverity’s instructions (it places its results in the cov-int directory)
  • tar the build result
  • use the “manual” wget upload capability (documented on the Coverity submission web form); make sure you use a secure Travis environment variable for your Coverity token
  • you will receive scan results via email as usual – if you like, automate email-to-issue creation for newly found defects
Actual Sample Code: rsyslog uses this method. Have a look at its .travis.yml (search for “DO_COVERITY”). The scan run can be found here (search for “DO_CRON”): a script that calls another one, which ultimately performs the steps described above. Both scripts are called from .travis.yml.
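To make this more concrete, here is a rough, hypothetical sketch of what the dedicated matrix entry might do. The mirror URL, the COVERITY_TOKEN variable, the e-mail address and the project name are placeholders, not rsyslog’s actual settings, and curl is shown for the upload simply because Coverity’s submission form documents it that way; the real rsyslog scripts differ in the details:

  # .travis.yml fragment (sketch) – runs only for the dedicated cron entry
  matrix:
    include:
      - env: DO_COVERITY=1
  script:
    - |
      if [ "$TRAVIS_EVENT_TYPE" = "cron" ] && [ "$DO_COVERITY" = "1" ]; then
        # fetch the cached Coverity analysis tools from your own mirror
        wget -q https://your-mirror.example.com/cov-analysis-linux64.tar.gz
        tar xzf cov-analysis-linux64.tar.gz
        # prepare and build the full project under cov-build
        autoreconf -fvi && ./configure
        cov-analysis-linux64-*/bin/cov-build --dir cov-int make -j2
        # package the intermediate results and submit them
        tar czf analysis.tgz cov-int
        curl --form token="$COVERITY_TOKEN" \
             --form email=maintainer@example.com \
             --form file=@analysis.tgz \
             --form version="$(git rev-parse --short HEAD)" \
             --form description="automated daily scan" \
             "https://scan.coverity.com/builds?project=your%2Fproject"
      fi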

Why so complicated? After all, there is a Coverity addon for Travis CI. That’s right, and it probably works great for simple build matrices. Rsyslog uses Travis quite intensely. At the time of this writing, we spawn 8 VMs for different tests and test environments plus one for cron jobs (so 9 in total). With the Coverity addon, this would result in 9 submissions per run, because the addon runs once per VM requested. Given that rsyslog currently has an allowance of 3 scans per day, a single run would use up all of our allowance plus trigger 6 more submissions that would not succeed. Even if it worked, it would be a big waste of Coverity resources and pretty impolite both to Coverity and to fellow users of the free scan service.

With the above method, we have full control over what we submit to Coverity and how we do it. With our current scan allowance, we use exactly one scan per day, which leaves two for cases where it is useful to run interim scans during development.

Why do you use the master branch for the scan? Coverity recommends a dedicated branch.
In my opinion, software quality assurance works well only if it is automated and applied to what is actually being “delivered”. In that sense, real CI with one Coverity run per pull request (before we merge!) would be best. We can’t do this due to Coverity allowance limitations. The next best thing is to do it as soon as possible after the merge (daily in our case) and make sure it is actually run against the then-current state. Introducing a different branch just for scan purposes sounds counter-productive:

  • you either need to update it to the full master branch immediately before the scan, in which case you can just as well use master directly
  • or you update the “Coverity branch” only on “special events” (or even manually), which runs counter to the idea of having things checked as early as possible. Why would you withhold scanning something that you merged to master, aka “this is good to go”? Sounds like an approach to self-cheating to me…
In conclusion, scanning master is the most appropriate way to automatically scan your project. Be bold enough to do that.
 
Does your approach work for non-Travis, e.g. Buildbot?
Of course – it works perfectly with any solution that permits you to run scheduled “builds”. We also use Buildbot for rsyslog and could have placed the scan there; simply for ease of use we decided to use Travis for automating the scan as well, but a scheduled Buildbot instance would have worked just as well. So the approach described here is universal. The only thing you really need is that the scan is initiated automatically at scheduled intervals – otherwise you could simply use the manual submit.

Time for a better Version Numbering Scheme!

The traditional major.minor.patchlevel versioning scheme is no longer of real use:

  • users want new features when they are ready, not when a new major version is crafted
  • there is a major-version-number-increase fear in open source development, so major version bumps sometimes come rather randomly (see Linux, for example)
  • distros fuel the major-version-number-increase fear because they are much more hesitant to accept new packages with an increased major version
In conclusion, the major version number has become cosmetic for many projects. Take rsyslog as an example: when we switched to scheduled releases, we also switched to just incrementing the minor version component, which now actually increments with each release – and nothing else changes. We still use the 8.x.0 scheme, where x is more or less the real version number. We keep that old scheme for cosmetic reasons, aka “people are used to it”.
Some projects, most notably systemd, which probably invented that scheme, are bolder: they have switched to a single ever-increasing integer as the version (so you talk of, e.g., version 221 vs. 235 of systemd). This needs some adaptation on the user side, but seems to be accepted by now.
Yet another approach, known from Ubuntu, is “wine-versioning”: just use the date. So we have 14.04, 16.04, 17.04 representing year and month of release. This also works well, but is still unusual for individual projects.
 
What made me think about this problem? When Jan started his log anonymizer project, we thought about how it should be versioned. We concluded that a single ever-incrementing version number is the right path to take – for this project, and most probably for all future ones. Maybe even rsyslog will at some point make the switch to a single version number…

Busy at the moment…

Some might have noticed that I am not as active as usual on the rsyslog project. As this looks set to continue at least for the upcoming couple of weeks, I’d like to give a short explanation of what is going on. Starting around the beginning of June, I got involved in a political topic in my local village. It’s related to civil rights, and it really is a local thing, so there is little point in explaining the complex story. What is important is that the originally small thing grew larger and larger, and we now have to win a kind of election – which means rallies and the like. To make matters a little worse (in regard to my time…), I am one of the movement’s speakers and also serve as subject matter expert to our group (I have been following this topic for over 20 years now). To cut a long story short, that issue has increasingly been eating up my spare time, and we are currently at a point where little is left.

Usually, a large part of my spare time goes into rsyslog and related projects. Thankfully, Adiscon funds rsyslog development, so I can work on it during my office hours. However, during these office hours I am obliged to work on paid support cases and also a limited number of things not directly related to rsyslog. Unfortunately, August (and early September) is the main holiday season in our region, so I also have fewer co-workers available with whom I could share rsyslog work. And to make matters “worse”, I need to train new folks to get started on rsyslog work – one of them is doing a summer internship, so I need to work with him now. While new folks are always a good thing to have on a project (and I really appreciate it), this means a further reduction of my rsyslog time.

The bottom line is that, due to all these things together, I am not really able to react to issues as quickly as I would like to. The political topic is expected to come to a conclusion, one way or the other, by the end of September. Due to personal reasons, I will not be able to work at all in early October (a long-planned out-of-office period), but I hope to be fully available again by mid-October. And the good news is that we will have a somewhat larger team by that time, because Jan, who is doing the internship, will continue to work part time on the project. Even better: Pascal will be with Adiscon for the next months on a full-time basis and will be able to work considerable hours on rsyslog.

So while we have a temporary glitch in availability, I am confident we’ll recover from it in autumn, and we have very exciting work upcoming (for example, the TLS work Pascal has just announced). I also have a couple of very interesting suggestions which are currently being discussed with support contract customers.

All in all, I beg for your patience. And I am really thankful to all of our great community members who do excellent work on the rsyslog mailing list, GitHub and other places. Not to forget the great contributions we increasingly receive. Looking forward to many more years of productive syslogging!

Introducing a new team member

Good news: we have some new folks working on the rsyslog project. In a mini-series of two blog postings, I’d like to introduce them. I’ll start with Jan Gerhards, who already has some rsyslog-related material online.

Jan studies computer science at Stuttgart University. He has occasionally worked on rsyslog for a while. This summer, he is doing a two-month internship at Adiscon, and he plans to continue working on rsyslog and logging-related projects in the future. Right now, he is looking into improving the mmanon plugin. He is also trying to improve debug logging, which I admit is my personal favorite (albeit it looks like this is the lower-priority project ;-)).

Would creating a simple Linux log file shipper make sense?

I am currently thinking about creating a very basic shipper for log files, but wonder if it really makes sense. I am especially concerned that good tools may already exist. Being lazy, I thought I’d ask for some wisdom from those in the know before investing more time in searching for solutions and weighing their quality.

I’ve more than once read that logstash is far too heavy for a simple shipper, and I’ve also heard that rsyslog is sometimes a bit heavy (albeit much lighter) for the purpose. I think with reasonable effort we could create a tool that

  • monitors text files (much like imfile does) and pulls new entries from them
  • does NOT further process or transform these logs
  • sends the resulting data to a very limited number of destinations (for starters, I’d say syslog protocol only)
  • with the focus on being very lightweight, intentionally not implementing anything complex.
Would this be useful for you? What would be the minimal feature set you need in order to make it useful? Does something like this already exist? Is it really needed or is a stripped-down rsyslog config sufficient?
I’d be grateful for any thoughts in this direction.
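For reference, here is a minimal sketch of what a stripped-down rsyslog configuration covering roughly the feature list above might look like (the file path and the target host are placeholders):

  # load the file input module and watch a text file
  module(load="imfile")
  input(type="imfile" File="/var/log/myapp/app.log" Tag="myapp:" Severity="info")

  # forward everything, unprocessed, via plain TCP syslog
  action(type="omfwd" Target="central.example.com" Port="514" Protocol="tcp")

Whether a configuration like this is already “light enough” is exactly the question above.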

rsyslog error reporting improved

Rsyslog provides many to-the-point error messages for config file and operational problems. These immensely help when troubleshooting issues. Unfortunately, many users never see them. The prime reason is that most distros never log syslog.* messages, so they are just thrown away and invisible to the user. While we have been trying to make distros change their defaults, this has not been very successful. The result is a lot of user frustration and fruitless support work for the community – many things could very simply be resolved if only the error message were seen and acted on.

We have now changed our approach to this. Starting with v8.21, rsyslog now by default logs its messages via the syslog API instead of processing them internally. This is a big plus especially on systems running systemd journal: messages from rsyslogd will now show up when giving

$ systemctl status rsyslog.service

This is the place where error messages are expected nowadays, and it is definitely a place where the typical administrator will see them. So while this change requires some config adjustment on a few exotic installations (more below), we expect it to generally improve the rsyslog user experience.

Along the same lines, we will also work on better error reporting, especially for TLS and queue-related issues, which rank high in rsyslog support discussions.

Some fine details on the change of behaviour:

Note: you can usually skip reading the rest of this post if you run only a single instance of rsyslog and do so with more or less default configuration.

The new behaviour has actually been available for a while; it just needed to be explicitly turned on in rsyslog.conf via

global(processInternalMessages="off")

Of course, distros didn’t do that by default. Also, it required rsyslog to be built with liblogging-stdlog, which many distros do not do. While our intent when we introduced this capability was to provide the better error logging we now have, it simply did not work out that way in practice. The advantage of the original approach was that it was less intrusive. The new method uses the native syslog() API if liblogging-stdlog is not available, so the setting always works (we even consider moving away from liblogging-stdlog, as we see it wasn’t really adopted). In essence, we have primarily changed the default setting for the “processInternalMessages” parameter. This means that by default, internal messages are no longer logged via the internal bridge to rsyslog but via the syslog() API call (either directly or via liblogging-stdlog). For the typical single-rsyslogd-instance installation this is mostly unnoticeable (except for some additional latency). If multiple instances are run, only the “main” instance (the one processing system log messages) will see these messages. To return to the old behaviour, do either of the following:

  1. add in rsyslog.conf:
    global(processInternalMessages="on")
  2. export the environment variable RSYSLOG_DFLT_LOG_INTERNAL=1. This will set a new default – the value can still be overridden via rsyslog.conf (method 1). Note that the environment variable must be set in your startup script (which one depends on your init system or systemd configuration); a sketch for systemd-based systems follows below.
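For illustration, on a systemd-based system the environment variable could be set via a service drop-in file (the path below is the usual override location; adjust it to your distro and service name):

  # /etc/systemd/system/rsyslog.service.d/internal-messages.conf
  [Service]
  Environment="RSYSLOG_DFLT_LOG_INTERNAL=1"

Afterwards, run “systemctl daemon-reload” and restart rsyslog for the setting to take effect.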

Note that in most cases, even in multiple-instance setups, rsyslog error messages were thrown away before. So even in this case the behaviour is superior to the previous state – at least errors are now properly being recorded. This also means that even in multiple-instance setups it often makes sense to keep the new default!

rsyslog’s master-candidate branch gone away

Thanks to the new, improved CI workflow, we no longer need to manually do a final check of pull requests. I have used the new system for roughly two weeks now without any problems. Consequently, I have just removed the master-candidate branch from our git (with a backup “just in case” currently remaining in the Adiscon git repository).

Anyone contributing, please check the CI status of your PRs, as we can only merge things that pass the CI run. Note, though, that there is still a very limited set of tests which may falsely fail. Their number is shrinking, and I usually catch these failures relatively quickly and restart them. If in doubt, please add a comment to the PR and I’ll investigate.

Improvements in CI environment and workflow change

Roughly one and a half years ago, we at the rsyslog project started to get serious about CI, at that time with Travis only. Kudos to Thomas D. “whissi” for suggesting this and helping us set up the initial system. In aid of CI, we have changed to a purely pull request (PR) driven development model, and have had great success with it.

Over time, we have added more CI resources (thanks to Digital Ocean for the capacity sponsorship!) and begun to use Buildbot to drive them. Buildbot is a great tool and has helped us tremendously to further improve software quality. Unfortunately, though, it does not offer as close an integration with (GitHub) PRs as Travis does. This resulted in a workflow where all PRs were initially checked by Travis and, if all went well, I manually merged them to the master-candidate branch, which Buildbot monitored. In those infrequent cases where the Buildbot tests detected problems, I needed to manually contact the PR submitters. This worked well, but required some effort on my part.

Over the past two weeks we designed and implemented a small script that integrates GitHub with Buildbot much like Travis does. In essence, a new PR (or an update to an existing one) now automatically initiates the Buildbot build AND the result is shown right on GitHub inside the PR. That’s pretty sweet, as it a) keeps submitters informed of everything, b) provides even better coverage of multi-platform testing and c) saves me from a lot of manual labor. Note that at the moment we see some infrequent quirks from this system (like some Buildbot slaves not reporting, probably due to temporary network issues), but it already works much better than the old manual system. Also, I still have the ability to check things manually if there is a quirk.
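To give an idea of the mechanics (this is an illustrative sketch, not the actual script we run; token, context and description are placeholders): reporting a Buildbot result back to a PR essentially boils down to posting a commit status through the GitHub API, for example:

  # sketch: mark a commit as successful on GitHub
  curl -X POST \
    -H "Authorization: token $GITHUB_TOKEN" \
    -d '{"state": "success", "context": "buildbot/linux", "description": "all builders passed"}' \
    "https://api.github.com/repos/rsyslog/rsyslog/statuses/$COMMIT_SHA"

GitHub then displays that status (pending, success or failure) directly in the PR’s check list, just like the Travis result.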

As a consequence, we will change the workflow once again, removing the master-candidate branch from it. Now that each and every PR is checked with all the checks we have, there is no need for an interim step when finally merging.

rsyslog and liblognorm will switch to libfastjson as replacement for json-c

We have been using json-c for quite a while now and have had good success with it. However, recent problem reports and analysis indicate that we need to replace it in the future. Don’t get me wrong: json-c is a solid piece of software, but we most probably use it much more intensely than the json-c developers ever anticipated. That’s probably the actual root cause of why we need to switch.

A main problem spot is performance: various experiments, profiler runs, code reviews and experimental implementations have proven that json-c is a severe bottleneck. For example, in the evaluation of liblognorm v2 performance, we found that json-c calls accounted for more than 90% of processing time. Once we saw this, we dug deeper into the profiler output and saw that the hashtable hash calculation as well as memory allocations took a large share of overall processing time. We submitted an initial performance enhancement PR to json-c, which got merged. That already reduced processing time considerably. We continued on that path, resulting in a quite large second enhancement PR, which I withdrew due to disagreement with the json-c development lead.

A major problem for our application of json-c is that the hash table implementation beneath it is not a good match for our needs. We have been able to speed it up considerably by providing a new hash function (based on Perl’s hash function), but to really get the performance we need, we would have to replace that system. However, that is not possible, because json-c considers the hash tables part of its API. Actually, json-c considers every function, even internal ones, part of the API, so it is very hard to make any changes at all.

Json-c also aims at becoming fully JSON compliant. It currently is not, due to improper handling of NUL bytes, but the longer-term plan is to support them. While this is a good thing to do for a general JSON library, it is a performance killer for our use case. I know, because I faced the same problem with the libee implementation years ago, which we later ditched in agreement with the CEE standards body board. I admit I also have some doubts whether that change in json-c will actually happen, as it IMHO requires a total revamp of the API.

Also, the json-c project releases far too infrequently (have a look at recent json-c releases; the last one was in April 2014). And then it takes the usual additional time lag for distros to pick up the new version. So even if we could successfully submit further performance-enhancing PRs to json-c, it would take an awful lot of time before we could actually use them. I would definitely not like to create private packages for rsyslog users, as this could break other parts of a system.

Finally, json-c contains a really bad race bug in reference counting, which can cause rsyslog to segfault under some conditions. A proposed fix was unfortunately not accepted by the json-c development lead, so this remains an open issue. Even if it were accepted, it would probably take a long time until the release of the fixed version and its availability in standard packages.

In conclusion, and after a lot of thinking, we decided that it is best to fork json-c, which we then did. The new project is named libfastjson. As the name suggests, its focus is on performance. It will not try to be 100% JSON compliant. We will not support NUL characters in a standards-conformant way. I expect this not to be a big deal, as json-c also never did this, and the number of complaints seems to be very low. So libfastjson will not aim to be a general-purpose JSON library, but one that offers high performance at some functionality cost and works well in highly threaded applications. Note that we will also reduce the number of API functions and especially remove those that we do not need and that cost performance. Also, the data store will probably be changed from the current hashtable-only system to something more appropriate to our tasks.

Libfastjson already includes many performance enhancement changes and a solid fix for the reference counting bug. Up until that bug, we planned to release it in the Feb–April 2016 time frame, together with liblognorm v2. Now this has changed, and we actually did a kind of emergency release (0.99.0) because of the race bug. The source tarball is already available. We are working on packages in the rsyslog repositories (Ubuntu is already done). Rsyslog packages are not yet built against it, but we may do a refresh after the holiday period.

Rsyslog 8.15.0 optionally builds against libfastjson (it is preferred if available). Due to the race bug, we have decided that rsyslog 8.16.0 will require libfastjson.

A side note is due: we have been thinking about a replacement for the variable subsystem since summer or so. We envision capabilities even beyond what libfastjson can do. So we still consider this project and think it is useful. In regard to liblognorm, however, we need to provide a more generic interface, and libfastjson is a good match here. Also, we do not know how long it will take until we replace the variable system. We don’t even know if we actually can do it, time-wise.

rsyslog release policy issues

The usual end-of-the-year release policy discussion has begun on the rsyslog mailing list, and I wanted to post some thoughts here for a broader audience and easy access in the future. Enjoy ;)


Up until ~15 months ago, we released when there was a need to. Need was defined as


– important enough (set of bugfixes)
– new functionality

This resulted in various releases. We had the stable/devel releases. Stable releases were rare, devel frequent.

Now, we have scheduled releases. A release is triggered when we hit a certain calendar date, irrespective of whether or not there is a need to release (there are always one or two minor fixes, so we will probably never experience a totally blank release). We have also switched to stable releases only, and done so without grief (basically because a) we have improved testing and b) users didn’t use devel at all).

I just dug into the old discussion. A good entry point is probably this here, where we talk about patches:

http://lists.adiscon.net/pipermail/rsyslog/2014-October/038796.html

The new system works reasonably well. It has its quirks, though. Let’s look at a concrete example:

8.14.0, to me, was an absolutely horrible release. The worst we have done in the past 2 to 3 years. I worked hard on fixing some really bad race issues with JSON variables. The Friday before the release I was ready to ship that work, which would be really useful for folks that make heavy use of those variables. Then, over the weekend and Monday, it turned out that we might get unwanted regressions that weren’t detected earlier (NO testbench can mimic a heavily used production system, so let’s not get into the “we need better tests” blurb). The end result was that I pulled the plug on release day, and what we finally released was 8.13.0 plus a few small things. All problems with variables persisted. If I had had half a week to a week more (I don’t remember exactly), we could have done a real release instead of the 8.13 re-incarnation. But, hey, we run on a schedule.


Now 8.15.0 fixes these problems (except for the json-c induced segfault, which we cannot fix in rsyslog). It also has all the other “8.14” enhancements and fixes and so is actually worth 3 months of work. It is a *very heavy* release. Usually, I’d never ship such a fat release shortly before the holiday period. Not that I distrust it – we really got some new testing capabilities (really, really much better), so it is probably the most solid release we have had in a long time (besides the small quirk with the missing testbench files). But in general I don’t like to do releases when I know there are very limited resources available to deal with problems. That’s the old datacenter guy in me. But, again, hey, we run on a schedule.


There have been similar occasions in the past 14 months. That’s the downside. Still, due to the 6-week cycle, things usually do not get really bad.


The scheduled model has a lot of good things about it as well. First of all, everyone (users and contributors) knows when the next release will be. This also means you can promise to include something in a specific release. However, usually users know when the release happens, but not what will be part of it, so in a sense it’s not much better than before, IMO. The new model has advantages for me: fewer releases mean less work. Also, I no longer really need to think about when to do a release, which feature is important enough, and so on. I just look at the calendar and know that, for example, on November 15th, 2016 we will have a release, no matter whether I am present, no matter what is done code-wise, etc. (we actually had, for the first time ever, a release while I was on vacation, and it went really well, as I learned later). That really eases my task.


All of this is based on the “we release every 6 weeks, interim releases happen only for emergencies and anything else may be pulled as patches” policy. If we now begin to say “this problem is inconvenient to .. {pick somebody}, so we need to do a re-release”, we get into trouble. I wonder which groups of “somebody” are important enough to warrant non-emergency releases. Are only distro maintainers important enough? Probably not. So enterprise users? Mmmm… maybe small enterprises as well? Who judges this? So let’s assume every user is as important as every other (an idea I really like). If I then look at my change logs, I think I would need to release more frequently. In essence, I would need to release again when it is needed, which is, surprise, the as-needed schedule.


Rsyslog is not a big enough project to run an even more complex release schedule. To keep things manageable for me, I need to release either


a) as-needed

b) on schedule (except for *true* emergencies)

And *that* is the reason for my reluctance to break the release policy just because this time it is distro maintainers who experience the bug rather than end users.


I am currently tempted to switch back to “as-needed” mode, even though this means more work for me.