Why we use Static Code Analysis

We use static code analysis for two reasons. Both should probably be well-known, but discussions show that this is not always the case. So I thought a small blog post would make sense.

The first reason is obvious: static analyzers help us catch code problems at an early stage, and they do so without any special effort from test engineers. The analyzer “thinks” about many cases a human being does not think about, and so it can catch errors that are sometimes embarrassingly obvious, yet you would still have overlooked them. Detecting these things early saves a lot of time. So we try to run the analyzers early and often (that is also why they are part of our CI).

The benefits come at a cost, and that cost is called “false positives”. They happen, and I always get asked whether I can make an exception to cover one. Unfortunately, I cannot. If I allowed even one static analyzer failure into the QA system, all further builds would fail, rendering static analysis unusable. So, sorry: if you run into a false positive, you need to find a way to work around it. In my experience, the “const” keyword in C is a little gem that not only helps guard against accidental variable modification but also goes a long way toward keeping static analyzers happy. But, granted, sometimes it’s hard to work around false positives. It’s worth it, so just do it ;-)
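To illustrate the “const” point, here is a minimal C sketch. The struct worker_cfg type and the log_cfg() and per_worker_queue() functions are hypothetical, made up for this example; they are not rsyslog code. The idea is that a pointer-to-const parameter tells both the compiler and the analyzer that the callee only reads the data, so a fact established before the call (here, nworkers != 0) can plausibly survive it. Whether and how a particular analyzer makes use of this depends on the tool.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical configuration record, for illustration only. */
struct worker_cfg {
    size_t nworkers;
    size_t queue_size;
};

/* Read-only helper: taking a pointer to const promises (and lets the
 * compiler enforce) that it does not modify *cfg. */
static void log_cfg(const struct worker_cfg *const cfg)
{
    printf("workers=%zu queue=%zu\n", cfg->nworkers, cfg->queue_size);
}

/* Returns the queue share per worker, or 0 if no workers are configured. */
static size_t per_worker_queue(const struct worker_cfg *const cfg)
{
    if (cfg->nworkers == 0)
        return 0;

    /* log_cfg() receives a pointer to const, so an analyzer that honors
     * const can keep assuming nworkers != 0 after this call. With a
     * non-const parameter (especially if the helper lives in another
     * translation unit) it may have to assume the value changed, which
     * can lead to a spurious division-by-zero report below. */
    log_cfg(cfg);

    return cfg->queue_size / cfg->nworkers;
}
```

As a side benefit, the compiler now rejects any accidental write through cfg inside either function.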

The second reason for using static analysis also seems obvious, but in my experience it is often overlooked: humans tend to forget some important test cases. It is well-known and accepted that tests should be crafted by QA engineers rather than by the folks who wrote the code (because otherwise the developer would only test what they had thought about in the first place). For smaller projects, that’s not always possible, but, even more importantly, QA folks can also overlook necessary test cases. The risk is reduced by using a specific test-crafting methodology, but it still exists. This is especially true because, due to combinatorial explosion, not all interactions between configuration settings can be tested. So picking dynamic tests is always a compromise between what is a) seen at all, b) desirable, and c) possible. Static analysis helps with these problems. While it can obviously miss things too, I have seen static analysis more than once detect issues that our dynamic tests did not cover. That way it adds an additional layer of protection. It has also sometimes revealed the need for additional dynamic tests.

It should be mentioned that fuzzing is also a great thing to have in a QA system, but unfortunately I have not yet had the opportunity to deploy it on a real project. And even fuzzing as done by Google is limited by the same combinatorial explosion problem with regard to configuration settings. For example, rsyslog has well over 250 config settings. Even if each setting were a simple on/off switch, that would mean more than 2^250 (roughly 1.8 × 10^75) configurations we would need to fuzz, which is simply impossible [yes, one approach would be to fuzz the tuple (config, data), but that’s a different topic ;-)].
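The arithmetic is easy to check. Treating each setting as a simple on/off switch gives a lower bound of 2^n configurations for n settings. The following C sketch (the function pow2_decimal() and the MAX_DIGITS limit are made up for this illustration) computes the exact decimal value of 2^n by repeatedly doubling an array of decimal digits.

```c
#include <stddef.h>
#include <string.h>

#define MAX_DIGITS 128  /* enough for 2^n up to n = 425 */

/* Writes the exact decimal value of 2^n into out, most significant digit
 * first. out must hold at least MAX_DIGITS + 1 chars.
 * Returns 0 on success, -1 if the result would exceed MAX_DIGITS digits. */
static int pow2_decimal(unsigned n, char *out)
{
    unsigned char digits[MAX_DIGITS] = {1};  /* digits[0] is the ones place */
    size_t len = 1;

    for (unsigned i = 0; i < n; ++i) {
        unsigned carry = 0;
        /* Double every digit, propagating the carry upwards. */
        for (size_t d = 0; d < len; ++d) {
            unsigned v = digits[d] * 2u + carry;
            digits[d] = (unsigned char)(v % 10u);
            carry = v / 10u;
        }
        if (carry != 0) {
            if (len == MAX_DIGITS)
                return -1;  /* result would not fit */
            digits[len++] = (unsigned char)carry;
        }
    }

    /* Emit digits in reading order and terminate the string. */
    for (size_t d = 0; d < len; ++d)
        out[d] = (char)('0' + digits[len - 1 - d]);
    out[len] = '\0';
    return 0;
}
```

For n = 250 this yields a 76-digit number starting with 18092513943330655534932966407607, i.e. roughly 1.8 × 10^75. Because it uses exact integer doubling rather than floating point, every digit of the result is exact.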

Static analysis is not the answer to the software QA problem. But it is an extremely valuable building block!