Our experts' opinions, announcements and comments about software quality management. Do you have an opinion? We are interested, so share it with us!

Fifty Shades of Deep
By Arpad Beszedes, PhD The true depth of static analysis 14-08-2015

So, how deep is your static analysis anyway? Lexical, syntactic, semantic, or maybe it uses abstract interpretation? If you are confused about these words, don't worry: as a user of static analysis tools you might never need to understand the internal working of your favorite tool - as long as it delivers the defect reports you expect from it. Same as a driver, who does not need to understand how the engine works in order to drive his or her car. However, the driver must be aware of the capabilities of the vehicle before deciding to take it to rally racing, holiday trip with the family, or moving around in the city for quick shopping (and finding good parking places - arghh). And, yes, the price tag also matters.


With static analysis tools, you also need to fully understand what it can do for you and what it is not good for. If you are designing software for a space shuttle you want the most thorough analysis in order to find practically all defects lurking in the code (and you probably have the budget too). But, to quickly check your source code for an average web application against the most trivial defects or coding style issues at the end of the working day, a facile but sometimes imprecise (and probably free) tool is suitable most of the times.


There are many aspects by which tools can be classified, but the "depth" of analysis is probably the most simple tag to attach when the precision and thoroughness are to be expressed. Deep or Deepest? Some tools like to use these terms regarding their analysis to distinguish them "from the crowd". However, there is no universally accepted "depth scale", hence it is very difficult to objectively compare the capabilities of different tools in this regard. Frankly, analysis depth should only be regarded as a relative term, so a tool or algorithm could be "deeper" than the other; in other words, there should be a reference point when talking about "shallow" or "deep".


In the following, I will overview the current technological possibilities that static analysis tools of today are based on. I will start with the most shallow ones and then dive deeper. At the highest level, I will follow the fundamental analysis levels initially invented by compiler writers, but later used for other static analyses as well: lexical, syntactic, and semantic.



Lexical analysis


The simplest source code analysis approach is to read the code as plain text and identify different kinds of words such as numbers and identifiers (or tokens, in technical language) in a sequential order but don't care about any particular relationship among them. But even on this first level we have various options. A language independent analyzer would identify very general tokens like identifiers but will be unable to distinguish between language specific keywords, whereas a bit deeper analyzer will consider language (or dialect) specific tokens.


Properties: very fast, simple algorithms, immune to erroneous or incomplete code.


Good for: simplest code style checks such as identifier capitalization, some size metrics, simplest code duplication detection, ... well, not too much else.


Limitations: really?



Syntactic analysis


This is the group to which most practically usable tools (and still made for mere mortals!) belong. The code structure, as determined by the language, is the basis for the analysis, and often a hierarchical program representation (a parse tree or an abstract syntax tree) is first created on which the various algorithms are then applied. On this level, structures like arithmetic expressions, method and class declarations and alike are fully recognized, combining the "words" from the lexical analysis into them. However, the "meaning" (or semantics) of the syntactic structures are not analyzed deeper, for instance, a polymorphic method call is not connected to the best candidate callees, occurrences of a variable are not connected to each other to represent calculations, etc.


Now, there are many different technical approaches to syntactic analysis, most of them originating in compiler technology (based on formal grammar description, for instance). There are variations in the analysis depth as well, which strike a balance between precision and efficiency. Some syntax elements can be skipped, for instance (which are of no interest to the task at hand), or similar structures may be analyzed together. There are also language specific variations, which make this category very rich in different "shades of deep".


Properties: can be implemented to scale to large systems, relatively simple algorithms, well-established technologies and frameworks available. The handling of complex languages and language dialects, incomplete code, or complex systems require very robust, often complex and big analyzers, which need to be constantly evolved together with the language.


Good for: many different code analyses ranging from simple code style checks, through simpler defect detectors to other quality issue analyzers based on syntactic structures such as code metrics and bad smell detectors.


Limitations: The defect and quality issues that can be detected with syntactic analyzers are limited to those that don't require information about the semantics (meaning of the calculations) performed by the program. For example, reasoning about possible undesired values of variables (e.g. a null pointer) is not possible. This often results in defect detectors with a high false positive rate (in other words, the detected issues are merely warnings that need to be further investigated manually). Another typical issue with some analyzers in this category is the consequence of the imperfect syntactic analysis of certain language dialects or specific constructs (due to the difficulties mentioned above). This will result in imprecise or false outputs from the analyzer.



Semantic analysis


Semantic analyzers constitute a big crowd, and the border is often blurred between them and syntactic analyzers. Often, this is referred to as "deep" analysis, because it dives into the meaning of the syntax: the semantics of programs are somehow approximated by semantic analyzers. In other words, the meaning of the syntactic structures is recreated internally in some form or the other but without actually performing the calculations (we are in the static universe, remember?). Of course, complete understanding of the possible executions and the calculations in these executions is impossible statically. That's why we said that the semantics are approximated. And this is what makes this category so broad, namely the semantics can be approximated at many different levels. This starts somewhere at understanding more deeply the semantically more complex structures such as method calls. Here, we may reason about possible targets of a polymorphic call. Possible targets of pointers and references are another example. Bit more deep is when we consider control-flow and data-flow in the program. The former shows the possible order of executing the statements, while the latter captures the calculation chains between the variables of the program. Both carry very important information about the semantics of the program and can be used to implement very powerful detectors (for such as uninitialized variables, dead code, memory corruptions, etc).


Diving even deeper, there are algorithms that perform some forms of partial execution of the program (still statically!) based on the control and data-flow information. Examples include abstract interpretation, symbolic execution and model checking. These differ in their fundamental approaches to the problem (I will not detail it here, don't be afraid), and also the questions they can answer. For instance, a symbolic executor analyzes a program to determine what inputs cause each part of a program to execute, which can be very handy at finding defects such as null-pointer dereferences. A model checker, on the other hand, takes a model of the program (which is essentially its representation in some predefined form) and exhaustively checks whether this model meets a given specification (for instance, a variable must take a value from a certain interval).


Another classification of semantic analyzers is whether they are sound or unsound. The difference is that unsound analyzers can yield false negatives as well (i.e. falsely claiming for a piece of code that it does not contain a defect), while sound analyzers can guarantee the absence of such specific defects. Sound model checkers, for example, can be used to prove the absence of deadlocks in multi-threaded programs. Note, that whatever deep the static analysis, unfortunately, false positives cannot be completely eliminated because it is impossible to prove statically that there will be at least one execution of the program instantiating a potential defect.


Properties: very complex algorithms, with solid theoretical backgrounds. Typically, very precise analysis with no room for incomplete code or error tolerance. Consequently, the analyzers are typically very expensive (both resource-wise and money-wise).


Good for: semantic analyzers are much more precise than the more shallow ones, and are capable of detecting "really interesting" bugs, not just potentially-maybe defects. As the scale is wide from simple control and data-flow analyzers to model checkers, it must be investigated what is the purpose of the analysis, the level of criticality and how much budget is available. Don't forget that costs of deeper analysis will go exponentially higher in this region!


Limitations: scalability and scalability. Semantic analysis algorithms (especially the partial execution group and deeper) are typically very resource demanding. This is no surprise as they have to statically take into account a large number of possible executions of the program. Compared to the syntactic analyses, these tools often can handle systems of two to three magnitude smaller size (e.g. 10kLOC vs. 10MLOC max). The implementations usually employ some constraints and heuristic optimizations (such as the number of execution paths examined), but if you have to be on a very safe side (developing shuttle software for NASA, for example), you have to be prepared to trade off safety to scalability, or buy very expensive machines for the analysis.





There are other aspects of static analysis tools that may be linked to the depth of the analysis, which are not investigated in this essay. These include if the analysis is performed on a whole program or units/components only, and if multi-paradigm or multi-language analysis is available. These clearly influence the precision and thoroughness of the analysis, but in a different way and hence are a different story...


So, what is the take away message? First, the analysis depth is really a relative term: without the need for completely understand the underlying algorithms, you should be at least aware of the concrete type of the analysis, possibly according to the above scheme. Based on that you can place then your chosen tool within a depth rank with other tools. There are many different shades to deep analysis...


Second, deeper is not always better. Remember that there is an exponential growth of the expenses to achieve an arbitrary high quality level. Deeper analyses will produce more relevant defect reports with fewer missed ones (and hopefully fewer false positives as well), consequently this should be used for higher quality demands. But if your analysis is "two times deeper", you will have to pay "two-to-the-nth more". To put it simply, if you are developing a safety critical program and have the budget for it you should opt for deepest available analysis, otherwise dive a bit closer to the surface.


However, beware of very shallow analyses too, as they may give you a nightmare of false positives and imprecise results. Some tool vendors employ a strategy to market their tools as "very fast", "multi-language", "multi-purpose", "warn for all possible bugs". But you should check if this is not only an alias to a very shallow (lexical, or maybe simplified syntactic) analyzer, and hence this versatility and efficiency.


As always, a happy medium is to be found. Academic research has also shown that in some cases deeper analysis is not worth it: a more simpler and cheaper yet more scalable algorithm may be acceptable at the expense of loosing some of the precision, which is not essential to the application at hand.


The only way to the happy path is to try and understand as much as possible how deep your static analysis tool is, but not in terms of marketing speech but in terms of facts about the underlying technology. For that, I hope this article will give you some support.

required to comment.