10 Code Smells a Static Analyser Can Locate in a Codebase
Today’s guest post is written by Issam Lahlali. Issam is the CppDepend lead developer. He enjoy writing C++ code every day and he wrote many posts about the C++ code quality.
Static analysis is not only about directly finding bugs, but also about finding bug-prone situations that can decrease code understanding and maintainability. Static analysis can handle many other properties of the code:
- Code metrics: for example, methods with too many loops, if, else, switch, case… end up being non-understandable, hence non-maintainable. Counting these through the code metric Cyclomatic Complexity is a great way to assess when a method becomes too complex.
- Dependencies: if the classes of your program are entangled, effects of any changes in the code becomes unpredictable. Static analysis can help to assess when classes and components are entangled.
- Immutability: types that are used concurrently by several threads should be immutable, else you’ll have to protect state read/write access with complex lock strategies that will end up being un-maintainable. Static analysis can make sure that some classes remain immutable.
- Dead code: dead code is code that can be removed safely, because it is not invoked anymore at runtime. Not only can it be removed, but it should be removed, because this extra code add unnecessary complexity to the program. Static analysis can find a lot of the dead code in your program (yet not all).
- API breaking change: if you present an API to your client, it can be easy to remove a public member without noticing and thus, breaking your clients code. Static analysis can compare two versions of the code and can warn about this pitfall.
A code smell can be also considered as a bug-prone situation. Let’s see how a static analyser can detect code smells for you.
⚠️ Spoiler alert: if you read this article until the end, you will find a coupon that will give you a 15% discount on the latest version of CppDepend.
Code smells
Here is the definition of a code smell from Wikipedia:
In computer programming, code smell, (or bad smell) is any symptom in the source code of a program that possibly indicates a deeper problem. According to Martin Fowler, “a code smell is a surface indication that usually corresponds to a deeper problem in the system”. Another way to look at smells is with respect to principles and quality: “smells are certain structures in the code that indicate violation of fundamental design principles and negatively impact design quality”.
Code smells are usually not bugs—they are not technically incorrect and do not currently prevent the program from functioning. Instead, they indicate weaknesses in design that may be slowing down development or increasing the risk of bugs or failures in the future. Bad code smells can be an indicator of factors that contribute to technical debt. Robert C. Martin calls a list of code smells a “value system” for software craftsmanship.
Many interesting tools exist to detect bugs in your C++ code base like cppcheck, clang-tidy and visual studio analyzer. But what about the detection of the bug-prone situations?
If the static analysis tools creators could decide which situations are considered as bugs, it’s not the case of the code smells cases which depend on the development team choices. For example a team could consider that a method with more than 20 lines is a code smell, another team could set its limit to 30. If a tool provides the detection of the code smells, it must provides also the possibility to customize it.
Code as Data to detect code smells
Static analysis is the idea of analyzing source code for various properties and reporting on those properties, but it’s also, more generally, the idea of treating code as data.
This can sound strange to us as application developers, since we’re very much used to thinking of source code as instructions, procedures, and algorithms. But it’s also deeply powerful.
The idea is to analyse the source code in a file, extract its AST and generate a model containing a wealth of relevant data about the code. This way we can query it using a code query language similar to SQL.
CppDepend provides a code query language named CQLinq to query the code base like a database. Developers, designers and architects could define their custom queries to find easily the bug-prone situations.
With CQlinq we can combine the data from the code metrics, dependencies, API usage and other model data to define elaborate queries that match some bug-prone situations.
Here’s an example of a CQLinq query that allows to identify the most complex methods:
Let’s explore 10 common code smells and how CQLinq allows to detect them:
1-Too big types
Types implementations spreading across too many lines are a burden to maintain. If you consider a reasonable limit to be say 200 lines, you can locate the types that go over that limit with the formula NbLinesOfCode > 200
:
Here are a few refactoring tips:
- The goal is to split the class in smaller classes. These smaller classes can be external classes or private classes nested in the original class, whose instances objects become composed of instances of smaller classes.
- The partitioning to smaller classes should be driven by the multiple responsibilities handled by the class. To identify these responsibilities it often helps to look for subsets of methods strongly coupled with subsets of fields.
- If the class contains way more logic than states, a good option can be to define one or several free functions.
- Try to maintain the interface of the class at first and delegate calls to the new extracted classes. In the end, the class should be a pure facade without its own logic. Then you can keep it for convenience or throw it away and start to use the new classes only.
- Unit Tests can help: write tests for each method before extracting it to ensure you don’t break functionality.
2-Types with too many methods
Another metrics for type complexity is the number of methods. Having many methods for a type might be a sign of too many responsibilities implemented.
Here’s the corresponding CQLinq query to detect them:
3-Types with too many data members
Like with a large number of methods, a large number of data members can be a sign of the type having more responsibilities than it should.
Here is a query to detect such types with a large number of data members:
4-Long methods
Methods with many numbers of lines of code are not easy to maintain and understand. Here is how to identify the methods that are, say 60 lines long:
The above query was performed on the Unreal Engine source code. The whole codebase contains more than 150 000 methods, so less than 1% could be considered as too big (if our limit is 60 lines).
5-Methods taking many parameters
Methods with too many parameters are difficult to understand, because as humans we have a hard time keeping track of more than a handful of objects at the same time.
Here’s the CQLinq query to detect methods that have more than a certain number of parameters, for example 7:
This query was launched on the Unreal Engine source code, and we can see that about 0.5% of the methods in the code base have 8 or more parameters. And most of them are generic, emulating variadic functions, like the case of TCStringt::Snprintf
methods in the above screenshot.
6-Methods with many local variables
The more local variables, the more things you have to follow to understand the body of the function.
Here’s the query to detect methods with more than a set number of variables (here 20):
Less than 1% of the Unreal Engine methods have more than 20 local variables.
7-Too complex methods
There are other interesting metrics to detect complex functions:
- Cyclomatic complexity is a popular procedural software metric equal to the number of branching points in a procedure or, as its wikipedia article puts it, “the number of linearly independent paths through a program’s source code”.
- Nesting Depth is the depth of the most nested scope in a method body.
- Max Nested loop isthe maximum level of loop nesting in a function.
The max value tolerated for these metrics depends on the team choices, there are no real standard values.
Let’s search for methods that could be considered as complex with regard to cyclomatic complexity, nesting depth and max nested loop in the Unreal Engine code base:
Only 1,5% of the Unreal Engine methods are deemed as too complex by this measurement, and could be good candidate to be refactored to minimize their complexity.
8- Methods with too many overloads
Typically the “too many overloads” phenomenon appears when an algorithm takes various sets of in-parameters. Each overload is presented as a facility to provide a set of in-parameters.
Having a few overloads can be handy, but past a certain number the interface can become confusing.
The “too many overloads” phenomenon can also be a consequence of the usage of the visitor design pattern since a method named Visit()
must be provided for each subtype. In such a situation, there is no need for a fix.
Here is what a query to locate methods with more than 6 overloads would look like in CQLinq:
9-Coupling
Low coupling is desirable because a change in one area of an application will require fewer changes throughout the entire application. In the long run, low coupling saves a lot of time, effort, and cost associated with modifying and adding new features to an application.
C++ offers several tools to reduce coupling by using polymorphism. For example, abstract classes (in the sense of a class with at least one pure virtual method) or generic (template) types and methods.
Let’s search for all abstract classes defined in the Unreal Engine source code:
Only some few types are declared as abstract. The low coupling is more enforced by using generic types and generic methods.
Here’s for example the methods using at least one generic method:
As we can observe many methods use the generic ones, the low coupling is enforced by the function template params.
10-Cohesion
As Robert Martin puts it in Agile Software Development, Principles, Patterns, and Practices, the single responsibility principle states that “A class should have only one reason to change”. Such a class is said to be cohesive: all of its members contribute to that responsibility.
To measure the cohesion of a class, we can use the LCOM as a quantitative indicator. LCOM stands for Lack of Cohesion of Methods, so high LCOM value pinpoints a poorly cohesive class.
There are several metrics for class cohesion. The LCOM takes its values in the range [0-1]. Here is its formula:
LCOM = 1 – (sum(MF)/M*F)
The LCOM HS (HS standing for Henderson-Sellers) is a variation of the LCOM takes its values in the range [0-2]. A LCOM HS value higher than 1 should be considered alarming. Here are to compute LCOM metrics:
LCOM HS = (M – sum(MF)/F)(M-1)
Where:
- M is the number of methods in class (both static and instance methods are counted, as well as constructors and properties getters/setters).
- F is the number of instance fields in the class.
- MF is the number of methods of the class accessing a particular instance field.
- Sum(MF) is the sum of MF over all instance fields of the class.
The underlying idea behind these formulas can be stated as follow: a class is completely cohesive if all its methods use all its methods use all its instance fields, which means that sum(MF)=M*F and then LCOM = 0 and LCOMHS = 0.
Given that LCOM HS values higher than 1 should be considered alarming, let’s measure the LCOM HS of the classes having more than 10 data members and 10 member functions in the Unreal Engine source code:
Only few types are considered to be big and not cohesive.
Try it on your code
All the above queries were run on the Unreal Engine codebase, but are by no means specific to it. Chances are a lot of them also apply to your code. They will help you locate the hot spots and by fixing them, improve the quality and expressiveness of your code.
If you’d like to have a go, you can check out the CppDepend tool. And as promised, as a reader of Fluent C++ you get a 15% discount on the latest version of CppDepend. To get it, just use the coupon FluentCpp at checkout! This coupon is valid until the end of April.
You will also like
- To DRY or not to DRY?
- Functional Programming Is Not a Silver Bullet
- TODO_BEFORE(): A Cleaner Codebase for 2019
Share this post!