Saturday, June 13, 2015

Understanding of Cyclomatic Complexity for measuring code quality

But does that mean a program with a high Cyclomatic Complexity (CC) necessarily has poor quality? Certainly not!

It is just another metric, and some people think it is better to use the number of lines instead, as a simpler and more robust metric. Very good paper (page 17); see also the Conclusion on page 45.

People often say " ... reducing the cyclomatic complexity of the code can help reduce the number of test cases ... ", but that is not always true!
Blindly following the cyclomatic metric will lead you to code with a lot of small methods that are hard to maintain, because encapsulation may be damaged. It is not rare that you should suppress the check for a particular method whose complexity is higher than the threshold you defined for the rest of the project. That keeps all of its logic (even if a bit complicated) in one method, so other methods cannot start using its pieces and, what is more problematic, change them to suit their own needs. Not everything should be decomposed into smaller parts.
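A minimal sketch of such a per-method exception, assuming the Checkstyle configuration enables the SuppressWarningsHolder check together with SuppressWarningsFilter (that configuration is not shown here). The TokenClassifier class and its classify method are hypothetical; the point is only that the whole decision table stays in one place while the check is silenced for this single method.

```java
// Hypothetical classifier whose branching is intentionally kept in one method.
public class TokenClassifier {

    // Decision points: 5 "if"s plus 5 "||" operators, so pi = 10 and CC = 11,
    // which is above a threshold of 10. The "checkstyle:" prefixed token is
    // meant for Checkstyle's SuppressWarningsHolder; javac ignores it.
    @SuppressWarnings("checkstyle:cyclomaticcomplexity")
    public static String classify(char c) {
        String kind;
        if (Character.isDigit(c)) {
            kind = "digit";
        } else if (Character.isLetter(c) || c == '_') {
            kind = "identifier";
        } else if (Character.isWhitespace(c)) {
            kind = "whitespace";
        } else if (c == '+' || c == '-' || c == '*' || c == '/') {
            kind = "operator";
        } else if (c == '(' || c == ')') {
            kind = "paren";
        } else {
            kind = "other";
        }
        return kind;
    }

    public static void main(String[] args) {
        for (char c : "x1 + (y_)#".toCharArray()) {
            System.out.println("'" + c + "' -> " + classify(c));
        }
    }
}
```

Splitting this into one helper per branch would lower the number per method but would also expose fragments of the table for reuse and modification elsewhere, which is exactly the encapsulation concern above.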
Finally, if your complicated method is "public" and you decompose its internals into several "private" methods, you will not reduce the number of tests! Your tests will still exercise the public method with all relevant combinations of argument values.
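A small illustration of this point, using a hypothetical ShippingCost class (names and pricing rules are invented for the example). The monolithic method has three decision points (CC 4); the decomposed version splits the same rules into private helpers with CC 3 and 2. The public contract is identical, so the set of inputs needed to test it does not shrink.

```java
// Hypothetical example: the same shipping-cost rules written as one method
// and as a public method delegating to private helpers.
public class ShippingCost {

    // Monolithic version: if, else-if, if -> 3 decision points, CC = 4.
    public static int costMonolithic(int weightKg, boolean express) {
        int cost;
        if (weightKg <= 1) {
            cost = 5;
        } else if (weightKg <= 10) {
            cost = 10;
        } else {
            cost = 20;
        }
        if (express) {
            cost *= 2;
        }
        return cost;
    }

    // Decomposed version: CC per method is 1, 3 and 2,
    // but the observable behaviour of the public method is unchanged.
    public static int costDecomposed(int weightKg, boolean express) {
        return applyExpress(baseCost(weightKg), express);
    }

    private static int baseCost(int weightKg) {
        if (weightKg <= 1) {
            return 5;
        }
        if (weightKg <= 10) {
            return 10;
        }
        return 20;
    }

    private static int applyExpress(int cost, boolean express) {
        return express ? cost * 2 : cost;
    }

    public static void main(String[] args) {
        // The same inputs are needed for both versions:
        // one per weight band, each with and without express delivery.
        int[] weights = {1, 5, 50};
        for (int w : weights) {
            for (boolean express : new boolean[] {false, true}) {
                System.out.println(w + " kg, express=" + express + " -> "
                        + costMonolithic(w, express) + " / " + costDecomposed(w, express));
            }
        }
    }
}
```

The private helpers are only reachable through costDecomposed, so a test suite written against the public method looks exactly the same for both versions.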

The main point of CC is to measure complexity in order to estimate the number of tests, so it is supposed to be used as follows (from Wikipedia):
 \text{branch coverage} \leq \text{cyclomatic complexity} \leq \text{number of paths}
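A sketch of where each of the three numbers comes from, using a hypothetical describe method with two independent, sequential decisions (π = 2, so CC = 3). Two tests, (true, true) and (false, false), already reach every branch; three tests are needed for a basis set of independent paths; and there are four complete paths in total, so 2 ≤ 3 ≤ 4.

```java
// Hypothetical method with two sequential, independent decisions.
public class PathCount {

    // pi = 2 decision points, so cyclomatic complexity = 3.
    public static String describe(boolean negative, boolean even) {
        String s = "";
        if (negative) {
            s += "negative ";
        }
        if (even) {
            s += "even";
        } else {
            s += "odd";
        }
        return s.trim();
    }

    public static void main(String[] args) {
        // Enumerating every combination walks all 4 paths.
        boolean[] values = {false, true};
        for (boolean negative : values) {
            for (boolean even : values) {
                System.out.println(negative + ", " + even + " -> " + describe(negative, even));
            }
        }
    }
}
```

With more sequential decisions the gap widens: k independent ifs give CC = k + 1 but 2^k paths, which is why CC sits between branch coverage and full path coverage.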

From Wikipedia:
McCabe showed that the cyclomatic complexity of any structured program with only one entrance point and one exit point is equal to the number of decision points (π) (i.e., "if" statements or conditional loops) contained in that program plus one. However, this is true only for decision points counted at the lowest, machine-level instructions. Decisions involving compound predicates like those found in high-level languages, such as IF cond1 AND cond2 THEN ..., should be counted in terms of the predicate variables involved, i.e. in this example one should count two decision points, because at machine level it is equivalent to IF cond1 THEN IF cond2 THEN ...
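The compound-predicate rule can be seen in a small Java sketch (the class and method names are hypothetical). Both methods behave identically; the first has one "if" in the source but two predicate variables joined by &&, the second spells out the machine-level equivalent as nested ifs. Counted as the quote describes, each has π = 2 and therefore CC = 3.

```java
// Hypothetical example: a compound predicate versus its nested-if equivalent.
public class CompoundPredicate {

    // One "if", but two predicate variables: pi = 2, CC = 3.
    public static String decideCompound(boolean inStock, boolean paid) {
        String decision = "hold";
        if (inStock && paid) {
            decision = "ship";
        }
        return decision;
    }

    // IF cond1 THEN IF cond2 THEN ...: again pi = 2, CC = 3.
    public static String decideNested(boolean inStock, boolean paid) {
        String decision = "hold";
        if (inStock) {
            if (paid) {
                decision = "ship";
            }
        }
        return decision;
    }

    public static void main(String[] args) {
        boolean[] values = {false, true};
        for (boolean inStock : values) {
            for (boolean paid : values) {
                System.out.println("inStock=" + inStock + ", paid=" + paid + " -> "
                        + decideCompound(inStock, paid) + " / " + decideNested(inStock, paid));
            }
        }
    }
}
```

Because && short-circuits, both versions have the same three reachable outcomes (inStock false; inStock true and paid false; both true), which matches CC = 3.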
Cyclomatic complexity may be extended to a program with multiple exit points; in this case it is equal to:
 \pi - s + 2,
where π is the number of decision points in the program, and s is the number of exit points.
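Writing M for the value of the quoted formula (M is just a label introduced here), two quick checks follow directly from it: with a single exit point it reduces to McCabe's π + 1 from the previous paragraph, and for a fixed number of decision points each extra exit point lowers the value by one.

```latex
\begin{aligned}
M(\pi, s)     &= \pi - s + 2 \\
M(\pi, 1)     &= \pi - 1 + 2 = \pi + 1 \\
M(\pi, s + 1) &= \pi - (s + 1) + 2 = M(\pi, s) - 1
\end{aligned}
```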

So the number of decision points is a good measure of complexity, but as you can see from the definition, the value can be reduced by adding more exit points. That holds for calculating the number of tests, but not for measuring code quality and readability.
So "multiple exits/returns" could become an incentive for engineers to quiet down Checkstyle or PMD. But that conflicts with the best practice of having one entry and one exit point per method.
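To make the trade-off concrete, here is a hypothetical grading method written both ways. Taking the quoted formula π − s + 2 literally, both versions have π = 3, but the single-exit version has s = 1 (value 4) while the multi-exit version has s = 4 (value 3 − 4 + 2 = 1). Whether a particular tool applies this exit-point adjustment is not something this sketch checks; it only shows that the logic, the readability and the inputs needed to test it (one per grade) are the same in both.

```java
// Hypothetical example: identical logic with one exit point versus four.
public class ExitPoints {

    // Single exit: pi = 3, s = 1, so pi - s + 2 = 4 (= pi + 1).
    public static char gradeSingleExit(int score) {
        char grade;
        if (score >= 90) {
            grade = 'A';
        } else if (score >= 75) {
            grade = 'B';
        } else if (score >= 50) {
            grade = 'C';
        } else {
            grade = 'F';
        }
        return grade;
    }

    // Multiple exits: still pi = 3, but s = 4, so pi - s + 2 = 1.
    public static char gradeMultiExit(int score) {
        if (score >= 90) {
            return 'A';
        }
        if (score >= 75) {
            return 'B';
        }
        if (score >= 50) {
            return 'C';
        }
        return 'F';
    }

    public static void main(String[] args) {
        int[] scores = {95, 80, 60, 10};
        for (int score : scores) {
            System.out.println(score + " -> " + gradeSingleExit(score) + " / " + gradeMultiExit(score));
        }
    }
}
```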

Conclusion: Cyclomatic Complexity is for measuring the number of tests, NOT code quality. But CC can be used as a quality measure if the threshold is chosen to suit the project. Do not be ashamed to have a complexity level of 15 or even more, but keep it below 20 to catch really badly designed code automatically.

I highly recommend reading  to see, on code examples, how weird complexity calculation can be.

Checkstyle's HTML documentation was updated for this metric:

I updated the Checkstyle code to keep the complexity level below 11, and I can confirm that about 54 cases were really ugly and messy code; it was good that the metric pointed to that code. But for non-library code, level 11 is too demanding (though reachable); 15 is OK.

Examples of cyclomatic complexity calculation over the code bases of Checkstyle, Spring, and OpenJDK.

Sources (especially the links to external and other references):
