Tags: duplicates, sonarqube, metrics, code-duplication, code-metrics

Code duplication metrics - Best practice


When looking at code duplication metrics over a long period of time (>10 years), are there guidelines or best practices for what level of code duplication is "normal" or "recommended"?

I find this question hard to answer, because if the code quality were perfect then nobody would need to maintain it, so who would care? But, in general terms, are there references for what counts as "normal" — say, for a 10-line duplication threshold?

Is a duplication level of, say, X% unusual or normal? If it is normal, does that mean there are healthy, profitable projects out there with this level of duplication?

Perhaps the answer is a study that correlates code duplication with project success / average performance / failure? Or perhaps people can share their experience of maintenance costs at a given level of duplication?


Solution

  • In my opinion there is no general answer to this.

    You should inspect every finding of the tool and decide whether it is a false positive, whether it is justified (e.g. a standard coding pattern or generated code), or whether it really is problematic copy-pasted code that should be refactored and extracted into its own function/module/whatever.

    For false positives or intentional duplication, there should be a tool-specific way to suppress the warning for that occurrence, or more generally for a specific pattern.

    That way, future runs will still warn you whenever new duplicates are found (and about older, unfixed true positives).

    For false positives, also consider filing a bug report with the tool's author, crafting the smallest possible code/configuration that still triggers the warning.
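    The "refactor and extract" step above can be sketched as follows — a minimal, hypothetical example (the function names and validation checks are illustrative, not from the question):

    ```python
    # Hypothetical sketch: two functions originally shared a copy-pasted
    # validation block, which a duplication tool would flag as a clone.
    # Extracting the block into one shared helper removes the duplication.

    def _validate(name: str, email: str) -> None:
        # Single shared copy of the formerly duplicated checks.
        if not name.strip():
            raise ValueError("name must not be blank")
        if "@" not in email:
            raise ValueError("invalid email")

    def register_user(name: str, email: str) -> dict:
        _validate(name, email)  # was: an inline, duplicated validation block
        return {"action": "register", "name": name, "email": email}

    def update_user(name: str, email: str) -> dict:
        _validate(name, email)  # was: the same inline, duplicated block
        return {"action": "update", "name": name, "email": email}
    ```

    After the extraction, a fix to the validation logic happens in one place instead of two, which is exactly the maintenance cost the duplication metric is trying to capture.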
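    As for suppressing intended duplication: SonarQube (one of this question's tags) lets you exclude files from copy-paste detection via the `sonar.cpd.exclusions` analysis property. A sketch of a `sonar-project.properties` fragment (the paths are hypothetical and would need to match your project layout):

    ```properties
    # Exclude generated code from SonarQube's duplication detection.
    # Paths are illustrative examples, not from the question.
    sonar.cpd.exclusions=src/generated/**/*,**/*_pb2.py
    ```

    Excluded files no longer count toward the duplication percentage, so the metric then reflects only hand-written code you actually intend to maintain.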