I'm currently working on a Java project where part of my job is to watch over code quality. As tools I use Jenkins in combination with Sonar. These tools are great, and they have helped me track issues quickly and continuously.
One issue I can't get under control is that some people commit files in an encoding other than UTF-8.
When code like this:
    if (someString == "something") {
        resultString = "string with encoding problem: �";
    }
... gets committed, Sonar will help me find the "String Literal Equality" issue. But as you can see in the second line, there is also an encoding problem: "�" should be an "ü".
Is there any way to find these kinds of problems with Sonar, FindBugs, or PMD?
Please advise! Thank you.
PS: Of course I've tried to explain the issue to my co-developers, in person as well as via email. I even changed their project/workspace encoding myself. But somehow they still manage to commit code like this.
I agree with @bmargulies: it's a valid UTF-8 character (it is actually the Unicode replacement character, U+FFFD), but a PMD rule could still help. Here is a proof-of-concept rule with a hard-coded list of disallowed characters:
    import net.sourceforge.pmd.AbstractJavaRule;
    import net.sourceforge.pmd.ast.ASTLiteral;
    import org.apache.commons.lang3.StringUtils;

    public class EncodingRule extends AbstractJavaRule {

        // Characters that must not appear in string literals.
        // U+FFFD is the Unicode replacement character.
        private static final String BAD_CHARS = "\uFFFD";

        @Override
        public Object visit(final ASTLiteral node, final Object data) {
            if (node.isStringLiteral()) {
                // The image is the literal's source text.
                final String image = node.getImage();
                if (StringUtils.containsAny(image, BAD_CHARS)) {
                    addViolationWithMessage(data, node, "Disallowed char in '"
                            + image + "'");
                }
            }
            // Continue with the child nodes.
            return super.visit(node, data);
        }
    }
It might be more useful to invert the condition and build an allowedChars whitelist containing the ASCII characters plus your local characters. (There is more detail on custom PMD rules in this answer.)
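The whitelist idea can be sketched as a plain character check, independent of PMD. This is only an illustration: the class name AllowedCharsCheck, the methods isAllowed and firstDisallowed, and the German character list are all assumptions and are not part of PMD or of the rule above. The local characters are written as Unicode escapes so the check itself does not depend on the source file's encoding.

```java
// Sketch only: all names here are hypothetical and not part of PMD.
final class AllowedCharsCheck {

    // Assumed local characters (German umlauts and sharp s), written as
    // escapes so this file compiles the same under any source encoding.
    private static final String LOCAL_CHARS =
            "\u00E4\u00F6\u00FC\u00C4\u00D6\u00DC\u00DF";

    /** True for printable ASCII, tab/newline/CR, or a listed local character. */
    static boolean isAllowed(final char c) {
        final boolean printableAscii = c >= 0x20 && c <= 0x7E;
        final boolean whitespace = c == '\t' || c == '\n' || c == '\r';
        return printableAscii || whitespace || LOCAL_CHARS.indexOf(c) >= 0;
    }

    /** Index of the first character not on the whitelist, or -1 if none. */
    static int firstDisallowed(final String text) {
        for (int i = 0; i < text.length(); i++) {
            if (!isAllowed(text.charAt(i))) {
                return i;
            }
        }
        return -1;
    }

    public static void main(final String[] args) {
        // A literal with a whitelisted umlaut passes.
        System.out.println(firstDisallowed("sch\u00F6n"));       // prints -1
        // The replacement character U+FFFD is rejected at index 9.
        System.out.println(firstDisallowed("problem: \uFFFD"));  // prints 9
    }
}
```

Inside the rule, the containsAny test could then presumably be replaced by something like firstDisallowed(image) >= 0. A whitelist also catches mojibake that does not end up as U+FFFD, at the cost of having to maintain the list of local characters.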