I want to use PMD to find common subexpressions of the form
z = a + b;
q = a + b;
in Java code.
I know the CPD tool exists for things like this, but it would also report copied functions and other duplicated blocks in the file. I only want to find duplicate expressions whose operands are joined by a binary operator.
When analyzing a small code example z = a + b
the following structure appears per BlockStatement:
└─ BlockStatement
   └─ Statement
      └─ StatementExpression
         ├─ PrimaryExpression
         │  └─ PrimaryPrefix // z
         │     └─ Name
         ├─ AssignmentOperator
         └─ Expression
            └─ AdditiveExpression // +
               ├─ PrimaryExpression
               │  └─ PrimaryPrefix // a
               │     └─ Name
               └─ PrimaryExpression
                  └─ PrimaryPrefix // b
                     └─ Name
I know I somehow have to check whether the same AdditiveExpression appears more than once in my method, but I cannot get it working. One thing I don't understand is whether I need to compare all three parts of the statement, or whether I can just compare the inner AdditiveExpression node, because from there on the duplicate expressions are the same.
Wrong approaches:
//AdditiveExpression[count(.) > 0]
//AdditiveExpression[.=../../../../BlockStatement/Statement/StatementExpression/Expression]/AdditiveExpression]
Edit 1: Adding the XML file for the following Java example:
public class Example {
    public static void main(final String[] args) {
        int i = 5;
        int k = 10;
        int z, h;
        z = i + k;
        h = i + k;
    }
}
<?xml version='1.0' encoding='UTF-8' ?>
<CompilationUnit Image='' PackageName='' declarationsAreInDefaultPackage='true'>
<TypeDeclaration Image=''>
<ClassOrInterfaceDeclaration Abstract='false' BinaryName='Example' Default='false' Final='false' Image='Example' Interface='false' Local='false' Modifiers='1' Native='false' Nested='false' NonSealed='false' PackagePrivate='false' Private='false' Protected='false' Public='true' Sealed='false' SimpleName='Example' Static='false' Strictfp='false' Synchronized='false' Transient='false' TypeKind='CLASS' Volatile='false'>
<ClassOrInterfaceBody AnonymousInnerClass='false' EnumChild='false' Image=''>
<ClassOrInterfaceBodyDeclaration AnonymousInnerClass='false' EnumChild='false' Image='' Kind='METHOD'>
<MethodDeclaration Abstract='false' Arity='1' Default='false' Final='false' Image='' InterfaceMember='false' Kind='METHOD' MethodName='main' Modifiers='17' Name='main' Native='false' PackagePrivate='false' Private='false' Protected='false' Public='true' Static='true' Strictfp='false' Synchronized='false' SyntacticallyAbstract='false' SyntacticallyPublic='true' Transient='false' Void='true' Volatile='false'>
<ResultType Image='' Void='true' returnsArray='false' />
<MethodDeclarator Image='main' ParameterCount='1'>
<FormalParameters Image='' ParameterCount='1' Size='1'>
<FormalParameter Abstract='false' Array='true' ArrayDepth='1' Default='false' ExplicitReceiverParameter='false' Final='true' Image='' Modifiers='32' Native='false' PackagePrivate='true' Private='false' Protected='false' Public='false' Static='false' Strictfp='false' Synchronized='false' Transient='false' TypeInferred='false' Varargs='false' Volatile='false'>
<Type Array='true' ArrayDepth='1' ArrayType='true' Image='' TypeImage='String'>
<ReferenceType Array='true' ArrayDepth='1' Image=''>
<ClassOrInterfaceType AnonymousClass='false' Array='true' ArrayDepth='1' Image='String' ReferenceToClassSameCompilationUnit='false' />
</ReferenceType>
</Type>
<VariableDeclaratorId Array='false' ArrayDepth='0' ArrayType='true' ExceptionBlockParameter='false' ExplicitReceiverParameter='false' Field='false' Final='true' ForeachVariable='false' FormalParameter='true' Image='args' LambdaParameter='false' LocalVariable='false' Name='args' PatternBinding='false' ResourceDeclaration='false' TypeInferred='false' VariableName='args' />
</FormalParameter>
</FormalParameters>
</MethodDeclarator>
<Block Image='' containsComment='false'>
<BlockStatement Allocation='false' Image=''>
<LocalVariableDeclaration Abstract='false' Array='false' ArrayDepth='0' Default='false' Final='false' Image='' Modifiers='0' Native='false' PackagePrivate='true' Private='false' Protected='false' Public='false' Static='false' Strictfp='false' Synchronized='false' Transient='false' TypeInferred='false' VariableName='i' Volatile='false'>
<Type Array='false' ArrayDepth='0' ArrayType='false' Image='' TypeImage='int'>
<PrimitiveType Array='false' ArrayDepth='0' Boolean='false' Image='int' />
</Type>
<VariableDeclarator Image='' Initializer='true' Name='i'>
<VariableDeclaratorId Array='false' ArrayDepth='0' ArrayType='false' ExceptionBlockParameter='false' ExplicitReceiverParameter='false' Field='false' Final='false' ForeachVariable='false' FormalParameter='false' Image='i' LambdaParameter='false' LocalVariable='true' Name='i' PatternBinding='false' ResourceDeclaration='false' TypeInferred='false' VariableName='i' />
<VariableInitializer Image=''>
<Expression Image='' StandAlonePrimitive='true'>
<PrimaryExpression Image=''>
<PrimaryPrefix Image='' SuperModifier='false' ThisModifier='false'>
<Literal CharLiteral='false' DoubleLiteral='false' EscapedStringLiteral='5' FloatLiteral='false' Image='5' IntLiteral='true' LongLiteral='false' SingleCharacterStringLiteral='false' StringLiteral='false' TextBlock='false' TextBlockContent='5' ValueAsDouble='NaN' ValueAsFloat='NaN' ValueAsInt='5' ValueAsLong='5' />
</PrimaryPrefix>
</PrimaryExpression>
</Expression>
</VariableInitializer>
</VariableDeclarator>
</LocalVariableDeclaration>
</BlockStatement>
<BlockStatement Allocation='false' Image=''>
<LocalVariableDeclaration Abstract='false' Array='false' ArrayDepth='0' Default='false' Final='false' Image='' Modifiers='0' Native='false' PackagePrivate='true' Private='false' Protected='false' Public='false' Static='false' Strictfp='false' Synchronized='false' Transient='false' TypeInferred='false' VariableName='k' Volatile='false'>
<Type Array='false' ArrayDepth='0' ArrayType='false' Image='' TypeImage='int'>
<PrimitiveType Array='false' ArrayDepth='0' Boolean='false' Image='int' />
</Type>
<VariableDeclarator Image='' Initializer='true' Name='k'>
<VariableDeclaratorId Array='false' ArrayDepth='0' ArrayType='false' ExceptionBlockParameter='false' ExplicitReceiverParameter='false' Field='false' Final='false' ForeachVariable='false' FormalParameter='false' Image='k' LambdaParameter='false' LocalVariable='true' Name='k' PatternBinding='false' ResourceDeclaration='false' TypeInferred='false' VariableName='k' />
<VariableInitializer Image=''>
<Expression Image='' StandAlonePrimitive='true'>
<PrimaryExpression Image=''>
<PrimaryPrefix Image='' SuperModifier='false' ThisModifier='false'>
<Literal CharLiteral='false' DoubleLiteral='false' EscapedStringLiteral='10' FloatLiteral='false' Image='10' IntLiteral='true' LongLiteral='false' SingleCharacterStringLiteral='false' StringLiteral='false' TextBlock='false' TextBlockContent='10' ValueAsDouble='NaN' ValueAsFloat='NaN' ValueAsInt='10' ValueAsLong='10' />
</PrimaryPrefix>
</PrimaryExpression>
</Expression>
</VariableInitializer>
</VariableDeclarator>
</LocalVariableDeclaration>
</BlockStatement>
<BlockStatement Allocation='false' Image=''>
<LocalVariableDeclaration Abstract='false' Array='false' ArrayDepth='0' Default='false' Final='false' Image='' Modifiers='0' Native='false' PackagePrivate='true' Private='false' Protected='false' Public='false' Static='false' Strictfp='false' Synchronized='false' Transient='false' TypeInferred='false' VariableName='z' Volatile='false'>
<Type Array='false' ArrayDepth='0' ArrayType='false' Image='' TypeImage='int'>
<PrimitiveType Array='false' ArrayDepth='0' Boolean='false' Image='int' />
</Type>
<VariableDeclarator Image='' Initializer='false' Name='z'>
<VariableDeclaratorId Array='false' ArrayDepth='0' ArrayType='false' ExceptionBlockParameter='false' ExplicitReceiverParameter='false' Field='false' Final='false' ForeachVariable='false' FormalParameter='false' Image='z' LambdaParameter='false' LocalVariable='true' Name='z' PatternBinding='false' ResourceDeclaration='false' TypeInferred='false' VariableName='z' />
</VariableDeclarator>
<VariableDeclarator Image='' Initializer='false' Name='h'>
<VariableDeclaratorId Array='false' ArrayDepth='0' ArrayType='false' ExceptionBlockParameter='false' ExplicitReceiverParameter='false' Field='false' Final='false' ForeachVariable='false' FormalParameter='false' Image='h' LambdaParameter='false' LocalVariable='true' Name='h' PatternBinding='false' ResourceDeclaration='false' TypeInferred='false' VariableName='h' />
</VariableDeclarator>
</LocalVariableDeclaration>
</BlockStatement>
<BlockStatement Allocation='false' Image=''>
<Statement Image=''>
<StatementExpression Image=''>
<PrimaryExpression Image=''>
<PrimaryPrefix Image='' SuperModifier='false' ThisModifier='false'>
<Name Image='z' />
</PrimaryPrefix>
</PrimaryExpression>
<AssignmentOperator Compound='false' Image='=' />
<Expression Image='' StandAlonePrimitive='false'>
<AdditiveExpression Image='+' Operator='+'>
<PrimaryExpression Image=''>
<PrimaryPrefix Image='' SuperModifier='false' ThisModifier='false'>
<Name Image='i' />
</PrimaryPrefix>
</PrimaryExpression>
<PrimaryExpression Image=''>
<PrimaryPrefix Image='' SuperModifier='false' ThisModifier='false'>
<Name Image='k' />
</PrimaryPrefix>
</PrimaryExpression>
</AdditiveExpression>
</Expression>
</StatementExpression>
</Statement>
</BlockStatement>
<BlockStatement Allocation='false' Image=''>
<Statement Image=''>
<StatementExpression Image=''>
<PrimaryExpression Image=''>
<PrimaryPrefix Image='' SuperModifier='false' ThisModifier='false'>
<Name Image='h' />
</PrimaryPrefix>
</PrimaryExpression>
<AssignmentOperator Compound='false' Image='=' />
<Expression Image='' StandAlonePrimitive='false'>
<AdditiveExpression Image='+' Operator='+'>
<PrimaryExpression Image=''>
<PrimaryPrefix Image='' SuperModifier='false' ThisModifier='false'>
<Name Image='i' />
</PrimaryPrefix>
</PrimaryExpression>
<PrimaryExpression Image=''>
<PrimaryPrefix Image='' SuperModifier='false' ThisModifier='false'>
<Name Image='k' />
</PrimaryPrefix>
</PrimaryExpression>
</AdditiveExpression>
</Expression>
</StatementExpression>
</Statement>
</BlockStatement>
</Block>
</MethodDeclaration>
</ClassOrInterfaceBodyDeclaration>
</ClassOrInterfaceBody>
</ClassOrInterfaceDeclaration>
</TypeDeclaration>
</CompilationUnit>
If the deep-equal function isn't available in your XPath 2 environment (it would let you compare two AdditiveExpression elements and all their descendants in a single function call), you have to write something much more verbose, but you could probably get away with something like this:
for $e1 in //AdditiveExpression return
  $e1[
    some $e2 in (//AdditiveExpression except $e1) satisfies
      $e1/@Image = $e2/@Image and
      $e1/PrimaryExpression[1]/PrimaryPrefix/Name/@Image =
        $e2/PrimaryExpression[1]/PrimaryPrefix/Name/@Image and
      $e1/PrimaryExpression[2]/PrimaryPrefix/Name/@Image =
        $e2/PrimaryExpression[2]/PrimaryPrefix/Name/@Image
  ]
This XPath 2 expression constructs a sequence of elements that are considered to have duplicates, like so:
For every expression ($e1) in the set of AdditiveExpression elements, return that $e1 if the criterion in the predicate ([ ... ]) evaluates to true, namely, that there's some other expression ($e2) which is also one of the AdditiveExpression elements but isn't $e1, which has the same Image attribute as $e1, and whose child elements have Image attributes that match the corresponding child elements of $e1.
I'm not sure which criteria matter for your two expressions to be considered duplicates; I didn't compare literally every attribute, but this should give you an idea.
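If you want to try the same matching criteria outside PMD, against an AST dump like the one in the question, here is a minimal Java sketch. It uses only the JDK's built-in XPath 1.0 engine, so it does the grouping in Java rather than with a `some ... satisfies` expression. The class name `DuplicateAdditive` and the helpers `findDuplicates` and `additive` are made up for illustration and are not part of PMD. It assumes both operands are plain `Name` nodes, as in the example; anything else (literals, nested expressions) is skipped.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DuplicateAdditive {

    /** Returns keys such as "i + k" that occur more than once in the AST dump. */
    static List<String> findDuplicates(String astXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(astXml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList exprs = (NodeList) xpath.evaluate(
                "//AdditiveExpression", doc, XPathConstants.NODESET);

        // Count each (left operand, operator, right operand) triple.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (int n = 0; n < exprs.getLength(); n++) {
            Element e = (Element) exprs.item(n);
            String left = xpath.evaluate(
                    "PrimaryExpression[1]/PrimaryPrefix/Name/@Image", e);
            String right = xpath.evaluate(
                    "PrimaryExpression[2]/PrimaryPrefix/Name/@Image", e);
            if (left.isEmpty() || right.isEmpty()) {
                continue; // assumption: only simple "name op name" forms are compared
            }
            String key = left + " " + e.getAttribute("Image") + " " + right;
            counts.merge(key, 1, Integer::sum);
        }

        List<String> duplicates = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > 1) {
                duplicates.add(entry.getKey());
            }
        }
        return duplicates;
    }

    /** Hypothetical helper: builds a stripped-down AdditiveExpression element. */
    static String additive(String op, String left, String right) {
        return "<AdditiveExpression Image='" + op + "'>"
             + "<PrimaryExpression><PrimaryPrefix><Name Image='" + left
             + "'/></PrimaryPrefix></PrimaryExpression>"
             + "<PrimaryExpression><PrimaryPrefix><Name Image='" + right
             + "'/></PrimaryPrefix></PrimaryExpression>"
             + "</AdditiveExpression>";
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Block>"
                + additive("+", "i", "k")
                + additive("+", "i", "k")
                + additive("-", "i", "k")
                + "</Block>";
        System.out.println(findDuplicates(xml)); // prints [i + k]
    }
}
```

Like the XPath above, this treats `i + k` and `k + i` as different expressions and ignores whether `i` or `k` is reassigned between the two statements, so it is only a textual check.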