
Using XPath in PMD to find duplicate expressions


I want to use PMD to find common subexpressions of the form

z = a + b;
q = a + b;

for Java code.

I know the CPD tool exists for this kind of thing, but it also reports copied functions and other duplicated blocks in the file. I only want to find duplicate expressions whose operands are joined by a binary operator.

When I analyze a small example such as z = a + b, each BlockStatement has the following structure:

└─ BlockStatement
   └─ Statement
      └─ StatementExpression
         ├─ PrimaryExpression
         │  └─ PrimaryPrefix        //   z 
         │     └─ Name
         ├─ AssignmentOperator
         └─ Expression
            └─ AdditiveExpression   //   +
               ├─ PrimaryExpression
               │  └─ PrimaryPrefix  //   a 
               │     └─ Name
               └─ PrimaryExpression
                  └─ PrimaryPrefix  //   b
                     └─ Name

I know I somehow have to check whether the same AdditiveExpression appears more than once in my method, but I cannot get it working. One thing I don't understand is whether I need to compare all three parts of the statement, or whether I can just compare the inner AdditiveExpression node, since from there on the duplicate expressions are the same.

Wrong approaches:

//AdditiveExpression[count(.) > 0]

//AdditiveExpression[.=../../../../BlockStatement/Statement/StatementExpression/Expression]/AdditiveExpression]

Edit 1: Adding the XML file for the following Java example:

public class Example{
    public static void main(final String[] args){
    int i = 5;
    int k = 10;
    int z,h;
    z = i + k;
    h = i + k;
    }
}

<?xml version='1.0' encoding='UTF-8' ?>
<CompilationUnit Image='' PackageName='' declarationsAreInDefaultPackage='true'>
    <TypeDeclaration Image=''>
        <ClassOrInterfaceDeclaration Abstract='false' BinaryName='Example' Default='false' Final='false' Image='Example' Interface='false' Local='false' Modifiers='1' Native='false' Nested='false' NonSealed='false' PackagePrivate='false' Private='false' Protected='false' Public='true' Sealed='false' SimpleName='Example' Static='false' Strictfp='false' Synchronized='false' Transient='false' TypeKind='CLASS' Volatile='false'>
            <ClassOrInterfaceBody AnonymousInnerClass='false' EnumChild='false' Image=''>
                <ClassOrInterfaceBodyDeclaration AnonymousInnerClass='false' EnumChild='false' Image='' Kind='METHOD'>
                    <MethodDeclaration Abstract='false' Arity='1' Default='false' Final='false' Image='' InterfaceMember='false' Kind='METHOD' MethodName='main' Modifiers='17' Name='main' Native='false' PackagePrivate='false' Private='false' Protected='false' Public='true' Static='true' Strictfp='false' Synchronized='false' SyntacticallyAbstract='false' SyntacticallyPublic='true' Transient='false' Void='true' Volatile='false'>
                        <ResultType Image='' Void='true' returnsArray='false' />
                        <MethodDeclarator Image='main' ParameterCount='1'>
                            <FormalParameters Image='' ParameterCount='1' Size='1'>
                                <FormalParameter Abstract='false' Array='true' ArrayDepth='1' Default='false' ExplicitReceiverParameter='false' Final='true' Image='' Modifiers='32' Native='false' PackagePrivate='true' Private='false' Protected='false' Public='false' Static='false' Strictfp='false' Synchronized='false' Transient='false' TypeInferred='false' Varargs='false' Volatile='false'>
                                    <Type Array='true' ArrayDepth='1' ArrayType='true' Image='' TypeImage='String'>
                                        <ReferenceType Array='true' ArrayDepth='1' Image=''>
                                            <ClassOrInterfaceType AnonymousClass='false' Array='true' ArrayDepth='1' Image='String' ReferenceToClassSameCompilationUnit='false' />
                                        </ReferenceType>
                                    </Type>
                                    <VariableDeclaratorId Array='false' ArrayDepth='0' ArrayType='true' ExceptionBlockParameter='false' ExplicitReceiverParameter='false' Field='false' Final='true' ForeachVariable='false' FormalParameter='true' Image='args' LambdaParameter='false' LocalVariable='false' Name='args' PatternBinding='false' ResourceDeclaration='false' TypeInferred='false' VariableName='args' />
                                </FormalParameter>
                            </FormalParameters>
                        </MethodDeclarator>
                        <Block Image='' containsComment='false'>
                            <BlockStatement Allocation='false' Image=''>
                                <LocalVariableDeclaration Abstract='false' Array='false' ArrayDepth='0' Default='false' Final='false' Image='' Modifiers='0' Native='false' PackagePrivate='true' Private='false' Protected='false' Public='false' Static='false' Strictfp='false' Synchronized='false' Transient='false' TypeInferred='false' VariableName='i' Volatile='false'>
                                    <Type Array='false' ArrayDepth='0' ArrayType='false' Image='' TypeImage='int'>
                                        <PrimitiveType Array='false' ArrayDepth='0' Boolean='false' Image='int' />
                                    </Type>
                                    <VariableDeclarator Image='' Initializer='true' Name='i'>
                                        <VariableDeclaratorId Array='false' ArrayDepth='0' ArrayType='false' ExceptionBlockParameter='false' ExplicitReceiverParameter='false' Field='false' Final='false' ForeachVariable='false' FormalParameter='false' Image='i' LambdaParameter='false' LocalVariable='true' Name='i' PatternBinding='false' ResourceDeclaration='false' TypeInferred='false' VariableName='i' />
                                        <VariableInitializer Image=''>
                                            <Expression Image='' StandAlonePrimitive='true'>
                                                <PrimaryExpression Image=''>
                                                    <PrimaryPrefix Image='' SuperModifier='false' ThisModifier='false'>
                                                        <Literal CharLiteral='false' DoubleLiteral='false' EscapedStringLiteral='5' FloatLiteral='false' Image='5' IntLiteral='true' LongLiteral='false' SingleCharacterStringLiteral='false' StringLiteral='false' TextBlock='false' TextBlockContent='5' ValueAsDouble='NaN' ValueAsFloat='NaN' ValueAsInt='5' ValueAsLong='5' />
                                                    </PrimaryPrefix>
                                                </PrimaryExpression>
                                            </Expression>
                                        </VariableInitializer>
                                    </VariableDeclarator>
                                </LocalVariableDeclaration>
                            </BlockStatement>
                            <BlockStatement Allocation='false' Image=''>
                                <LocalVariableDeclaration Abstract='false' Array='false' ArrayDepth='0' Default='false' Final='false' Image='' Modifiers='0' Native='false' PackagePrivate='true' Private='false' Protected='false' Public='false' Static='false' Strictfp='false' Synchronized='false' Transient='false' TypeInferred='false' VariableName='k' Volatile='false'>
                                    <Type Array='false' ArrayDepth='0' ArrayType='false' Image='' TypeImage='int'>
                                        <PrimitiveType Array='false' ArrayDepth='0' Boolean='false' Image='int' />
                                    </Type>
                                    <VariableDeclarator Image='' Initializer='true' Name='k'>
                                        <VariableDeclaratorId Array='false' ArrayDepth='0' ArrayType='false' ExceptionBlockParameter='false' ExplicitReceiverParameter='false' Field='false' Final='false' ForeachVariable='false' FormalParameter='false' Image='k' LambdaParameter='false' LocalVariable='true' Name='k' PatternBinding='false' ResourceDeclaration='false' TypeInferred='false' VariableName='k' />
                                        <VariableInitializer Image=''>
                                            <Expression Image='' StandAlonePrimitive='true'>
                                                <PrimaryExpression Image=''>
                                                    <PrimaryPrefix Image='' SuperModifier='false' ThisModifier='false'>
                                                        <Literal CharLiteral='false' DoubleLiteral='false' EscapedStringLiteral='10' FloatLiteral='false' Image='10' IntLiteral='true' LongLiteral='false' SingleCharacterStringLiteral='false' StringLiteral='false' TextBlock='false' TextBlockContent='10' ValueAsDouble='NaN' ValueAsFloat='NaN' ValueAsInt='10' ValueAsLong='10' />
                                                    </PrimaryPrefix>
                                                </PrimaryExpression>
                                            </Expression>
                                        </VariableInitializer>
                                    </VariableDeclarator>
                                </LocalVariableDeclaration>
                            </BlockStatement>
                            <BlockStatement Allocation='false' Image=''>
                                <LocalVariableDeclaration Abstract='false' Array='false' ArrayDepth='0' Default='false' Final='false' Image='' Modifiers='0' Native='false' PackagePrivate='true' Private='false' Protected='false' Public='false' Static='false' Strictfp='false' Synchronized='false' Transient='false' TypeInferred='false' VariableName='z' Volatile='false'>
                                    <Type Array='false' ArrayDepth='0' ArrayType='false' Image='' TypeImage='int'>
                                        <PrimitiveType Array='false' ArrayDepth='0' Boolean='false' Image='int' />
                                    </Type>
                                    <VariableDeclarator Image='' Initializer='false' Name='z'>
                                        <VariableDeclaratorId Array='false' ArrayDepth='0' ArrayType='false' ExceptionBlockParameter='false' ExplicitReceiverParameter='false' Field='false' Final='false' ForeachVariable='false' FormalParameter='false' Image='z' LambdaParameter='false' LocalVariable='true' Name='z' PatternBinding='false' ResourceDeclaration='false' TypeInferred='false' VariableName='z' />
                                    </VariableDeclarator>
                                    <VariableDeclarator Image='' Initializer='false' Name='h'>
                                        <VariableDeclaratorId Array='false' ArrayDepth='0' ArrayType='false' ExceptionBlockParameter='false' ExplicitReceiverParameter='false' Field='false' Final='false' ForeachVariable='false' FormalParameter='false' Image='h' LambdaParameter='false' LocalVariable='true' Name='h' PatternBinding='false' ResourceDeclaration='false' TypeInferred='false' VariableName='h' />
                                    </VariableDeclarator>
                                </LocalVariableDeclaration>
                            </BlockStatement>
                            <BlockStatement Allocation='false' Image=''>
                                <Statement Image=''>
                                    <StatementExpression Image=''>
                                        <PrimaryExpression Image=''>
                                            <PrimaryPrefix Image='' SuperModifier='false' ThisModifier='false'>
                                                <Name Image='z' />
                                            </PrimaryPrefix>
                                        </PrimaryExpression>
                                        <AssignmentOperator Compound='false' Image='=' />
                                        <Expression Image='' StandAlonePrimitive='false'>
                                            <AdditiveExpression Image='+' Operator='+'>
                                                <PrimaryExpression Image=''>
                                                    <PrimaryPrefix Image='' SuperModifier='false' ThisModifier='false'>
                                                        <Name Image='i' />
                                                    </PrimaryPrefix>
                                                </PrimaryExpression>
                                                <PrimaryExpression Image=''>
                                                    <PrimaryPrefix Image='' SuperModifier='false' ThisModifier='false'>
                                                        <Name Image='k' />
                                                    </PrimaryPrefix>
                                                </PrimaryExpression>
                                            </AdditiveExpression>
                                        </Expression>
                                    </StatementExpression>
                                </Statement>
                            </BlockStatement>
                            <BlockStatement Allocation='false' Image=''>
                                <Statement Image=''>
                                    <StatementExpression Image=''>
                                        <PrimaryExpression Image=''>
                                            <PrimaryPrefix Image='' SuperModifier='false' ThisModifier='false'>
                                                <Name Image='h' />
                                            </PrimaryPrefix>
                                        </PrimaryExpression>
                                        <AssignmentOperator Compound='false' Image='=' />
                                        <Expression Image='' StandAlonePrimitive='false'>
                                            <AdditiveExpression Image='+' Operator='+'>
                                                <PrimaryExpression Image=''>
                                                    <PrimaryPrefix Image='' SuperModifier='false' ThisModifier='false'>
                                                        <Name Image='i' />
                                                    </PrimaryPrefix>
                                                </PrimaryExpression>
                                                <PrimaryExpression Image=''>
                                                    <PrimaryPrefix Image='' SuperModifier='false' ThisModifier='false'>
                                                        <Name Image='k' />
                                                    </PrimaryPrefix>
                                                </PrimaryExpression>
                                            </AdditiveExpression>
                                        </Expression>
                                    </StatementExpression>
                                </Statement>
                            </BlockStatement>
                        </Block>
                    </MethodDeclaration>
                </ClassOrInterfaceBodyDeclaration>
            </ClassOrInterfaceBody>
        </ClassOrInterfaceDeclaration>
    </TypeDeclaration>
</CompilationUnit>

Solution

  • A deep-equal function would let you compare two AdditiveExpression elements and all their descendants in a single call. If it isn't available to you, you have to write something much more verbose, but you could probably get away with something like this:

    for $e1 in //AdditiveExpression return
    $e1[
       some $e2 in (//AdditiveExpression except $e1) satisfies 
          $e1/@Image = $e2/@Image and 
          $e1/PrimaryExpression[1]/PrimaryPrefix/Name/@Image = 
             $e2/PrimaryExpression[1]/PrimaryPrefix/Name/@Image and
          $e1/PrimaryExpression[2]/PrimaryPrefix/Name/@Image = 
             $e2/PrimaryExpression[2]/PrimaryPrefix/Name/@Image
    ]
    

    This XPath 2 expression builds the sequence of elements that are considered to have duplicates, as follows:

    For every expression ($e1) in the set of AdditiveExpression elements, return $e1 if the predicate ([ ... ]) evaluates to true. The predicate holds when there is some other expression ($e2) that is also an AdditiveExpression but is not $e1, that has the same Image attribute as $e1, and whose child elements have Image attributes matching those of the corresponding child elements of $e1.

    I'm not sure which criteria matter for two expressions to count as duplicates; I didn't compare literally every attribute, but this should give you an idea.
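
To make the pairwise check easier to follow, here is a minimal Java sketch of the same logic run against an XML dump shaped like the one in the question. It is not PMD code and is not meant as the way to write a PMD rule. The class name DuplicateAdditiveExpressions and all of its helper methods are hypothetical names chosen for this illustration. The JDK's built-in javax.xml.xpath engine only supports XPath 1.0, so the for and some constructs are written as ordinary loops over DOM elements. The sketch is also slightly looser than the XPath: it takes the first Name anywhere below each PrimaryExpression instead of following the exact PrimaryPrefix/Name path.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical illustration class; not part of PMD.
final class DuplicateAdditiveExpressions {

    /** Builds one operand element shaped like the dump in the question. */
    static String operand(String name) {
        return "<PrimaryExpression><PrimaryPrefix><Name Image='" + name
                + "'/></PrimaryPrefix></PrimaryExpression>";
    }

    /** Builds one AdditiveExpression with two named operands. */
    static String additive(String left, String right) {
        return "<AdditiveExpression Image='+'>" + operand(left) + operand(right)
                + "</AdditiveExpression>";
    }

    /** Parses an XML string into a DOM document. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    /** Image of the first Name below the n-th (1-based) PrimaryExpression child, or null. */
    static String operandName(Element additive, int n) {
        int seen = 0;
        for (Node c = additive.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (!(c instanceof Element) || !"PrimaryExpression".equals(c.getNodeName())) {
                continue;
            }
            seen++;
            if (seen == n) {
                NodeList names = ((Element) c).getElementsByTagName("Name");
                return names.getLength() == 0
                        ? null
                        : ((Element) names.item(0)).getAttribute("Image");
            }
        }
        return null;
    }

    /** Same operator image and same operand names, in the same order. */
    static boolean sameExpression(Element e1, Element e2) {
        String left = operandName(e1, 1);
        String right = operandName(e1, 2);
        if (left == null || right == null) {
            return false; // operand is not a plain name, so it is not compared here
        }
        return e1.getAttribute("Image").equals(e2.getAttribute("Image"))
                && left.equals(operandName(e2, 1))
                && right.equals(operandName(e2, 2));
    }

    /** Returns every AdditiveExpression that has at least one other matching one. */
    static List<Element> findDuplicates(Document doc) {
        NodeList all = doc.getElementsByTagName("AdditiveExpression");
        List<Element> result = new ArrayList<>();
        for (int i = 0; i < all.getLength(); i++) {          // "for $e1 in ..."
            Element e1 = (Element) all.item(i);
            for (int j = 0; j < all.getLength(); j++) {      // "some $e2 in ... except $e1"
                if (i != j && sameExpression(e1, (Element) all.item(j))) {
                    result.add(e1);
                    break;
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String xml = "<Block>" + additive("i", "k") + additive("i", "k")
                + additive("k", "i") + "</Block>";
        System.out.println(findDuplicates(parse(xml)).size()); // prints 2
    }
}
```

As in the XPath expression, operand order matters here: i + k and k + i are not reported as duplicates of each other. If they should be, the comparison in sameExpression would have to be relaxed accordingly.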