XSLT - How to improve performance


I have written an XSLT stylesheet that builds the employee and manager hierarchy. It works as expected, but it takes around 3 hours for 50,000 records. I believe this is due to the looping involved in creating the variables.

I have been trying to optimize it but have not found a good solution yet. Any help would be appreciated.

Thank you.

Current XSLT Code

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <!-- Identity template to copy the entire XML structure -->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>
    <!-- Template to generate the list of managers per employee -->
    <xsl:template match="employee">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
            <managers>
                <xsl:call-template name="findManagers">
                    <xsl:with-param name="employeeName" select="EmployeePositionID"/>
                </xsl:call-template>
            </managers>
        </xsl:copy>
    </xsl:template>
    <!-- Template to recursively find managers -->
    <xsl:template name="findManagers">
        <xsl:param name="employeeName"/>
        <xsl:param name="level" select="1"/>
        <xsl:if test="$employeeName != ''">
            <xsl:variable name="manager" select="//employee[EmployeePositionID = $employeeName]/Manager"/>
            <xsl:variable name="ManagerID_Var" select="//employee[EmployeePositionID = $employeeName]/Manager/@personid"/>
            <xsl:if test="$manager != ''">
                <hierarchy>
                    <xsl:attribute name="BottomToplevel">
                        <xsl:value-of select="$level"/>
                    </xsl:attribute>
                    <xsl:attribute name="ManagerID">
                        <xsl:value-of select="$ManagerID_Var"/>
                    </xsl:attribute>
                    <xsl:value-of select="$manager"/>
                </hierarchy>
                <!-- Recursively call the template for the next manager -->
                <xsl:call-template name="findManagers">
                    <xsl:with-param name="employeeName" select="$manager"/>
                    <xsl:with-param name="level" select="$level + 1"/>
                </xsl:call-template>
            </xsl:if>
        </xsl:if>
    </xsl:template>
</xsl:stylesheet>
  

Input XML

<?xml version="1.0" encoding="UTF-8"?>
<company>
    <employee>
        <effectiveStartDate>2020-08-20T00:00:00.000</effectiveStartDate>
        <EmployeePositionID>1</EmployeePositionID>
        <PersonID>1111</PersonID>
        <Name>Test1</Name>
        <ToBeHired>false</ToBeHired>
        <Manager personid="M1">2</Manager>
    </employee>
    <employee>
        <effectiveStartDate>2020-08-20T00:00:00.000</effectiveStartDate>
        <EmployeePositionID>2</EmployeePositionID>
        <PersonID>2222</PersonID>
        <Name>Test2</Name>
        <ToBeHired>false</ToBeHired>
        <Manager personid="M2">3</Manager>
    </employee>
    <employee>
        <effectiveStartDate>2020-08-20T00:00:00.000</effectiveStartDate>
        <EmployeePositionID>3</EmployeePositionID>
        <PersonID/>
        <Name/>
        <ToBeHired>false</ToBeHired>
        <Manager personid="">4</Manager>
    </employee>
    <employee>
        <effectiveStartDate>2020-08-20T00:00:00.000</effectiveStartDate>
        <EmployeePositionID>4</EmployeePositionID>
        <PersonID>3333</PersonID>
        <Name>Test3</Name>
        <ToBeHired>false</ToBeHired>
        <Manager personid="M3">5</Manager>
    </employee>
    <employee>
        <effectiveStartDate>2020-08-20T00:00:00.000</effectiveStartDate>
        <EmployeePositionID>5</EmployeePositionID>
        <PersonID>4444</PersonID>
        <Name>Test4</Name>
        <ToBeHired>false</ToBeHired>
        <Manager/>
    </employee>
</company>

Output XML should be

<?xml version="1.0" encoding="UTF-8"?>
<company>
    <employee>
        <effectiveStartDate>2020-08-20T00:00:00.000</effectiveStartDate>
        <EmployeePositionID>1</EmployeePositionID>
        <PersonID>1111</PersonID>
        <Name>Test1</Name>
        <ToBeHired>false</ToBeHired>
        <Manager personid="M1">2</Manager>
        <managers>
            <hierarchy BottomToplevel="1" ManagerID="M1">2</hierarchy>
            <hierarchy BottomToplevel="2" ManagerID="M2">3</hierarchy>
            <hierarchy BottomToplevel="3" ManagerID="">4</hierarchy>
            <hierarchy BottomToplevel="4" ManagerID="M3">5</hierarchy>
        </managers>
    </employee>
    <employee>
        <effectiveStartDate>2020-08-20T00:00:00.000</effectiveStartDate>
        <EmployeePositionID>2</EmployeePositionID>
        <PersonID>2222</PersonID>
        <Name>Test2</Name>
        <ToBeHired>false</ToBeHired>
        <Manager personid="M2">3</Manager>
        <managers>
            <hierarchy BottomToplevel="1" ManagerID="M2">3</hierarchy>
            <hierarchy BottomToplevel="2" ManagerID="">4</hierarchy>
            <hierarchy BottomToplevel="3" ManagerID="M3">5</hierarchy>
        </managers>
    </employee>
    <employee>
        <effectiveStartDate>2020-08-20T00:00:00.000</effectiveStartDate>
        <EmployeePositionID>3</EmployeePositionID>
        <PersonID/>
        <Name/>
        <ToBeHired>false</ToBeHired>
        <Manager personid="">4</Manager>
        <managers>
            <hierarchy BottomToplevel="1" ManagerID="">4</hierarchy>
            <hierarchy BottomToplevel="2" ManagerID="M3">5</hierarchy>
        </managers>
    </employee>
    <employee>
        <effectiveStartDate>2020-08-20T00:00:00.000</effectiveStartDate>
        <EmployeePositionID>4</EmployeePositionID>
        <PersonID>3333</PersonID>
        <Name>Test3</Name>
        <ToBeHired>false</ToBeHired>
        <Manager personid="M3">5</Manager>
        <managers>
            <hierarchy BottomToplevel="1" ManagerID="M3">5</hierarchy>
        </managers>
    </employee>
    <employee>
        <effectiveStartDate>2020-08-20T00:00:00.000</effectiveStartDate>
        <EmployeePositionID>5</EmployeePositionID>
        <PersonID>4444</PersonID>
        <Name>Test4</Name>
        <ToBeHired>false</ToBeHired>
        <Manager/>
        <managers/>
    </employee>
</company>

Solution

  • The poor performance is most likely caused by the instructions

    <xsl:variable name="manager" 
      select="//employee[EmployeePositionID = $employeeName]/Manager"/>
    <xsl:variable name="ManagerID_Var" 
      select="//employee[EmployeePositionID = $employeeName]/Manager/@personid"/>
    

    which mean that for every employee, you are searching the whole file to find the manager of that employee, making the performance quadratic - O(n^2) - in the number of employees.

    A few XSLT processors such as my company's Saxon-EE might optimize such expressions to use an index, but most are going to do a serial search through the document.

    However, you can do the indexing "by hand" and this will greatly improve performance. Define an index like this:

    <xsl:key name="empIndex" match="employee" use="EmployeePositionID"/>
    

    and replace //employee[EmployeePositionID = $employeeName] with key('empIndex', $employeeName).

    You should also be able to achieve a saving by only doing the search once rather than twice: the second variable can be defined in terms of the first as

    <xsl:variable name="ManagerID_Var" 
          select="$manager/@personid"/>
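
    Putting both changes together, the stylesheet from the question might look like the sketch below. It has not been run against the full 50,000-record file. Only the xsl:key declaration and the two variable definitions differ from the original, and the comments are added for illustration.

    ```xml
    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
        <!-- Index every employee by its position ID -->
        <xsl:key name="empIndex" match="employee" use="EmployeePositionID"/>
        <!-- Identity template to copy the entire XML structure -->
        <xsl:template match="@*|node()">
            <xsl:copy>
                <xsl:apply-templates select="@*|node()"/>
            </xsl:copy>
        </xsl:template>
        <!-- Template to generate the list of managers per employee -->
        <xsl:template match="employee">
            <xsl:copy>
                <xsl:apply-templates select="@*|node()"/>
                <managers>
                    <xsl:call-template name="findManagers">
                        <xsl:with-param name="employeeName" select="EmployeePositionID"/>
                    </xsl:call-template>
                </managers>
            </xsl:copy>
        </xsl:template>
        <!-- Template to recursively find managers -->
        <xsl:template name="findManagers">
            <xsl:param name="employeeName"/>
            <xsl:param name="level" select="1"/>
            <xsl:if test="$employeeName != ''">
                <!-- One keyed lookup instead of two //employee[...] searches -->
                <xsl:variable name="manager" select="key('empIndex', $employeeName)/Manager"/>
                <!-- Derived from $manager, so no second search is needed -->
                <xsl:variable name="ManagerID_Var" select="$manager/@personid"/>
                <xsl:if test="$manager != ''">
                    <hierarchy>
                        <xsl:attribute name="BottomToplevel">
                            <xsl:value-of select="$level"/>
                        </xsl:attribute>
                        <xsl:attribute name="ManagerID">
                            <xsl:value-of select="$ManagerID_Var"/>
                        </xsl:attribute>
                        <xsl:value-of select="$manager"/>
                    </hierarchy>
                    <!-- Recursively call the template for the next manager -->
                    <xsl:call-template name="findManagers">
                        <xsl:with-param name="employeeName" select="$manager"/>
                        <xsl:with-param name="level" select="$level + 1"/>
                    </xsl:call-template>
                </xsl:if>
            </xsl:if>
        </xsl:template>
    </xsl:stylesheet>
    ```

    xsl:key and key() are part of XSLT 1.0, so this should not require changing the version attribute or the processor. For the sample input in the question, the recursion follows the same chain as before (1 to 2 to 3 to 4 to 5) and stops at the employee whose Manager element is empty.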