I have written an XSLT stylesheet that builds the employee and manager hierarchy. It works as expected, but it takes around 3 hours for 50,000 records. I believe this is due to the looping involved in creating the variables.
I have been trying to optimize it but have not found a good solution yet. Any help would be appreciated.
Thank you.
Current XSLT Code
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Identity template to copy the entire XML structure -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <!-- Template to generate the list of managers per employee -->
  <xsl:template match="employee">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
      <managers>
        <xsl:call-template name="findManagers">
          <xsl:with-param name="employeeName" select="EmployeePositionID"/>
        </xsl:call-template>
      </managers>
    </xsl:copy>
  </xsl:template>
  <!-- Template to recursively find managers -->
  <xsl:template name="findManagers">
    <xsl:param name="employeeName"/>
    <xsl:param name="level" select="1"/>
    <xsl:if test="$employeeName != ''">
      <xsl:variable name="manager" select="//employee[EmployeePositionID = $employeeName]/Manager"/>
      <xsl:variable name="ManagerID_Var" select="//employee[EmployeePositionID = $employeeName]/Manager/@personid"/>
      <xsl:if test="$manager != ''">
        <hierarchy>
          <xsl:attribute name="BottomToplevel">
            <xsl:value-of select="$level"/>
          </xsl:attribute>
          <xsl:attribute name="ManagerID">
            <xsl:value-of select="$ManagerID_Var"/>
          </xsl:attribute>
          <xsl:value-of select="$manager"/>
        </hierarchy>
        <!-- Recursively call the template for the next manager -->
        <xsl:call-template name="findManagers">
          <xsl:with-param name="employeeName" select="$manager"/>
          <xsl:with-param name="level" select="$level + 1"/>
        </xsl:call-template>
      </xsl:if>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
Input XML
<?xml version="1.0" encoding="UTF-8"?>
<company>
<employee>
<effectiveStartDate>2020-08-20T00:00:00.000</effectiveStartDate>
<EmployeePositionID>1</EmployeePositionID>
<PersonID>1111</PersonID>
<Name>Test1</Name>
<ToBeHired>false</ToBeHired>
<Manager personid="M1">2</Manager>
</employee>
<employee>
<effectiveStartDate>2020-08-20T00:00:00.000</effectiveStartDate>
<EmployeePositionID>2</EmployeePositionID>
<PersonID>2222</PersonID>
<Name>Test2</Name>
<ToBeHired>false</ToBeHired>
<Manager personid="M2">3</Manager>
</employee>
<employee>
<effectiveStartDate>2020-08-20T00:00:00.000</effectiveStartDate>
<EmployeePositionID>3</EmployeePositionID>
<PersonID/>
<Name/>
<ToBeHired>false</ToBeHired>
<Manager personid="">4</Manager>
</employee>
<employee>
<effectiveStartDate>2020-08-20T00:00:00.000</effectiveStartDate>
<EmployeePositionID>4</EmployeePositionID>
<PersonID>3333</PersonID>
<Name>Test3</Name>
<ToBeHired>false</ToBeHired>
<Manager personid="M3">5</Manager>
</employee>
<employee>
<effectiveStartDate>2020-08-20T00:00:00.000</effectiveStartDate>
<EmployeePositionID>5</EmployeePositionID>
<PersonID>4444</PersonID>
<Name>Test4</Name>
<ToBeHired>false</ToBeHired>
<Manager/>
</employee>
</company>
Output XML should be
<?xml version="1.0" encoding="UTF-8"?>
<company>
<employee>
<effectiveStartDate>2020-08-20T00:00:00.000</effectiveStartDate>
<EmployeePositionID>1</EmployeePositionID>
<PersonID>1111</PersonID>
<Name>Test1</Name>
<ToBeHired>false</ToBeHired>
<Manager personid="M1">2</Manager>
<managers>
<hierarchy BottomToplevel="1" ManagerID="M1">2</hierarchy>
<hierarchy BottomToplevel="2" ManagerID="M2">3</hierarchy>
<hierarchy BottomToplevel="3" ManagerID="">4</hierarchy>
<hierarchy BottomToplevel="4" ManagerID="M3">5</hierarchy>
</managers>
</employee>
<employee>
<effectiveStartDate>2020-08-20T00:00:00.000</effectiveStartDate>
<EmployeePositionID>2</EmployeePositionID>
<PersonID>2222</PersonID>
<Name>Test2</Name>
<ToBeHired>false</ToBeHired>
<Manager personid="M2">3</Manager>
<managers>
<hierarchy BottomToplevel="1" ManagerID="M2">3</hierarchy>
<hierarchy BottomToplevel="2" ManagerID="">4</hierarchy>
<hierarchy BottomToplevel="3" ManagerID="M3">5</hierarchy>
</managers>
</employee>
<employee>
<effectiveStartDate>2020-08-20T00:00:00.000</effectiveStartDate>
<EmployeePositionID>3</EmployeePositionID>
<PersonID/>
<Name/>
<ToBeHired>false</ToBeHired>
<Manager personid="">4</Manager>
<managers>
<hierarchy BottomToplevel="1" ManagerID="">4</hierarchy>
<hierarchy BottomToplevel="2" ManagerID="M3">5</hierarchy>
</managers>
</employee>
<employee>
<effectiveStartDate>2020-08-20T00:00:00.000</effectiveStartDate>
<EmployeePositionID>4</EmployeePositionID>
<PersonID>3333</PersonID>
<Name>Test3</Name>
<ToBeHired>false</ToBeHired>
<Manager personid="M3">5</Manager>
<managers>
<hierarchy BottomToplevel="1" ManagerID="M3">5</hierarchy>
</managers>
</employee>
<employee>
<effectiveStartDate>2020-08-20T00:00:00.000</effectiveStartDate>
<EmployeePositionID>5</EmployeePositionID>
<PersonID>4444</PersonID>
<Name>Test4</Name>
<ToBeHired>false</ToBeHired>
<Manager/>
<managers/>
</employee>
</company>
The poor performance is caused by these instructions:
<xsl:variable name="manager"
select="//employee[EmployeePositionID = $employeeName]/Manager"/>
<xsl:variable name="ManagerID_Var"
select="//employee[EmployeePositionID = $employeeName]/Manager/@personid"/>
which mean that for every employee, and again at each level of that employee's management chain, you are searching the whole file to find the manager, making the performance at least quadratic - O(n^2) - in the number of employees.
A few XSLT processors such as my company's Saxon-EE might optimize such expressions to use an index, but most are going to do a serial search through the document.
However, you can do the indexing "by hand" and this will greatly improve performance. Define an index like this:
<xsl:key name="empIndex" match="employee" use="EmployeePositionID"/>
and replace //employee[EmployeePositionID = $employeeName] with key('empIndex', $employeeName).
You should also be able to achieve a saving by only doing the search once rather than twice: the second variable can be defined in terms of the first as
<xsl:variable name="ManagerID_Var"
select="$manager/@personid"/>
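Putting both suggestions together, the stylesheet might look something like the sketch below. It is the stylesheet from the question with the key declared at the top level and the two `//employee[...]` searches replaced by a single `key()` lookup. The attribute value templates on `hierarchy` are a shorthand I have assumed for brevity in place of the original `xsl:attribute` instructions. It has not been run against the full 50,000-record file, so treat it as a starting point rather than a finished solution.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Index every employee by position ID so lookups avoid a full document scan -->
  <xsl:key name="empIndex" match="employee" use="EmployeePositionID"/>

  <!-- Identity template to copy the entire XML structure -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Append the list of managers to each employee -->
  <xsl:template match="employee">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
      <managers>
        <xsl:call-template name="findManagers">
          <xsl:with-param name="employeeName" select="EmployeePositionID"/>
        </xsl:call-template>
      </managers>
    </xsl:copy>
  </xsl:template>

  <!-- Walk up the management chain one indexed lookup at a time -->
  <xsl:template name="findManagers">
    <xsl:param name="employeeName"/>
    <xsl:param name="level" select="1"/>
    <xsl:if test="$employeeName != ''">
      <!-- Single lookup; the personid attribute is read from the same node -->
      <xsl:variable name="manager" select="key('empIndex', $employeeName)/Manager"/>
      <xsl:if test="$manager != ''">
        <hierarchy BottomToplevel="{$level}" ManagerID="{$manager/@personid}">
          <xsl:value-of select="$manager"/>
        </hierarchy>
        <!-- Recurse for the next manager up the chain -->
        <xsl:call-template name="findManagers">
          <xsl:with-param name="employeeName" select="$manager"/>
          <xsl:with-param name="level" select="$level + 1"/>
        </xsl:call-template>
      </xsl:if>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

For the sample input in the question this should produce the same `managers` lists as the original stylesheet, since only the way each employee is located has changed.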