Remove duplicated records from XML output using xslt


I need to transform the source XML with XSLT and remove any record that shares its ID with another record and has an active status of 0.

Source:

<Report_Data>
    <Report_Entry>
        <ID>12345</ID>
        <Date>2001-12-31</Date>
        <IS_ACTIVE>1</IS_ACTIVE>
    </Report_Entry>
    <Report_Entry>
        <ID>12345</ID>
        <Date>2002-12-31</Date>
        <IS_ACTIVE>0</IS_ACTIVE>
    </Report_Entry>
    <Report_Entry>
        <ID>98765</ID>
        <Date>2003-12-31</Date>
        <IS_ACTIVE>1</IS_ACTIVE>
    </Report_Entry>
    <Report_Entry>
        <ID>88888</ID>
        <Date>2004-12-31</Date>
        <IS_ACTIVE>0</IS_ACTIVE>
    </Report_Entry>
</Report_Data>

The resulting txt file should look like this:

ID|Date|IS_ACTIVE
12345|2001-12-31|1
98765|2003-12-31|1
88888|2004-12-31|0

Hence, only the record with repeated ID and IS_ACTIVE = 0 will be removed, the rest will be kept. Thanks in advance!

I was looking at for-each-group, but I'm not sure how to apply it in this case.


Solution

  • If I understand your requirement correctly (which is not at all certain), you could do something like:

    XSLT 2.0

    <xsl:stylesheet version="2.0" 
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text" encoding="UTF-8"/>
    
    <xsl:template match="/Report_Data">
        <xsl:text>ID|Date|IS_ACTIVE&#10;</xsl:text>
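        <!-- Group the entries by ID so that duplicates end up together -->
        <xsl:for-each-group select="Report_Entry" group-by="ID">
            <xsl:choose>
                <!-- Duplicated ID: keep only the entries that are not inactive -->
                <xsl:when test="count(current-group()) > 1">
                    <xsl:apply-templates select="current-group()[not(IS_ACTIVE='0')]"/>
                </xsl:when>
                <!-- Unique ID: keep the entry whatever its status -->
                <xsl:otherwise>
                    <xsl:apply-templates select="current-group()"/>
                </xsl:otherwise>
            </xsl:choose>
        </xsl:for-each-group>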
    </xsl:template>
    
    <xsl:template match="Report_Entry">
        <xsl:value-of select="ID, Date, IS_ACTIVE" separator="|"/>
        <xsl:text>&#10;</xsl:text>
    </xsl:template>
    
    </xsl:stylesheet>
    

    Note that if there are two or more entries with the same ID, and all of them are inactive, there will be no output for that group.
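
    If you would rather keep one record for a group in which every entry is inactive, a possible variation is sketched here. Keeping the first entry of such a group is my assumption, not something the question asks for, and the variable name active is only illustrative:

    <xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text" encoding="UTF-8"/>

    <xsl:template match="/Report_Data">
        <xsl:text>ID|Date|IS_ACTIVE&#10;</xsl:text>
        <xsl:for-each-group select="Report_Entry" group-by="ID">
            <!-- Entries of the current group that are not flagged inactive -->
            <xsl:variable name="active" select="current-group()[not(IS_ACTIVE='0')]"/>
            <!-- Assumption: if no entry is active, fall back to the first entry of the group -->
            <xsl:apply-templates select="if (exists($active)) then $active else current-group()[1]"/>
        </xsl:for-each-group>
    </xsl:template>

    <xsl:template match="Report_Entry">
        <xsl:value-of select="ID, Date, IS_ACTIVE" separator="|"/>
        <xsl:text>&#10;</xsl:text>
    </xsl:template>

    </xsl:stylesheet>

    For the sample input this should give the same three lines as the stylesheet above, because the lone inactive entry 88888 is the first (and only) member of its group. The two versions differ only when an ID occurs more than once and none of its entries is active.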