Search code examples
csvxsltxslt-1.0

get sum of a column based on distinct value of another column using xslt 1.0


<RawData>
    <row>
        <column>CENTERS 1</column>
        <column>41</column>
        <column>1</column>
        <column>10</column>
    </row>
    <row>
        <column>CENTERS 2</column>
        <column>42</column>
        <column>2</column>
        <column>10</column>
    </row>
    <row>
        <column>CENTERS 3</column>
        <column>43</column>
        <column>1</column>
        <column>10</column>
    </row>
    <row>
        <column>CENTERS 4</column>
        <column>44</column>
        <column>3</column>
        <column>10</column>
    </row>
    <row>
        <column>CENTERS 5</column>
        <column>45</column>
        <column>1</column>
        <column>10</column>
    </row>
    <row>
        <column>CENTERS 6</column>
        <column>46</column>
        <column>4</column>
        <column>10</column>
    </row>
</RawData>

This is my xml data. I want to sum up the 4th column based on distinct 3rd column. To make it more clearer only the yellow parts need to be added.

sample data in table form

Current Output:

Final Count: 6, Sum: 60

Expected Output:

Final Count: 6, Sum: 40

The stylesheet I have now adds up all values. I tried some ways but not working.

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" />

  <xsl:template match="/RawData">
    <xsl:call-template name="processRows">
      <xsl:with-param name="count" select="0" />
      <xsl:with-param name="sum" select="0" />
      <xsl:with-param name="rows" select="row" />
    </xsl:call-template>
  </xsl:template>

  <xsl:template name="processRows">
    <xsl:param name="count" />
    <xsl:param name="sum" />
    <xsl:param name="rows" />

    <xsl:if test="$rows">
      <xsl:variable name="currentVal" select="$val + 1" />
      <xsl:variable name="currentCount" select="$count + 1" />
      <xsl:variable name="currentSum" select="$sum + number($rows[1]/column[4])" />

      <xsl:call-template name="processRows">
        <xsl:with-param name="count" select="$currentCount" />
        <xsl:with-param name="sum" select="$currentSum" />
        <xsl:with-param name="rows" select="$rows[position() > 1]" />
      </xsl:call-template>
    </xsl:if>

    <xsl:if test="not($rows)">
      Final
      <xsl:text>Count: </xsl:text>
      <xsl:value-of select="$count" />
      <xsl:text>, Sum: </xsl:text>
      <xsl:value-of select="$sum" />
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>`

Solution

  • I want to sum up the 4th column based on distinct 3rd column.

    If you mean you want to identify the rows that would be first in their group if grouped by the 3rd column value and sum up the values in their 4th column, then this could be done very simply by adapting the first part of the Muenchian grouping method:

    XSLT 1.0

    <xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
    
    <xsl:key name="k" match="row" use="column[3]" />
    
    <xsl:template match="/RawData">
        <sum>
            <xsl:value-of select="sum(row[count(. | key('k', column[3])[1]) = 1]/column[4])"/>
        </sum>
    </xsl:template>
    
    </xsl:stylesheet>
    

    The result here will be:

    <?xml version="1.0" encoding="UTF-8"?>
    <sum>40</sum>
    

    and the rows being summed are:

      <row>
            <column>CENTERS 1</column>
            <column>41</column>
            <column>1</column>
            <column>10</column>
        </row>
      <row>
            <column>CENTERS 2</column>
            <column>42</column>
            <column>2</column>
            <column>10</column>
        </row>
      <row>
            <column>CENTERS 4</column>
            <column>44</column>
            <column>3</column>
            <column>10</column>
        </row>
      <row>
            <column>CENTERS 6</column>
            <column>46</column>
            <column>4</column>
            <column>10</column>
        </row>
    

    No clue what you meant by "Final Count" or "Val".