xslt, muenchian-grouping

What is wrong with this merge and dedupe approach?


Given this source document:

<?xml version="1.0" encoding="utf-8"?>
<config>
  <group name="global">
    <globals>
      <item grp="db" prop="userid" value="foo"/>
      <item grp="db" prop="passwd" value="bar"/>
      <item grp="log" prop="level" value="debug"/>
      <item grp="log" prop="filename" value="red.log"/>
    </globals>
  </group>
  <group name="dev">
    <globals>
      <item grp="db" prop="server" value="dev_sql_1"/>
    </globals>
    <locals>
      <item grp="db" prop="catalog" value="red_db_local"/>
      <item grp="db" prop="passwd" value="dev_passwd"/>
      <item grp="log" prop="level" value="info"/>
    </locals>
  </group>
  <group name="qa">
    <globals>
      <item grp="db" prop="server" value="qa_sql_1"/>
      <item grp="db" prop="catalog" value="qa_db"/>  <!-- this is wonky, but may happen -->
    </globals>
    <locals>
      <item grp="db" prop="catalog" value="red_db_local"/> <!-- this should beat 'qa_db' from ../globals/item[@grp='db' and @prop='catalog'] -->
      <item grp="db" prop="passwd" value="qa_passwd"/>
      <item grp="log" prop="level" value="critical"/>
    </locals>
  </group>
  <group name="prod">
    <globals>
      <item grp="db" prop="server" value="prod_sql_1"/>
    </globals>
    <locals>
      <item grp="db" prop="catalog" value="prod_db_local"/>
      <item grp="db" prop="passwd" value="prod_passwd"/>
      <item grp="log" prop="level" value="critical"/>
    </locals>
  </group>
</config>

and a parameter that is one of the available environments, I'd like to end up with a merged and deduped node-set, keeping the most specific values. So, for 'prod':

<config>
  <item grp="db" prop="userid" value="foo"/>
  <item grp="log" prop="filename" value="red.log"/>
  <item grp="db" prop="server" value="prod_sql_1"/>
  <item grp="db" prop="catalog" value="prod_db_local"/>
  <item grp="db" prop="passwd" value="prod_passwd"/>
  <item grp="log" prop="level" value="critical"/>
</config>

I'm very new to using keys in XSLT 1.0, and I've come up with this stylesheet that works for 'prod', but not for 'dev' or 'qa':

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet
    version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes" omit-xml-declaration="yes"/>
  <xsl:param name="environment"/>

  <!-- 
    using | to create a union of top-level global items and env-specific items
  -->
  <xsl:variable name="all-items"
                select="/config/group[@name='global']/globals/item |
                        //group[@name=$environment]//item"/>

  <xsl:key name="dupes" match="item" use="concat(@grp,'|',@prop)"/>

  <xsl:template match="/config">
    <xsl:copy>
      <xsl:copy-of
          select="$all-items[generate-id() = generate-id(key('dupes',
                    concat(@grp,'|',@prop))[last()])]"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>

This is the approach I'm aiming for:

  1. merge all relevant <item.../> nodes into a node-set with union |
  2. group this node-set by the @grp and @prop attributes
  3. keep the last node in any of the resulting groups (de-dupe)

Since I'm new to keys, I can only say that I think this bit of code,

<xsl:copy-of select="$all-items[generate-id() = generate-id(key('dupes',
                                            concat(@grp,'|',@prop))[last()])]"/>

is selecting the last() node out of a node-set of duplicate items, but when run with 'dev' or 'qa', I get the following:

REG zacharyyoung$ xsltproc --stringparam environment dev config3.xsl config3.xml 
<config>
  <item grp="db" prop="userid" value="foo"/>
  <item grp="log" prop="filename" value="red.log"/>
</config>
REG zacharyyoung$ xsltproc --stringparam environment qa config3.xsl config3.xml
<config>
  <item grp="db" prop="userid" value="foo"/>
  <item grp="log" prop="filename" value="red.log"/>
</config>

I've checked the intermediate variable all-items for each environment parameter, and it appears that at least that much is working correctly.

If I move <group name="qa"/> to the bottom, like:

<config>
  <group name="global">...</group>
  <group name="dev">...</group>
  <group name="prod">...</group>
  <group name="qa">...</group>
</config>

then running it with 'qa' works:

REG zacharyyoung$ xsltproc --stringparam environment qa config3.xsl config3.xml
<config>
  <item grp="db" prop="userid" value="foo"/>
  <item grp="log" prop="filename" value="red.log"/>
  <item grp="db" prop="server" value="qa_sql_1"/>
  <item grp="db" prop="catalog" value="red_db_local"/>
  <item grp="db" prop="passwd" value="qa_passwd"/>
  <item grp="log" prop="level" value="critical"/>
</config>

So, why does the position of the <group name="...">...</group> I'm selecting matter? Specifically, why is it only working in the last position, and how do I make it work for any position?

EDIT 1

When I isolate the data from $all-items (for any environment) and put it in its own file, the XSL works correctly. The following example is the union of the globals and 'dev':

<config>
  <item grp="db" prop="userid" value="foo"/>
  <item grp="db" prop="passwd" value="bar"/>
  <item grp="log" prop="level" value="debug"/>
  <item grp="log" prop="filename" value="red.log"/>
  <item grp="db" prop="server" value="dev_sql_1"/>
  <item grp="db" prop="catalog" value="red_db_local"/>
  <item grp="db" prop="passwd" value="dev_passwd"/>
  <item grp="log" prop="level" value="info"/>
</config>

and this XSL:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes" omit-xml-declaration="yes"/>
  <xsl:key name="dupes" match="item" use="concat(@grp,'|',@prop)"/>
  <xsl:template match="/config">
    <xsl:copy>
      <xsl:copy-of
          select="item[generate-id() = generate-id(key('dupes',
                    concat(@grp,'|',@prop))[last()])]"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>

results in:

REG zacharyyoung$ xsltproc config4.xsl config4.xml
<config>
  <item grp="db" prop="userid" value="foo"/>
  <item grp="log" prop="filename" value="red.log"/>
  <item grp="db" prop="server" value="dev_sql_1"/>
  <item grp="db" prop="catalog" value="red_db_local"/>
  <item grp="db" prop="passwd" value="dev_passwd"/>
  <item grp="log" prop="level" value="info"/>
</config>

So, now it appears to be down to the variable all-items?

Thank you.


Solution

  • I'm not sure why the grouping isn't working (I will try to look at it soon), but you can also get the output you want without using keys at all.

    This XSLT 1.0 stylesheet:

    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output indent="yes"/>
      <xsl:strip-space elements="*"/>
    
      <xsl:param name="environment" select="'qa'"/>
    
      <xsl:template match="node()|@*">
        <xsl:copy>
          <xsl:apply-templates select="node()|@*"/>
        </xsl:copy>
      </xsl:template>
    
      <xsl:template match="group">
        <xsl:if test="@name = $environment">
          <!-- top-level globals, minus any @prop overridden by the selected environment's locals -->
          <xsl:apply-templates select="/config/group[@name='global']/globals/item[not(@prop = /config/group[@name=$environment]/locals/item/@prop)]"/>
          <xsl:apply-templates/>      
        </xsl:if>
      </xsl:template>
    
      <xsl:template match="globals|locals">
        <xsl:apply-templates/>
      </xsl:template>
    
    </xsl:stylesheet>
    

    applied to your input XML produces the wanted output:

    <config>
       <item grp="db" prop="userid" value="foo"/>
       <item grp="log" prop="filename" value="red.log"/>
       <item grp="db" prop="server" value="qa_sql_1"/>
       <item grp="db" prop="catalog" value="red_db_local"/>
       <item grp="db" prop="passwd" value="qa_passwd"/>
       <item grp="log" prop="level" value="critical"/>
    </config>
    

    This also works for "prod" and "dev".
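
    One caveat: the stylesheet above matches on @prop alone and does not dedupe inside the selected group, so an override within one environment (such as the "wonky" qa_db item in qa's globals) would presumably come through as well. A minimal key-free sketch that compares on both @grp and @prop and keeps only the last match might look like the following. It reuses the question's all-items variable; the variable names grp, prop and pos are made up for illustration, and it assumes the global group comes before the environment groups in document order.

    ```xml
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output method="xml" indent="yes" omit-xml-declaration="yes"/>
      <xsl:param name="environment" select="'qa'"/>

      <!-- union of the top-level globals and everything in the selected group -->
      <xsl:variable name="all-items"
                    select="/config/group[@name='global']/globals/item |
                            /config/group[@name=$environment]//item"/>

      <xsl:template match="/config">
        <xsl:copy>
          <!-- for-each visits the union in document order -->
          <xsl:for-each select="$all-items">
            <xsl:variable name="grp" select="@grp"/>
            <xsl:variable name="prop" select="@prop"/>
            <xsl:variable name="pos" select="position()"/>
            <!-- keep this item only if no later item in the union
                 has the same @grp and @prop -->
            <xsl:if test="not($all-items[position() &gt; $pos]
                                        [@grp = $grp and @prop = $prop])">
              <xsl:copy-of select="."/>
            </xsl:if>
          </xsl:for-each>
        </xsl:copy>
      </xsl:template>
    </xsl:stylesheet>
    ```

    This rescans the union for every item, which is quadratic, but should be fine for a config file of this size.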

    Edit: Removed variable from predicate.
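
    As for why the keyed version depends on where the group sits: an xsl:key indexes every node in the document that matches its match pattern, not just the nodes in $all-items. So key('dupes', ...)[last()] returns the last matching item in the whole document, which for db|passwd or log|level is the one in the last group. An item from 'dev' or 'qa' never equals that node, so it is dropped, unless its group happens to be last. That would also explain why the isolated file in EDIT 1 works: there the key's scope and the union coincide. A minimal sketch of one possible fix, restricting the key's result to members of $all-items before taking last() (this uses the common XSLT 1.0 intersection idiom count(. | $set) = count($set); everything else is taken from the question's stylesheet):

    ```xml
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output method="xml" indent="yes" omit-xml-declaration="yes"/>
      <xsl:param name="environment"/>

      <xsl:variable name="all-items"
                    select="/config/group[@name='global']/globals/item |
                            /config/group[@name=$environment]//item"/>

      <!-- the key still covers every item in the document -->
      <xsl:key name="dupes" match="item" use="concat(@grp,'|',@prop)"/>

      <xsl:template match="/config">
        <xsl:copy>
          <!-- first predicate: keep only key hits that are in $all-items;
               second predicate: take the last of those in document order -->
          <xsl:copy-of
              select="$all-items[generate-id() = generate-id(
                        key('dupes', concat(@grp,'|',@prop))
                          [count(. | $all-items) = count($all-items)]
                          [last()])]"/>
        </xsl:copy>
      </xsl:template>
    </xsl:stylesheet>
    ```

    Run the same way as in the question (xsltproc --stringparam environment dev ...), this should keep the env-specific items regardless of the group's position.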