
DOCX table to xml using XSL


I have a docx file with a working-schedule table like so:

        monday | tuesday | wednesday | thursday | friday | saturday | sunday
Peter     5    |   4     |           |    6     |   5    |          |   11
John      2    |         |    1      |    6     |   5    |     4    |
etc..

I extracted document.xml from the docx and am trying to transform it into the following XML with XSLT:

<schedule>
   <monday>
     <shift name="Peter" time="5" />
     <shift name="John"  time="2" />
   </monday>
   <tuesday>
 etc...

The only thing I don't know how to do yet is put all shifts for the same day into a single element for that day. The XML I currently get is:

<schedule>
   <monday>
     <shift name="Peter" time="5" />
   </monday>
   <monday>
     <shift name="John"  time="2" />
   </monday>
   <tuesday>
etc..

How do I fix this?

Attachments: the document.xml (extracted from the docx) HERE, the XSL I created HERE


Solution

  • Apply this transformation, which uses Muenchian grouping to merge same-named day elements, to your current result:

    <xsl:stylesheet version="1.0"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
     <xsl:output omit-xml-declaration="yes" indent="yes"/>
     <xsl:strip-space elements="*"/>
     <xsl:key name="kDayByName" match="/*/*" use="name()"/>
    
     <xsl:template match="node()|@*">
         <xsl:copy>
           <xsl:apply-templates select="node()|@*"/>
         </xsl:copy>
     </xsl:template>
    
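     <!-- Suppress every day element that is not the first of its name -->
     <xsl:template match="/*/*"/>
    
     <!-- The first day element of each name collects the children of all
          same-named day elements. The explicit priority avoids a conflict
          with the empty template above, which has the same default priority. -->
     <xsl:template priority="1" match=
      "/*/*[generate-id()
           =
            generate-id(key('kDayByName', name())[1])
           ]
      ">
    
      <xsl:copy>
        <xsl:apply-templates select=
            "key('kDayByName', name())/node()"/>
      </xsl:copy>
     </xsl:template>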
    </xsl:stylesheet>
    

    When applied to this XML document:

    <schedule>
        <monday>
            <shift name="Peter" time="5" />
        </monday>
        <monday>
            <shift name="John"  time="2" />
        </monday>
    </schedule>
    

    the wanted result is produced:

    <schedule>
        <monday>
            <shift name="Peter" time="5" />
            <shift name="John" time="2" />
        </monday>
    </schedule>
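
As an alternative to post-processing, the duplicate day elements can be avoided altogether by iterating over the table columns first and over the rows second. The following is only a sketch, since the original document.xml is not shown here. It assumes that the schedule is the first w:tbl in the document, that the first row holds the day names (each a valid XML element name), that the first cell of every later row holds the person's name, and that no cells are merged:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
    exclude-result-prefixes="w">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>

  <xsl:template match="/">
    <!-- Assumption: the first table in the document is the schedule -->
    <xsl:variable name="rows" select="(//w:tbl)[1]/w:tr"/>
    <schedule>
      <!-- Header cells after the first (empty) one name the days -->
      <xsl:for-each select="$rows[1]/w:tc[position() &gt; 1]">
        <!-- Column index of this day within every row -->
        <xsl:variable name="col" select="position() + 1"/>
        <xsl:element name="{normalize-space(.)}">
          <!-- One shift per person whose cell in this column is not empty -->
          <xsl:for-each select="$rows[position() &gt; 1]">
            <xsl:variable name="time" select="normalize-space(w:tc[$col])"/>
            <xsl:if test="$time != ''">
              <shift name="{normalize-space(w:tc[1])}" time="{$time}"/>
            </xsl:if>
          </xsl:for-each>
        </xsl:element>
      </xsl:for-each>
    </schedule>
  </xsl:template>
</xsl:stylesheet>
```

Because each day element is created exactly once, in the outer loop, no grouping step is needed afterwards. If the table contains merged cells (w:gridSpan), the positional lookup w:tc[$col] no longer lines up with the header and would have to be adapted.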