How to flatten and reduce XML using XSLT and best practice?


Apologies if this is a stupid question - I'm brand new to XSL as of today and I'm still trying to get used to the core concepts.

I have the following XML file with a hierarchical structure:

<?xml version="1.0" encoding="UTF-8" ?>
<EXCHANGE>
 <SCE>
  <SCE.SRS>
   <SCE_SCJC.SCE.SRS>1234567/1</SCE_SCJC.SCE.SRS>
   <SCE_SEQ2.SCE.SRS>01</SCE_SEQ2.SCE.SRS>
   <SCE_STUC.SCE.SRS>1234567</SCE_STUC.SCE.SRS>
   <SCE_AYRC.SCE.SRS>2022/23</SCE_AYRC.SCE.SRS>
   <SCE_CRSC.SCE.SRS>CRS123456</SCE_CRSC.SCE.SRS>
   <SCE_BLOK.SCE.SRS>01</SCE_BLOK.SCE.SRS>
   <SCE_DPTC.SCE.SRS>ABCD</SCE_DPTC.SCE.SRS>
   <STU>
    <STU.SRS>
     <STU_CODE.STU.SRS>22100814</STU_CODE.STU.SRS>
     <STU_NAME.STU.SRS>Test Student Name</STU_NAME.STU.SRS>
     <STU_DOB.STU.SRS>01011993</STU_DOB.STU.SRS>
     <STU_INEM.STU.SRS/>
    </STU.SRS>
   </STU>
   <DPT>
    <DPT.SRS>
     <DPT_CODE.DPT.SRS>ABCD</DPT_CODE.DPT.SRS>
     <DPT_NAME.DPT.SRS>Department Name</DPT_NAME.DPT.SRS>
    </DPT.SRS>
   </DPT>
   <CRS>
    <CRS.SRS>
     <CRS_CODE.CRS.SRS>CRS123456</CRS_CODE.CRS.SRS>
     <CRS_NAME.CRS.SRS>Course Name</CRS_NAME.CRS.SRS>
     <CRS_MOAC.CRS.SRS>FT</CRS_MOAC.CRS.SRS>
    </CRS.SRS>
   </CRS>
  </SCE.SRS>
 </SCE>
</EXCHANGE>

I need to do the following:

  1. Flatten the structure
  2. Take only specific elements
  3. Rename the chosen elements

This is so that I can pass the resultant output into an API.

The output needs to look something like this:

<StudentRecord>
     <STUDENT_NO>1234567</STUDENT_NO>
     <STUDENT_NAME>Test Student Name</STUDENT_NAME>
     <DATE_OF_BIRTH>01011993</DATE_OF_BIRTH>
     <EMAIL_ADDRESS/>
     <FACULTY>ABCD</FACULTY>
     <DEGREE_NAME>Course Name</DEGREE_NAME>
</StudentRecord>

I've created the following XSL file (it has to use XSLT 1.0):

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="/EXCHANGE/SCE/SCE.SRS">
        <xsl:element name="STUDENT_NO">
            <xsl:value-of select="SCE_SCJC.SCE.SRS"/>
        </xsl:element>
        <xsl:text>&#10;</xsl:text>
        <xsl:element name="STUDENT_NAME">
            <xsl:value-of select="STU/STU.SRS/STU_NAME.STU.SRS"/>
        </xsl:element>
        <xsl:text>&#10;</xsl:text>
        <xsl:element name="DATE_OF_BIRTH">
            <xsl:value-of select="STU/STU.SRS/STU_DOB.STU.SRS"/>
        </xsl:element>
        <xsl:text>&#10;</xsl:text>
        <xsl:element name="EMAIL_ADDRESS">
            <xsl:value-of select="STU/STU.SRS/STU_INEM.STU.SRS"/>
        </xsl:element>
        <xsl:text>&#10;</xsl:text>
        <xsl:element name="FACULTY">
            <xsl:value-of select="DPT/DPT.SRS/DPT_NAME.DPT.SRS"/>
        </xsl:element>
        <xsl:text>&#10;</xsl:text>
        <xsl:element name="DEGREE_NAME">
            <xsl:value-of select="CRS/CRS.SRS/CRS_NAME.CRS.SRS"/>
        </xsl:element>
    </xsl:template>
</xsl:stylesheet>

This outputs something very close to what I need, as I get the following output when I apply the XSL transformation:

<?xml version="1.0"?>
 
  <STUDENT_NO>1234567/1</STUDENT_NO>
<STUDENT_NAME>Test Student Name</STUDENT_NAME>
<DATE_OF_BIRTH>01011993</DATE_OF_BIRTH>
<EMAIL_ADDRESS></EMAIL_ADDRESS>
<FACULTY>ABCD</FACULTY>
<DEGREE_NAME>Course Name</DEGREE_NAME>
 

Technically I could pass this to the API as is. However, I feel this is not the best way to get what I'm after, and that there is a better way to pick certain elements from each node and change the element names.

Would what I've got be considered correct usage? Is there a better way to get the specific elements I need into the required format?


Solution

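  • Your stylesheet is valid XSLT 1.0, but because it never creates a root element, the result is a sequence of sibling elements rather than a well-formed XML document. There are many ways to do this; here's how I would do it. The root template creates the StudentRecord wrapper, and each section (student, department, course) gets its own template, which makes it easy to add or remove elements later on if required. Literal result elements such as <STUDENT_NO> replace xsl:element, which is mainly useful when the element name has to be computed at run time, and indent="yes" on xsl:output replaces the manual &#10; text nodes. Note that STUDENT_NO is read from SCE_STUC.SCE.SRS rather than SCE_SCJC.SCE.SRS, so it comes out as 1234567 instead of 1234567/1.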

    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        version="1.0">
    
      <xsl:output method="xml" indent="yes"/>
    
      <xsl:template match="/">
        <!-- Create new root element -->
        <StudentRecord>
          <xsl:apply-templates select="EXCHANGE/SCE/SCE.SRS"/>
        </StudentRecord>
      </xsl:template>
      
      <!-- General information -->
      <xsl:template match="SCE.SRS">
        <STUDENT_NO>
          <xsl:value-of select="SCE_STUC.SCE.SRS"/>
        </STUDENT_NO>
        <!-- Student information -->
        <xsl:apply-templates select="STU/STU.SRS"/>
        <!-- Department information -->
        <xsl:apply-templates select="DPT/DPT.SRS"/>
        <!-- Course information -->
        <xsl:apply-templates select="CRS/CRS.SRS"/>
      </xsl:template>
      
      <!-- Student information -->
      <xsl:template match="STU.SRS">
        <STUDENT_NAME>
          <xsl:value-of select="STU_NAME.STU.SRS"/>
        </STUDENT_NAME>
        <DATE_OF_BIRTH>
          <xsl:value-of select="STU_DOB.STU.SRS"/>
        </DATE_OF_BIRTH>
        <EMAIL_ADDRESS>
          <xsl:value-of select="STU_INEM.STU.SRS"/>
        </EMAIL_ADDRESS>
      </xsl:template>
      
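      <!-- Department information.
           DPT_NAME.DPT.SRS yields "Department Name" for the sample input; select
           DPT_CODE.DPT.SRS instead if FACULTY should carry the code (ABCD) shown
           in the expected output. -->
      <xsl:template match="DPT.SRS">
        <FACULTY>
          <xsl:value-of select="DPT_NAME.DPT.SRS"/>
        </FACULTY>
      </xsl:template>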
      
      <!-- Course information -->
      <xsl:template match="CRS.SRS">
        <DEGREE_NAME>
          <xsl:value-of select="CRS_NAME.CRS.SRS"/>
        </DEGREE_NAME>
      </xsl:template>
      
    </xsl:stylesheet>
    

    See it working here: https://xsltfiddle.liberty-development.net/jyyho8J
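
    As a rough sketch of how a later addition might look: assume the stylesheet above is saved as student-record.xsl (a hypothetical filename) and that the API would accept an extra MODE_OF_ATTENDANCE element (also a hypothetical name). An importing stylesheet could then extend the course template without touching the original. Editing the CRS.SRS template directly would work just as well; this variant only illustrates that each section can be changed in isolation.

    ```xml
    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        version="1.0">

      <!-- Assumption: the stylesheet from this answer, saved under a
           hypothetical filename next to this file. xsl:import must be the
           first child of xsl:stylesheet. -->
      <xsl:import href="student-record.xsl"/>

      <!-- Takes precedence over the imported CRS.SRS template because
           imported templates have lower import precedence. -->
      <xsl:template match="CRS.SRS">
        <!-- Run the imported template first, which emits DEGREE_NAME -->
        <xsl:apply-imports/>
        <!-- Hypothetical extra element, filled from the source's CRS_MOAC field -->
        <MODE_OF_ATTENDANCE>
          <xsl:value-of select="CRS_MOAC.CRS.SRS"/>
        </MODE_OF_ATTENDANCE>
      </xsl:template>

    </xsl:stylesheet>
    ```

    Run against the sample input, this should add <MODE_OF_ATTENDANCE>FT</MODE_OF_ATTENDANCE> after DEGREE_NAME inside StudentRecord, with every other element produced by the imported templates unchanged.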