xml xslt-2.0

Splitting text in element into multiple elements based on keywords/delimiters


My source XML:

<event>
    <description>Special Notice - 20190305</description>
    <note>[Subject]: This is the subject of the event
    [Purpose]: This is the purpose of the event
    [Evaluation]: This is an evaluation of the event
    [Strategy]: This is the strategy for the event</note>
</event>
<event>
    <description>Notice</description>
    <note>This is a notice</note>
</event>

What the result should look like:

<instance>
  <title>
    <text>Subject</text>
  </title>
  <data>This is the subject of the event</data>
</instance>
<instance>
  <title>
    <text>Purpose</text>
  </title>
  <data>This is the purpose of the event</data>
</instance>
<!-- Evaluation and Strategy follow the same pattern -->
<instance>
  <title>
    <text>Notice</text>
  </title>
  <data>This is a notice</data>
</instance>

I'm pretty new to XSLT and got stuck on something during an exercise. I have an idea of what I want to do, but I'm having trouble figuring out where to start. I want to split the text of the note elements containing Subject, Purpose, Evaluation and Strategy into separate instance elements, one per keyword. There will be notes with other content, but my question here is specifically about these particular notes.

Each bracketed line of a note element in the source XML should become one instance in the destination XML: the text enclosed in square brackets serves as the title, and whatever follows the colon goes under the data element. My challenge has been figuring out how to parse the content of the note element and process each line properly. I thought of using a for-each with some sort of regex to grab whatever is enclosed in each pair of square brackets, but I'm not sure that's possible. Perhaps tokenize? I also thought of using substring-before and substring-after to fill the title and data elements, respectively.
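The tokenize idea can be sketched roughly as follows. This is only an illustration, and it assumes that every `[Keyword]: text` pair sits on its own line and that the marker `]:` never appears inside the text itself; neither assumption is guaranteed by the source data.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>

  <!-- Only notes that contain a "]:" marker are split -->
  <xsl:template match="note[contains(., ']:')]">
    <!-- One token per line; lines without a marker are skipped -->
    <xsl:for-each select="tokenize(., '\n')[contains(., ']:')]">
      <instance>
        <title>
          <!-- Text between "[" and "]" -->
          <text>
            <xsl:value-of select="substring-before(substring-after(., '['), ']')"/>
          </text>
        </title>
        <!-- Text after "]:" with surrounding whitespace trimmed -->
        <data>
          <xsl:value-of select="normalize-space(substring-after(., ']:'))"/>
        </data>
      </instance>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Everything else (description, plain notes) falls through to the built-in template rules here, so a real stylesheet would still need templates for those. The sketch also breaks as soon as one entry wraps across two lines, which is where a regex-based approach is more forgiving.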

EDIT: Just adding some more background, following Daniel's suggestion to use analyze-string. As mentioned above, there are some event elements that won't need to be split. I added an example of that to my source and destination XML. For these, description and note should go to text and data, respectively.

As I mentioned in my reply to Daniel, my thinking is that, inside non-matching-substring, value-of can simply copy description and note to text and data.
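A minimal sketch of that non-matching-substring idea, assuming every event has exactly one note and one description. The variable name `$event` is just an illustrative choice; it is needed because the context item inside analyze-string is a string, not the event element.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>

  <xsl:template match="event">
    <!-- Inside analyze-string the context item is a string, so keep the
         event in a variable to reach description from there -->
    <xsl:variable name="event" select="."/>
    <xsl:analyze-string select="note" regex="\[([^\]]+)\]:\s*([^\[]*)">
      <xsl:matching-substring>
        <instance>
          <title><text><xsl:value-of select="normalize-space(regex-group(1))"/></text></title>
          <data><xsl:value-of select="normalize-space(regex-group(2))"/></data>
        </instance>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <!-- Skip whitespace-only leftovers -->
        <xsl:if test="normalize-space(.)">
          <instance>
            <title><text><xsl:value-of select="$event/description"/></text></title>
            <data><xsl:value-of select="normalize-space(.)"/></data>
          </instance>
        </xsl:if>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>
</xsl:stylesheet>
```

One caveat with this design: any stray text before the first bracket of a split note would also come out as its own instance, titled with the event's description, so it only behaves as intended when plain notes and bracketed notes never mix.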

EDIT 2: Here's an example of how I was thinking of doing this @DanielHaley. As said in my previous reply to you, I use for-each through my larger document (which I haven't posted the entirety of because it's quite long and it would be redundant to the question) to cycle through events and other elements that are under a common parent element.

<xsl:for-each select="event">
  <xsl:choose>
    <!-- Events whose description contains "Special" have a note to split -->
    <xsl:when test="contains(description,'Special')">
      <xsl:analyze-string select="note" regex="\[([^\]]+)\]:\s*([^\[]*)">
        <xsl:matching-substring>
          <title><text><xsl:value-of select="normalize-space(regex-group(1))"/></text></title>
          <data><xsl:value-of select="normalize-space(regex-group(2))"/></data>
        </xsl:matching-substring>
      </xsl:analyze-string>
    </xsl:when>
    <!-- All other events are copied over as-is -->
    <xsl:otherwise>
      <title><text><xsl:value-of select="description"/></text></title>
      <data><xsl:value-of select="note"/></data>
    </xsl:otherwise>
  </xsl:choose>
</xsl:for-each>


Solution

  • I would probably use xsl:analyze-string...

    XML Input (a little mangled for testing regex)

    <doc>
        <event>
            <description>Special Notice - 20190305</description>
            <note>[Subject]: This is the subject of the event
                [Purpose]: This is the purpose 
                of the event [Evaluation]: This is an evaluation of the event
                [Strategy]:
                This is the strategy for the event</note>
        </event>
        <event>
            <description>Notice</description>
            <note>This is a notice</note>
        </event>
    </doc>
    

    XSLT 2.0

    <xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output indent="yes"/>
      <xsl:strip-space elements="*"/>
    
      <xsl:template match="@*|node()">
        <xsl:copy>
          <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
      </xsl:template>
    
      <xsl:template match="event[matches(note,'\[[^\]]+\]')]">
        <xsl:analyze-string select="note" regex="\[([^\]]+)\]:\s*([^\[]*)">
          <xsl:matching-substring>
            <instance>
              <title>
                <text>
                  <xsl:value-of select="normalize-space(regex-group(1))"/>
                </text>
              </title>
              <data>
                <xsl:value-of select="normalize-space(regex-group(2))"/>
              </data>
            </instance>
          </xsl:matching-substring>
          <xsl:non-matching-substring>
            <!--This shouldn't trigger. If it does, you'll need to figure out
            how you want to handle the differences with the existing pattern.-->
            <xsl:message terminate="yes" 
              select="concat('Non-matching substring: ''',.,'''')"/>
          </xsl:non-matching-substring>
        </xsl:analyze-string>
      </xsl:template>
    
      <xsl:template match="event">
        <instance>
          <title>
            <text>
              <xsl:value-of select="description"/>
            </text>
          </title>
          <data>
            <xsl:value-of select="note"/>
          </data>
        </instance>
      </xsl:template>
    
    </xsl:stylesheet>
    

    XML Output

    <doc>
       <instance>
          <title>
             <text>Subject</text>
          </title>
          <data>This is the subject of the event</data>
       </instance>
       <instance>
          <title>
             <text>Purpose</text>
          </title>
          <data>This is the purpose of the event</data>
       </instance>
       <instance>
          <title>
             <text>Evaluation</text>
          </title>
          <data>This is an evaluation of the event</data>
       </instance>
       <instance>
          <title>
             <text>Strategy</text>
          </title>
          <data>This is the strategy for the event</data>
       </instance>
       <instance>
          <title>
             <text>Notice</text>
          </title>
          <data>This is a notice</data>
       </instance>
    </doc>
    

    Fiddle: http://xsltfiddle.liberty-development.net/94AbWAA
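
To see what the regex in the stylesheet above captures, here is a small standalone sketch that runs the same pattern against a made-up string (`'[A]: one [B]: two'` is purely an illustrative input, not taken from the question) and prints each pair as text. It can be run against any XML document, since it only matches the document node.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!--
    \[([^\]]+)\]   a literal "[", then group 1 (one or more characters
                   that are not "]"), then a literal "]"
    :\s*           the colon and any whitespace after it
    ([^\[]*)       group 2: everything up to, but not including, the next "["
  -->
  <xsl:template match="/">
    <xsl:analyze-string select="'[A]: one [B]: two'"
                        regex="\[([^\]]+)\]:\s*([^\[]*)">
      <xsl:matching-substring>
        <!-- Prints "A=one" and "B=two", one per line -->
        <xsl:value-of select="concat(regex-group(1), '=',
                              normalize-space(regex-group(2)), '&#10;')"/>
      </xsl:matching-substring>
    </xsl:analyze-string>
  </xsl:template>
</xsl:stylesheet>
```

Because group 2 stops at the next `[`, line breaks inside an entry do not matter, which is why the mangled input still splits cleanly. The flip side is that a literal `[` inside the text of an entry would cut that entry short.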