xslt, xslt-2.0, xslt-3.0

How to group adjacent "p" entries by the attribute content-type="Sta_index2" - XSLT


I am trying to group adjacent <p content-type="Sta_index2"> elements and sort each group. When a <p content-type="Sta_index1"> element comes in between, only the <p content-type="Sta_index2"> entries should be reordered. If an entry contains an en dash (&#x2013;) and the part before the dash equals another entry, e.g. (2860(c)&#x2013;(f), 2860(c)), the result should be (2860(c), 2860(c)&#x2013;(f)). Likewise, (337, 337.15, 337(c)) should become (337, 337(c), 337.15).
Input XML

<root>
<sec>
    <title>Title 1</title>
    <p content-type="Sta_index1"><bold>Title 15</bold></p>
    <p content-type="Sta_index2"><bold>10(b)</bold>: <named-content content-type="emSecs"><named-content content-type="ceb_1625198599975jm">13.20</named-content></named-content></p>
    <p content-type="Sta_index2"><bold>10(a)</bold>: <named-content content-type="emSecs"><named-content content-type="ceb_785262054035ad">11.37</named-content>, <named-content content-type="ceb_027863847761il">13.4</named-content>, <named-content content-type="ceb_784300142022op">13.21</named-content>&#x2013;<named-content content-type="ceb_641775392148fq">13.26</named-content>, <named-content content-type="ceb_758553200629ve">18.19</named-content></named-content></p>
    <p content-type="Sta_index2"><bold>2860(c)&#x2013;(f)</bold>: <named-content content-type="emSecs"><named-content content-type="ceb_1625198599975jm">13.20</named-content></named-content></p>
    <p content-type="Sta_index2"><bold>2860(c)</bold>: <named-content content-type="emSecs"><named-content content-type="ceb_785262054035ad">11.37</named-content>, <named-content content-type="ceb_027863847761il">13.4</named-content>, <named-content content-type="ceb_784300142022op">13.21</named-content>&#x2013;<named-content content-type="ceb_641775392148fq">13.26</named-content>, <named-content content-type="ceb_758553200629ve">18.19</named-content></named-content></p>
    <p content-type="Sta_index2"><bold>17200&#x2013;17210</bold>: <named-content content-type="emSecs"><named-content content-type="ceb_256816646439br">1.47</named-content>, <named-content content-type="ceb_048717893609pd">2.19</named-content>, <named-content content-type="ceb_86117613396fo">3.69</named-content>, <named-content content-type="ceb_315271864877kv">9.4</named-content>, <named-content content-type="ceb_3571295329014io">24.36</named-content>, <named-content content-type="ceb_622152169547qs">28.29</named-content></named-content></p>
    <p content-type="Sta_index2"><bold>17200</bold>: <named-content content-type="emSecs"><named-content content-type="ceb_256816646439br">1.47</named-content></named-content></p>
    <p content-type="Sta_index1"><bold>Title 18</bold></p>
    <p content-type="Sta_index2"><bold>337</bold>: <named-content content-type="emSecs"><named-content content-type="ceb_299721418847ei">24.15</named-content>, <named-content content-type="ceb_1071282833945ij">27.12</named-content></named-content></p>
    <p content-type="Sta_index2"><bold>337.15</bold>: <named-content content-type="emSecs"><named-content content-type="ceb_044978926181bc">1.20</named-content></named-content></p>
    <p content-type="Sta_index2"><bold>337(c)</bold>: <named-content content-type="emSecs"><named-content content-type="ceb_485647382794if">21.25</named-content></named-content></p>
    <p content-type="Sta_index2"><bold>790.03(h)(1)&#x2013;(4)</bold>: <named-content content-type="emSecs"><named-content content-type="ceb_7581828949726mt">8.23</named-content></named-content></p>
    <p content-type="Sta_index2"><bold>790.03(h)(1)&#x2013;(13)</bold>: <named-content content-type="emSecs"><named-content content-type="ceb_7581828949726mt">8.23</named-content></named-content></p>
    <p content-type="Sta_index2"><bold>790.03(h)(1)</bold>: <named-content content-type="emSecs"><named-content content-type="ceb_947708106972jn">9.10</named-content></named-content></p>
    <p content-type="Sta_index2"><bold>790.03(h)(2)</bold>: <named-content content-type="emSecs"><named-content content-type="ceb_947708106972jn">9.10</named-content>, <named-content content-type="ceb_014483150222fa">10.5</named-content></named-content></p>
    <p content-type="Sta_index1"><bold>Title 20</bold></p>
    <p content-type="Sta_index2"><bold>1.4.1</bold>: <named-content content-type="emSecs"><named-content content-type="ceb_381003477796sg">13.34</named-content>, <named-content content-type="ceb_5704471494217hk">14.17</named-content></named-content></p>
    <p content-type="Sta_index2"><bold>1.1</bold>: <named-content content-type="emSecs"><named-content content-type="ceb_381003477796sg">13.34</named-content></named-content></p>
    <p content-type="Sta_index2"><bold>1.1(c)</bold>: <named-content content-type="emSecs"><named-content content-type="ceb_381003477796sg">13.34</named-content></named-content></p>
    <p content-type="Sta_index2"><bold>1.16(b)(4)</bold>: <named-content content-type="emSecs"><named-content content-type="ceb_472031754764fn">12.36</named-content></named-content></p>
    <p content-type="Sta_index2"><bold>1.4</bold>: <named-content content-type="emSecs"><named-content content-type="ceb_381003477796sg">13.34</named-content>, <named-content content-type="ceb_5704471494217hk">14.17</named-content></named-content></p>
    <sec>
        <title>Title 1(a)</title>
        <p content-type="Sta_index2"><bold>4&#x2013;5</bold>: <named-content content-type="emSecs"><named-content content-type="ceb_0504619109866mg">22.15</named-content></named-content></p>
        <p content-type="Sta_index2"><bold>4</bold>: <named-content content-type="emSecs"><named-content content-type="ceb_0504619109866mg">22.15</named-content></named-content></p>
        <p content-type="Sta_index2"><bold>4.1</bold>: <named-content content-type="emSecs"><named-content content-type="ceb_0504619109866mg">22.15</named-content></named-content></p>
        <p content-type="Sta_index2"><bold>4(a)</bold>: <named-content content-type="emSecs"><named-content content-type="ceb_0504619109866mg">22.15</named-content></named-content></p>
    </sec>
</sec>
</root>

XSLT Code

<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>

<xsl:template match="sec">
    <xsl:copy>
        <xsl:for-each-group select="*" group-adjacent=". instance of element(p)">
            <xsl:choose>
                <xsl:when test="current-grouping-key()">
                    <xsl:apply-templates select="sort(current-group(), 'http://www.w3.org/2013/collation/UCA?numeric=yes')"/>
                </xsl:when>
                <xsl:otherwise>
                    <xsl:apply-templates select="current-group()"/>
                </xsl:otherwise>
            </xsl:choose>
        </xsl:for-each-group>
    </xsl:copy>
</xsl:template>

Expected Output

<root>
<sec>
    <title>Title 1</title>
    <p content-type="Sta_index1"><bold>Title 15</bold></p>
    <p content-type="Sta_index2"><bold>10(a)</bold>: <named-content content-type="emSecs"><named-content content-type="ceb_785262054035ad">11.37</named-content>, <named-content content-type="ceb_027863847761il">13.4</named-content>, <named-content content-type="ceb_784300142022op">13.21</named-content>&#x2013;<named-content content-type="ceb_641775392148fq">13.26</named-content>, <named-content content-type="ceb_758553200629ve">18.19</named-content></named-content></p>
    <p content-type="Sta_index2"><bold>10(b)</bold>: <named-content content-type="emSecs"><named-content content-type="ceb_1625198599975jm">13.20</named-content></named-content></p>
    <p content-type="Sta_index2"><bold>2860(c)</bold>: <named-content content-type="emSecs"><named-content content-type="ceb_785262054035ad">11.37</named-content>, <named-content content-type="ceb_027863847761il">13.4</named-content>, <named-content content-type="ceb_784300142022op">13.21</named-content>&#x2013;<named-content content-type="ceb_641775392148fq">13.26</named-content>, <named-content content-type="ceb_758553200629ve">18.19</named-content></named-content></p>
    <p content-type="Sta_index2"><bold>2860(c)&#x2013;(f)</bold>: <named-content content-type="emSecs"><named-content content-type="ceb_1625198599975jm">13.20</named-content></named-content></p>
    <p content-type="Sta_index2"><bold>17200</bold>: <named-content content-type="emSecs"><named-content content-type="ceb_256816646439br">1.47</named-content></named-content></p>
    <p content-type="Sta_index2"><bold>17200&#x2013;17210</bold>: <named-content content-type="emSecs"><named-content content-type="ceb_256816646439br">1.47</named-content>, <named-content content-type="ceb_048717893609pd">2.19</named-content>, <named-content content-type="ceb_86117613396fo">3.69</named-content>, <named-content content-type="ceb_315271864877kv">9.4</named-content>, <named-content content-type="ceb_3571295329014io">24.36</named-content>, <named-content content-type="ceb_622152169547qs">28.29</named-content></named-content></p>
    <p content-type="Sta_index1"><bold>Title 18</bold></p>
    <p content-type="Sta_index2"><bold>337</bold>: <named-content content-type="emSecs"><named-content content-type="ceb_299721418847ei">24.15</named-content>, <named-content content-type="ceb_1071282833945ij">27.12</named-content></named-content></p>
    <p content-type="Sta_index2"><bold>337(c)</bold>: <named-content content-type="emSecs"><named-content content-type="ceb_485647382794if">21.25</named-content></named-content></p>
    <p content-type="Sta_index2"><bold>337.15</bold>: <named-content content-type="emSecs"><named-content content-type="ceb_044978926181bc">1.20</named-content></named-content></p>
    <p content-type="Sta_index2"><bold>790.03(h)(1)</bold>: <named-content content-type="emSecs"><named-content content-type="ceb_947708106972jn">9.10</named-content></named-content></p>
    <p content-type="Sta_index2"><bold>790.03(h)(1)&#x2013;(4)</bold>: <named-content content-type="emSecs"><named-content content-type="ceb_7581828949726mt">8.23</named-content></named-content></p>
    <p content-type="Sta_index2"><bold>790.03(h)(1)&#x2013;(13)</bold>: <named-content content-type="emSecs"><named-content content-type="ceb_7581828949726mt">8.23</named-content></named-content></p>
    <p content-type="Sta_index2"><bold>790.03(h)(2)</bold>: <named-content content-type="emSecs"><named-content content-type="ceb_947708106972jn">9.10</named-content>, <named-content content-type="ceb_014483150222fa">10.5</named-content></named-content></p>
    <p content-type="Sta_index1"><bold>Title 20</bold></p>
    <p content-type="Sta_index2"><bold>1.1</bold>: <named-content content-type="emSecs"><named-content content-type="ceb_381003477796sg">13.34</named-content></named-content></p>
    <p content-type="Sta_index2"><bold>1.1(c)</bold>: <named-content content-type="emSecs"><named-content content-type="ceb_381003477796sg">13.34</named-content></named-content></p>
    <p content-type="Sta_index2"><bold>1.4</bold>: <named-content content-type="emSecs"><named-content content-type="ceb_381003477796sg">13.34</named-content>, <named-content content-type="ceb_5704471494217hk">14.17</named-content></named-content></p>
    <p content-type="Sta_index2"><bold>1.4.1</bold>: <named-content content-type="emSecs"><named-content content-type="ceb_381003477796sg">13.34</named-content>, <named-content content-type="ceb_5704471494217hk">14.17</named-content></named-content></p>
    <p content-type="Sta_index2"><bold>1.16(b)(4)</bold>: <named-content content-type="emSecs"><named-content content-type="ceb_472031754764fn">12.36</named-content></named-content></p>
    <sec>
        <title>Title 1(a)</title>
        <p content-type="Sta_index2"><bold>4</bold>: <named-content content-type="emSecs"><named-content content-type="ceb_0504619109866mg">22.15</named-content></named-content></p>
        <p content-type="Sta_index2"><bold>4&#x2013;5</bold>: <named-content content-type="emSecs"><named-content content-type="ceb_0504619109866mg">22.15</named-content></named-content></p>
        <p content-type="Sta_index2"><bold>4(a)</bold>: <named-content content-type="emSecs"><named-content content-type="ceb_0504619109866mg">22.15</named-content></named-content></p>
        <p content-type="Sta_index2"><bold>4.1</bold>: <named-content content-type="emSecs"><named-content content-type="ceb_0504619109866mg">22.15</named-content></named-content></p>
    </sec>
</sec>
</root>

Code link: https://xsltfiddle.liberty-development.net/3NSTbfj/44

Solution

  • I think using

    <xsl:template match="sec">
        <xsl:copy>
            <xsl:for-each-group select="*" group-adjacent=". instance of element(p) and @content-type = 'Sta_index2'">
                <xsl:choose>
                    <xsl:when test="current-grouping-key()">
                        <xsl:apply-templates select="sort(current-group(), 'http://saxon.sf.net/collation?alphanumeric=yes;ignore-symbols=no', function($p) { $p/bold[1] })"/>
                    </xsl:when>
                    <xsl:otherwise>
                        <xsl:apply-templates select="current-group()"/>
                    </xsl:otherwise>
                </xsl:choose>
            </xsl:for-each-group>
        </xsl:copy>
    </xsl:template>
    

    is closer to what you want, at least in that it restricts the grouping and sorting to the p elements with content-type="Sta_index2". The resulting sort does not quite give the order you posted, though; you may need to experiment with the different options the collation has, or adjust the third argument to the sort function (the sort key).
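
As a rough sketch of the "adjust the third argument" route, the stylesheet below replaces the Saxon-specific collation with a hand-built sort key and the standard codepoint collation. The `mf` prefix, its namespace URI, the function name `mf:sort-key`, the rank letters A to D, and the 10-digit padding are all assumptions made for illustration; none of them come from the question or the answer above. The key is designed so that a bare number sorts before its en dash range, then its parenthesised variants, then its dotted variants (4, 4&#x2013;5, 4(a), 4.1), which is the order the expected output appears to follow.

```xslt
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:mf="http://example.com/mf"
    exclude-result-prefixes="#all"
    version="3.0">

    <!-- Copy everything that no other template handles. -->
    <xsl:mode on-no-match="shallow-copy"/>

    <!-- Hypothetical helper: turns a label such as "790.03(h)(1)" into a
         string whose codepoint order matches the wanted order.
         Digit runs are zero-padded to 10 digits (assumption: no longer runs),
         separators are ranked en dash (A), "(" (B), ")" (C), "." (D),
         letters are lower-cased, and any other character is ignored. -->
    <xsl:function name="mf:sort-key" as="xs:string">
        <xsl:param name="label" as="xs:string"/>
        <xsl:variable name="tokens" as="xs:string*">
            <xsl:analyze-string select="$label" regex="[0-9]+|[a-zA-Z]+|[&#x2013;\(\)\.]">
                <xsl:matching-substring>
                    <xsl:sequence select="
                        if (matches(., '^[0-9]+$')) then format-number(xs:integer(.), '0000000000')
                        else if (. = '&#x2013;') then 'A'
                        else if (. = '(') then 'B'
                        else if (. = ')') then 'C'
                        else if (. = '.') then 'D'
                        else lower-case(.)"/>
                </xsl:matching-substring>
            </xsl:analyze-string>
        </xsl:variable>
        <xsl:sequence select="string-join($tokens)"/>
    </xsl:function>

    <xsl:template match="sec">
        <xsl:copy>
            <!-- Same grouping as in the answer: runs of adjacent Sta_index2 paragraphs. -->
            <xsl:for-each-group select="*" group-adjacent=". instance of element(p) and @content-type = 'Sta_index2'">
                <xsl:choose>
                    <xsl:when test="current-grouping-key()">
                        <!-- Sort the run by the computed key, compared codepoint by codepoint. -->
                        <xsl:apply-templates select="sort(current-group(), 'http://www.w3.org/2005/xpath-functions/collation/codepoint', function($p) { mf:sort-key(string($p/bold[1])) })"/>
                    </xsl:when>
                    <xsl:otherwise>
                        <xsl:apply-templates select="current-group()"/>
                    </xsl:otherwise>
                </xsl:choose>
            </xsl:for-each-group>
        </xsl:copy>
    </xsl:template>

</xsl:stylesheet>
```

Because every digit run has the same width and each separator is a single uppercase letter, a label that is a prefix of another (337 versus 337(c)) always sorts first, and 1.4 sorts before 1.16 because 4 is compared with 16 as a number. If your labels can contain other separators, they would need their own rank in the key function.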