xslt-2.0

group where text() contains a keyword


I have a large text file that at some point looks like this:

<w>
    <randomnode>
        <pa/>
        <pa>
            Keyword1 sometxt1 <thing>abc</thing>: blabla
        </pa>
        <stuff>abc</stuff>
        <pa>
            just blabla
        </pa>
        <pa>
            Keyword2 othertxt2: blabla
        </pa>
        <pa>
            just blabla
        </pa>
        <pa>
            just blabla
        </pa>
        <pa>
            Keyword1 xxx: and blabla
        </pa>
    </randomnode>
</w>

and want to get this result:

<w>
    <randomnode>
        <k attr="keyword1 sometxt1">
            <p>
                <s>
                    Keyword1 sometxt1 <thing>abc</thing>:
                </s>
                blabla
            </p> 
            <stuff>abc</stuff>
            <p>
                just blabla
            </p>
        </k>
        <k attr="keyword2 othertxt2">
            <p>
                <s>
                Keyword2 othertxt2:
                </s>
                blabla
            </p>
            <p>
                just blabla
            </p>
            <p>
                just blabla
            </p>
        </k>
        <k attr="keyword1 xxx">
            <p>
                <s>
                    Keyword1 xxx:
                </s>
                and blabla
            </p>
        </k>
    </randomnode>
</w>

In English: I want to go through each <pa> and start a new group whenever keyword1, keyword2, or keyword3 appears in the text() of that node. Splitting the content at the : into the <s> element is done in another template and should work once I can group the <pa> elements correctly.

I have this so far:

<xsl:for-each-group select="$randomnode/*[normalize-space(.)!='']"
     group-starting-with="pa/text()[contains(., 'keyword1')
     or contains(., 'keyword2') or contains(., 'keyword3')]">

The problem is that nothing is selected, and I have a feeling it is because of text()...

Can I use group-starting-with on text() at all? I would really like to use this and extend/correct it before I do something completely different.


Solution

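  • Well, I would ditch the text() completely and simply compare the string value of the pa element. The group-starting-with attribute is a pattern that is matched against the items selected by select, and those are element children of $randomnode, so a pattern that ends in text() can never match any of them: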

    <xsl:for-each-group select="$randomnode/*[normalize-space(.)!='']"
         group-starting-with="pa[contains(., 'keyword1')
         or contains(., 'keyword2') or contains(., 'keyword3')]">
    

    If you want to use the text() child node selection, then the test belongs in a predicate on pa, so that the pattern still matches the pa element itself:

    <xsl:for-each-group select="$randomnode/*[normalize-space(.)!='']"
         group-starting-with="pa[text()[contains(., 'keyword1')
         or contains(., 'keyword2') or contains(., 'keyword3')]]">
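
    Putting this together, a minimal sketch of a complete stylesheet might look like the following. Several things here are assumptions and not part of the question: the grouping is done in a template matching randomnode instead of through the $randomnode variable, the attr value is taken from the first text node of the group's first element up to the colon, and the keyword test uses a case-insensitive matches() because contains() is case-sensitive and the sample text has Keyword1 with a capital K. The split into <s> is left to the other template mentioned in the question.

    ```xml
    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet version="2.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

      <!-- Identity template: copy anything not handled below -->
      <xsl:template match="@* | node()">
        <xsl:copy>
          <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
      </xsl:template>

      <!-- Assumption: grouping happens where randomnode is matched -->
      <xsl:template match="randomnode">
        <xsl:copy>
          <!-- The population is the non-empty element children; the pattern
               must therefore match an element (pa), not a text node -->
          <xsl:for-each-group select="*[normalize-space(.) != '']"
              group-starting-with="pa[matches(., 'keyword[123]', 'i')]">
            <!-- Inside the body, "." is the first item of the current group.
                 Assumption: attr is the text before the first colon in that
                 item's first text node, lower-cased -->
            <k attr="{lower-case(normalize-space(
                        substring-before(concat(text()[1], ':'), ':')))}">
              <xsl:apply-templates select="current-group()"/>
            </k>
          </xsl:for-each-group>
        </xsl:copy>
      </xsl:template>

      <!-- Rename pa to p; the <s> split is handled elsewhere -->
      <xsl:template match="pa">
        <p>
          <xsl:apply-templates/>
        </p>
      </xsl:template>

    </xsl:stylesheet>
    ```

    With the sample input, the empty <pa/> is filtered out by the select expression, and a new group should start at each of the three pa elements that contain a keyword, so the <stuff> element and the "just blabla" paragraphs end up in the group of the keyword paragraph that precedes them.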