Tags: regex, xslt-2.0, string-function

Split string following a pattern using XSLT 2.0


I have a string that I need to parse with XSLT 2.0.

Input string

Hoffmann, Rüdiger (Universtiy-A, SomeCity, (SomeCountry); University-B, SomeCity, (SomeCountry)); Author, X; Author, B. (University-C, SomeCity (SomeCountry))

Expected output
Hoffmann, Rüdiger (Universtiy-A, SomeCity, (SomeCountry); University-B, SomeCity, (SomeCountry))
Author, X
Author, B. (University-C, SomeCity (SomeCountry))

The structure is: an author name, followed by that author's university in parentheses. However, one author can have two universities, and the delimiter between two universities is the same as the delimiter between two author groups (a semicolon in this case).

I need to split the string on the semicolons that separate author/affiliation groups, while ignoring the semicolons between affiliations inside the parentheses.

I believe this can be done with a regular expression, but I don't have much experience writing regexes myself.
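A minimal sketch of why a plain split is not enough, assuming an XSLT 2.0 processor such as Saxon: XPath's tokenize() breaks the string at every semicolon, including the one between the two affiliations. The parameter name "authors" and the template name "main" are placeholders.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:transform
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="2.0">

    <xsl:output method="text"/>

    <!-- The sample input from the question (placeholder parameter name) -->
    <xsl:param name="authors">Hoffmann, Rüdiger (Universtiy-A, SomeCity, (SomeCountry); University-B, SomeCity, (SomeCountry)); Author, X; Author, B. (University-C, SomeCity (SomeCountry))</xsl:param>

    <!-- Naive split: tokenize() breaks at every ';', including the one
         between the two affiliations inside the parentheses.
         The template can also be called by name, so no input document is needed. -->
    <xsl:template match="/" name="main">
        <xsl:value-of select="for $part in tokenize($authors, ';') return normalize-space($part)"
                      separator="&#10;"/>
    </xsl:template>
</xsl:transform>
```

This prints four lines instead of the expected three: the first author's entry is cut off after "(SomeCountry)", and "University-B, SomeCity, (SomeCountry))" ends up on a line of its own.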


Solution

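  • As long as the parentheses around the list of universities and around the country are always present, you can match on them. Note that an author without any parentheses, such as "Author, X" in the question's input, is not matched by this pattern and is left out of the result: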

    <?xml version="1.0" encoding="UTF-8" ?>
    <xsl:transform
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        version="2.0"
        xmlns:xs="http://www.w3.org/2001/XMLSchema"
        xmlns:mf="http://example.com/mf"
        exclude-result-prefixes="xs mf">
    
        <xsl:output method="text"/>
        <xsl:param name="authors">Author, A. (Universtiy-A, SomeCity, (SomeCountry); University-B, SomeCity, (SomeCountry));Author, B. (University-C, SomeCity (SomeCountry))</xsl:param>
    
        <xsl:template match="/">
            <xsl:value-of select="mf:split($authors)" separator="&#10;"/>
        </xsl:template>
    
        <xsl:function name="mf:split" as="xs:string*">
            <xsl:param name="input" as="xs:string"/>
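            <!-- Each match must end in "))" (country paren + affiliation-list paren).
                 [^(]*? may cross ';', so the semicolon between two affiliations
                 stays inside a single match. -->
            <xsl:analyze-string select="$input" regex="[^;)]*?\([^(]*?\([^(]*?\)\)">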
                <xsl:matching-substring>
                    <!-- trim the blank that follows a "; " separator -->
                    <xsl:sequence select="normalize-space(.)"/>
                </xsl:matching-substring>
            </xsl:analyze-string>
        </xsl:function>
    </xsl:transform>
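If the input can contain authors without an affiliation, such as "Author, X" in the question, one possible variant makes the parenthesised part optional. This is a sketch under two assumptions: affiliations nest at most one level deep (as with "(SomeCountry)" in the sample), and every parenthesis is balanced. It reuses the mf:split name and the placeholder namespace http://example.com/mf from the stylesheet above, and it uses ordinary capturing groups because the XSLT 2.0 regex dialect has no "(?:...)" non-capturing syntax.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:transform
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="2.0"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:mf="http://example.com/mf"
    exclude-result-prefixes="xs mf">

    <xsl:output method="text"/>

    <!-- The question's input, including an author with no affiliation -->
    <xsl:param name="authors">Hoffmann, Rüdiger (Universtiy-A, SomeCity, (SomeCountry); University-B, SomeCity, (SomeCountry)); Author, X; Author, B. (University-C, SomeCity (SomeCountry))</xsl:param>

    <!-- Also callable as the named template "main", so no input document is needed -->
    <xsl:template match="/" name="main">
        <xsl:value-of select="mf:split($authors)" separator="&#10;"/>
    </xsl:template>

    <!-- Each match is a name containing no ';' or parentheses, optionally
         followed by one (...) group whose content may hold further (...)
         groups one level deep. Semicolons inside that group are consumed
         by the match, so only top-level ';' characters separate authors. -->
    <xsl:function name="mf:split" as="xs:string*">
        <xsl:param name="input" as="xs:string"/>
        <xsl:analyze-string select="$input"
                            regex="[^;()]+(\(([^()]|\([^()]*\))*\))?">
            <xsl:matching-substring>
                <!-- drop the blank after each top-level ';' -->
                <xsl:variable name="item" select="normalize-space(.)"/>
                <!-- skip whitespace-only matches, e.g. between "; ;" -->
                <xsl:if test="$item != ''">
                    <xsl:sequence select="$item"/>
                </xsl:if>
            </xsl:matching-substring>
        </xsl:analyze-string>
    </xsl:function>
</xsl:transform>
```

With the question's input this prints the three expected lines. The leading "[^;()]+" requires at least one character, which matters because xsl:analyze-string reports an error for a regex that can match a zero-length string. Deeper nesting than one level would need a different approach, for example walking the string character by character and splitting only on semicolons at parenthesis depth zero.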