
Java STX CDATA parsing


I am trying to anonymize an XML export of Confluence. I found their export cleaner jar:

https://confluence.atlassian.com/doc/content-anonymizer-for-data-backups-134795.html

I have modified clean.stx to anonymize all users like this:

<stx:template match="object[@class='ConfluenceUserImpl']/property[@name='name']/text() | object[@class='ConfluenceUserImpl']/property[@name='lowerName']/text() | object[@class='ConfluenceUserImpl']/id[@name='key']/text() | property[@class='ConfluenceUserImpl']/id[@name='key']/text()">
    <stx:value-of select="translate(., '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ', 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx')"/>
</stx:template>
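
For readers unfamiliar with translate(): it maps each character listed in its second argument to the character at the same position in the third, so here every ASCII letter and digit becomes x and everything else is left alone. A rough Java equivalent, as a sketch only (the class and method names are made up for illustration and are not part of the Atlassian tool):

```java
class TranslateSketch {
    // Hypothetical helper: mirrors the translate() call above by replacing
    // every ASCII letter or digit with 'x' and keeping all other characters.
    static String maskAlphanumerics(String value) {
        return value.replaceAll("[0-9a-zA-Z]", "x");
    }

    public static void main(String[] args) {
        // Punctuation such as '.' is not in the translate() source set, so it survives.
        System.out.println(maskAlphanumerics("jane.doe42"));
    }
}
```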

I also need to modify the CDATA content, using a regex or similar, to remove user mentions from the body of a Confluence page.

The CDATA looks like this, for example:

<property name="body">
    <![CDATA[
        <p>
            <ac:link>
                <ri:user ri:userkey="8a8300716489cc7d016489ce009a0000" />
            </ac:link>
        </p>
    ]]>
</property>

Here I only need to replace the value of ri:userkey with xxx or similar.

How can I do this?


Solution

  • Never mind, I now use Joost, the Java implementation of STX, in a version newer than the one Atlassian ships in their jar: http://joost.sourceforge.net/

    With it I can use replace(), and wrap the result in stx:cdata to disable escaping:

        <stx:template match="property[@name='body']/cdata()">
            <stx:cdata>
                <stx:value-of select="replace(., '(ri:userkey=).*?\s', '$1&quot;xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx&quot; ')" />
            </stx:cdata>
        </stx:template>
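
To try the substitution outside of Joost, the same idea can be sketched in plain Java with java.util.regex. Note that this is a variation, not the template's exact pattern: it anchors on the attribute's double quotes instead of on trailing whitespace, which assumes the key is always double-quoted as in the sample body. The class name, method name, and MASK constant are hypothetical.

```java
import java.util.regex.Pattern;

class UserKeyMasker {
    // Placeholder written in place of every real user key (assumed value).
    static final String MASK = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx";

    // Group 1 keeps the attribute name; the quoted value after it is replaced.
    private static final Pattern USER_KEY = Pattern.compile("(ri:userkey=)\"[^\"]*\"");

    static String mask(String body) {
        // $1 re-inserts 'ri:userkey=' and the quotes are written back around the mask.
        return USER_KEY.matcher(body).replaceAll("$1\"" + MASK + "\"");
    }

    public static void main(String[] args) {
        String body = "<p><ac:link><ri:user ri:userkey=\"8a8300716489cc7d016489ce009a0000\" /></ac:link></p>";
        System.out.println(mask(body));
    }
}
```

Anchoring on the quotes means the match stops at the end of the attribute value even when no whitespace follows it, whereas the whitespace-terminated pattern relies on the space before "/>" that appears in the sample.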