I am trying to anonymize an XML export of Confluence. I found Atlassian's export cleaner jar:
https://confluence.atlassian.com/doc/content-anonymizer-for-data-backups-134795.html
I have modified its clean.stx to mask all user names and keys like this:
<stx:template match="object[@class='ConfluenceUserImpl']/property[@name='name']/text() | object[@class='ConfluenceUserImpl']/property[@name='lowerName']/text() | object[@class='ConfluenceUserImpl']/id[@name='key']/text() | property[@class='ConfluenceUserImpl']/id[@name='key']/text()">
<stx:value-of select="translate(., '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ', 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx')"/>
</stx:template>
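For reference, XPath's translate() maps each character of its second argument to the character at the same position in the third. Assuming the third argument has one x per character in the second, every ASCII letter and digit becomes x and punctuation passes through. The hypothetical Java helper below (not part of Atlassian's tooling) reproduces that effect outside STX so the result is easy to inspect.

```java
// Hypothetical helper, not part of Atlassian's jar: mirrors the translate()
// call in the template above, assuming its third argument holds one 'x'
// for every character in the second.
class TranslateSketch {
    // Maps each ASCII digit and letter to 'x'; every other character
    // (e.g. '.', '-', '@') is copied unchanged, as translate() would do.
    static String maskAlnum(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            boolean alnum = (c >= '0' && c <= '9')
                    || (c >= 'a' && c <= 'z')
                    || (c >= 'A' && c <= 'Z');
            out.append(alnum ? 'x' : c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(maskAlnum("john.doe@example.com")); // xxxx.xxx@xxxxxxx.xxx
        // Every 32-character hex user key ends up as the same 32 x's.
        System.out.println(maskAlnum("8a8300716489cc7d016489ce009a0000"));
    }
}
```

Because only length and punctuation survive, all 32-character user keys collapse to the same masked string, so different users can no longer be told apart in the export.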
I also need to modify the CDATA, using a regex or something similar, to remove user mentions from the body of a Confluence page.
For example, the CDATA looks like this:
<property name="body">
<![CDATA[
<p>
<ac:link>
<ri:user ri:userkey="8a8300716489cc7d016489ce009a0000" />
</ac:link>
</p>
]]>
</property>
Here I only need to replace the value of ri:userkey with xxx or something similar.
How can I do this?
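A path match on the ri:userkey attribute cannot reach the key, because everything inside <![CDATA[ ... ]]> is character data to the XML parser. There is no ri:user element in the tree, only one text node, which is why a string-level replacement is needed. The sketch below uses the JDK's standard DOM parser on a compacted, single-line copy of the sample body to illustrate this; the class and method names are hypothetical and it is not part of the Confluence tooling.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

class CdataIsTextSketch {
    // Compacted copy of the <property name="body"> sample from the question.
    static final String SAMPLE =
            "<property name=\"body\"><![CDATA[<p><ac:link>"
            + "<ri:user ri:userkey=\"8a8300716489cc7d016489ce009a0000\" />"
            + "</ac:link></p>]]></property>";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setCoalescing(false); // keep CDATA sections as their own nodes
            return f.newDocumentBuilder().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // True if the only child of <property> is a CDATA section node.
    static boolean bodyIsSingleCdata(Document doc) {
        Node first = doc.getDocumentElement().getFirstChild();
        return first != null
                && first.getNodeType() == Node.CDATA_SECTION_NODE
                && first.getNextSibling() == null;
    }

    // Number of real <ri:user> elements the parser built; the markup
    // inside the CDATA section is never parsed into elements.
    static int userElementCount(Document doc) {
        return doc.getElementsByTagName("ri:user").getLength();
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE);
        System.out.println(bodyIsSingleCdata(doc)); // true
        System.out.println(userElementCount(doc));  // 0
        // The key is only reachable as a substring of this text.
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```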
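Never mind: I now run the transformation with Joost (http://joost.sourceforge.net/), a Java STX processor that is newer than the STX implementation used in Atlassian's jar.
With it I can use replace() on the CDATA text and wrap the result in stx:cdata, so it is written back as a CDATA section instead of being escaped: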
<stx:template match="property[@name='body']/cdata()">
<stx:cdata>
<stx:value-of select="replace(., '(ri:userkey=).*?\s', '$1&quot;xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx&quot; ')" />
</stx:cdata>
</stx:template>
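The pattern (ri:userkey=).*?\s relies on whitespace following the attribute value. One way to check its behaviour is to run the same expression through Java's regex engine, which uses the same $1 group reference in replacement strings; treat it as an assumption that Joost's replace() behaves like String.replaceAll here. The sketch also shows an alternative pattern, (ri:userkey=")[^"]*, which replaces only the characters between the quotes. That alternative is a suggestion, not something taken from the Joost or Confluence documentation.

```java
class UserkeyRegexSketch {
    // Pattern from the STX template above: lazily match up to the first
    // whitespace character, then write back a quoted placeholder plus a space.
    static String whitespacePattern(String body) {
        return body.replaceAll("(ri:userkey=).*?\\s", "$1\"xxxxxxxx\" ");
    }

    // Alternative (an assumption, not from the original answer): replace only
    // the characters inside the quotes, so whatever follows the value
    // ("/>", a space, a newline) is left untouched.
    static String quotedValuePattern(String body) {
        return body.replaceAll("(ri:userkey=\")[^\"]*", "$1xxxxxxxx");
    }

    public static void main(String[] args) {
        String spaced = "<ri:user ri:userkey=\"8a8300716489cc7d016489ce009a0000\" />";
        String tight = "<ri:user ri:userkey=\"abc123\"/></ac:link>";

        // Both patterns handle the sample from the question.
        System.out.println(whitespacePattern(spaced));  // <ri:user ri:userkey="xxxxxxxx" />
        System.out.println(quotedValuePattern(spaced)); // <ri:user ri:userkey="xxxxxxxx" />

        // With no whitespace after the value, the first pattern finds no
        // match and the real key stays in the output.
        System.out.println(whitespacePattern(tight));
        System.out.println(quotedValuePattern(tight));  // <ri:user ri:userkey="xxxxxxxx"/></ac:link>

        // If a line break comes later, the lazy match (".", which does not
        // cross line breaks) runs up to it and "/></ac:link>" is swallowed.
        System.out.println(whitespacePattern(tight + "\n</p>")); // <ri:user ri:userkey="xxxxxxxx" </p>
    }
}
```

Matching on the quotes rather than on trailing whitespace keeps the surrounding markup intact regardless of how the storage format happens to be laid out.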