Tags: java, xml, odf, odt

Convert ODT to single XML file


I know that the standard defines two representations of an ODT document:

- an archive of separate files, i.e. meta.xml, content.xml, etc.;
- one big XML file containing all the data.

(I know this from http://en.wikipedia.org/wiki/OpenDocument_technical_specification#Document_Representation)

The latter representation is easier to process, but unfortunately OpenOffice does not produce it.

The question is: do you know of any filter, converter, or anything else that would help me transform an ODT file from the archive form into a single XML file? A Java class would be best.


Solution

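  • I solved this by writing an XSLT stylesheet that transforms the ODT source files into one XML file that is "more or less" compatible with the standard. It is applied to content.xml and pulls the metadata in from meta.xml; other parts of the archive, such as styles.xml, are not merged. The code is below.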

    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0">
    
        <!-- Location of meta.xml; a relative value is resolved against this stylesheet's URI. -->
        <xsl:param name="meta.file" select="'meta.xml'" />
    
        <!-- Identity template: copy every node and attribute unchanged. -->
        <xsl:template match="@*|node()">
            <xsl:copy>
                <xsl:apply-templates select="@*|node()" />
            </xsl:copy>
        </xsl:template>
    
        <!-- Replace the content.xml root with office:document and pull in office:meta from meta.xml. -->
        <xsl:template match="office:document-content">
            <office:document>
                <xsl:copy-of select="@*" />
                <xsl:variable name="meta" select="document($meta.file)/office:document-meta/office:meta" />
                <xsl:copy-of select="$meta" />
                <xsl:apply-templates />
            </office:document>
        </xsl:template>
    
    </xsl:stylesheet>
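
Since the question asked for a Java class, here is a minimal sketch of how the stylesheet above could be applied directly to an .odt archive using the standard javax.xml.transform API. The class name OdtFlattener, the method name flatten, and the file names in the usage line are assumptions made for illustration, not part of the original answer. The sketch relies on a URIResolver so that document('meta.xml') is served from inside the archive instead of from the file system.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper class; the name and layout are illustrative only.
final class OdtFlattener {

    private OdtFlattener() { }

    /** Runs the stylesheet over content.xml from the given ODT archive. */
    static void flatten(Path odt, Path stylesheet, Path output)
            throws IOException, TransformerException {
        try (ZipFile zip = new ZipFile(odt.toFile());
             OutputStream out = Files.newOutputStream(output)) {

            ZipEntry content = zip.getEntry("content.xml");
            if (content == null) {
                throw new IOException("No content.xml in " + odt);
            }

            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(stylesheet.toFile()));

            // Serve document('meta.xml') and similar lookups from the archive.
            transformer.setURIResolver((href, base) -> {
                ZipEntry entry = zip.getEntry(href);
                if (entry == null) {
                    return null; // let the processor try its default resolution
                }
                try {
                    return new StreamSource(zip.getInputStream(entry));
                } catch (IOException e) {
                    throw new TransformerException("Cannot read " + href, e);
                }
            });

            // The archive must stay open until the transformation has finished.
            try (InputStream in = zip.getInputStream(content)) {
                transformer.transform(new StreamSource(in), new StreamResult(out));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 3) {
            System.err.println(
                    "Usage: java OdtFlattener <in.odt> <stylesheet.xsl> <out.xml>");
            System.exit(1);
        }
        flatten(Paths.get(args[0]), Paths.get(args[1]), Paths.get(args[2]));
    }
}
```

Assuming the stylesheet is saved as odt2flat.xsl (a made-up name), it could be run as `java OdtFlattener report.odt odt2flat.xsl report.xml`. Reading the entries straight from the ZIP avoids unpacking the archive to disk. The result is still only as complete as the stylesheet makes it, so styles and settings are not included.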