
Encoding problems while extracting EXI-compressed XML


The code below is an attempt to simplify the setup required to perform EXI compression and decompression using EXIficient:

class ExiCompressionUtils {
    static Transformer transformer = TransformerFactory.newInstance().newTransformer()

    static byte[] compress(String xml) {
        ByteArrayOutputStream exiOS = new ByteArrayOutputStream()
        EXIResult exiResult = new EXIResult(outputStream : exiOS)

        XMLReader xmlReader = XMLReaderFactory.createXMLReader()
        xmlReader.contentHandler = exiResult.handler
        xmlReader.parse(new InputSource(new StringReader(xml)))

        def compressed = exiOS.toByteArray()
        exiOS.close()
        return compressed
    }

    static String extract(byte[] compressed) {
        SAXSource exiSource = new SAXSource(new InputSource(new ByteArrayInputStream(compressed)))
        exiSource.setXMLReader(exiSource.reader)

        ByteArrayOutputStream exiOS = new ByteArrayOutputStream()
        transformer.transform(exiSource, new StreamResult(exiOS))  // fails here
        def extracted = exiOS.toString()
        exiOS.close()
        return extracted
    }
}

The test below fails with ERROR: 'Invalid byte 1 of 1-byte UTF-8 sequence.':

@Test
void testExiCompression() {
    def xml = '<Root><Child id="1">Text</Child><EmptyTag/></Root>'
    def compressed = ExiCompressionUtils.compress(xml)
    assert ExiCompressionUtils.extract(compressed) == xml
} 

Any encoding experts out there that can get to the bottom of this?


Solution

  • I stumbled over this post today. There is one important issue with this code (besides the syntax, which looks strange for Java: missing semicolons etc.).

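    When reading, use EXISource and not SAXSource! A plain SAXSource hands the EXI bytes to an ordinary XML parser, which tries to decode them as UTF-8 text and fails. EXISource supplies an EXI-aware reader instead.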

    The code below works.

    -- Daniel

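    // Members of class ExiCompressionUtils. EXIResult, EXISource and EXIException
    // come from the EXIficient library; the remaining types are standard JDK classes.
    static Transformer transformer;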
    
    static {
        try {
            transformer = TransformerFactory.newInstance().newTransformer();
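        } catch (TransformerConfigurationException e) {
            // Fail fast instead of leaving 'transformer' null
            throw new ExceptionInInitializerError(e);
        } catch (TransformerFactoryConfigurationError e) {
            throw new ExceptionInInitializerError(e);
        }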
    }
    
    static byte[] compress(String xml) throws IOException, EXIException,
            SAXException {
        ByteArrayOutputStream exiOS = new ByteArrayOutputStream();
        EXIResult exiResult = new EXIResult();
        exiResult.setOutputStream(exiOS);
    
        XMLReader xmlReader = XMLReaderFactory.createXMLReader();
        xmlReader.setContentHandler(exiResult.getHandler());
        xmlReader.parse(new InputSource(new StringReader(xml)));
    
        byte[] compressed = exiOS.toByteArray();
        exiOS.close();
    
        return compressed;
    }
    
    static String extract(byte[] compressed) throws TransformerException,
            IOException, EXIException {
        // SAXSource exiSource = new SAXSource(new InputSource(new
        // ByteArrayInputStream(compressed))); // use EXISource instead!
        SAXSource exiSource = new EXISource();
        exiSource.setInputSource(new InputSource(new ByteArrayInputStream(
                compressed)));
    
        ByteArrayOutputStream exiOS = new ByteArrayOutputStream();
        transformer.transform(exiSource, new StreamResult(exiOS));
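        // The Transformer writes UTF-8 by default, so decode with the same charset
        // rather than the platform default.
        String extracted = exiOS.toString("UTF-8");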
        exiOS.close();
        return extracted;
    }
    
    public static void main(String[] args) throws IOException, EXIException,
            SAXException, TransformerException {
        String xml = "<Root><Child id=\"1\">Text</Child><EmptyTag/></Root>";
        byte[] compressed = ExiCompressionUtils.compress(xml);
        System.out.println(ExiCompressionUtils.extract(compressed));
    }
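
Two JDK-side details can be checked without EXIficient on the classpath, and the sketch below does so. First, a plain SAXSource passes its bytes to the default XML parser, so anything that is not textual XML is rejected. As far as I understand the EXI format, a stream without the optional cookie starts with the distinguishing bits 10, so its first byte cannot start a UTF-8 sequence, which would match the error in the question. Second, the identity Transformer writes an XML declaration by default, so the `== xml` assertion in the question's test would probably still fail after switching to EXISource unless the declaration is omitted. The class name, the method names and the four stand-in bytes are made up for illustration; they are not a real EXI stream and not part of any library.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;

public class JdkSideChecks {

    // Identity transform into a byte buffer, decoded explicitly as UTF-8.
    private static String transform(Source source, boolean omitDeclaration)
            throws TransformerException {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION,
                omitDeclaration ? "yes" : "no");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        t.transform(source, new StreamResult(out));
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    // Serializes textual XML, with or without the leading XML declaration.
    public static String serialize(String xml, boolean omitDeclaration) {
        try {
            return transform(
                    new SAXSource(new InputSource(new StringReader(xml))),
                    omitDeclaration);
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    // True if a plain SAXSource (default XML parser) cannot read the bytes.
    public static boolean plainSaxSourceRejects(byte[] bytes) {
        try {
            transform(new SAXSource(
                    new InputSource(new ByteArrayInputStream(bytes))), true);
            return false;
        } catch (TransformerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String xml = "<Root><Child id=\"1\">Text</Child><EmptyTag/></Root>";
        // Stand-in for binary input; NOT a real EXI stream.
        byte[] notXml = {(byte) 0x80, 0x40, 0x01, 0x02};
        System.out.println(plainSaxSourceRejects(notXml));
        System.out.println(serialize(xml, false));
        System.out.println(serialize(xml, true));
    }
}
```

If the round trip has to compare equal to the input string, setting OMIT_XML_DECLARATION on the transformer used in extract() is the smallest change. Comparing parsed documents instead of strings would be more robust, since a serializer is free to write an empty element or attribute quotes differently from the input.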