Tags: java, xml, xslt, sax

Transform XML to CSV


I developed an application that transforms XML to CSV via XSL. I used the DOM API, but it performs badly (the input is 100,000 XML documents, each 200 KB to 20 MB). I tried the SAX API instead, but then the transformation produces the wrong output. DOM API:

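import java.io.File;
import java.io.StringReader;
import java.io.StringWriter;
import javax.annotation.PostConstruct;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.slf4j.Logger;          // assumption: LOGGER is an SLF4J logger
import org.slf4j.LoggerFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XmlToCsvService {    // class name is hypothetical
    private static final Logger LOGGER = LoggerFactory.getLogger(XmlToCsvService.class);
    private DocumentBuilder builder;
    private Transformer transformer;

    @PostConstruct
    public void init() throws ParserConfigurationException, TransformerConfigurationException {
        // DocumentBuilderFactory is not namespace-aware unless configured otherwise
        builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        transformer = TransformerFactory.newInstance().newTransformer(new StreamSource(new File("1.xsl")));
    }

    public String transformXmlToCsv(String inputXml) {
        String csv = null;
        try {
            // Parse the whole input into an in-memory DOM tree, then transform that tree
            Document document = builder.parse(new InputSource(new StringReader(inputXml)));
            StringWriter writer = new StringWriter();
            transformer.transform(new DOMSource(document), new StreamResult(writer));
            csv = writer.toString();
        } catch (Exception e) {
            LOGGER.error("Exception during transforming", e);
        }
        return csv;
    }
}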

SAX API:

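import java.io.*;
import javax.xml.transform.*;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.*;
import org.xml.sax.InputSource;

public class SaxTransform {    // class name is hypothetical
    public static void main(String[] args) throws Exception {
        TransformerFactory transFact = TransformerFactory.newInstance();
        File xml = new File("019dc124-5057-43f3-aa5d-1d840536b1b5-1558467374000.xml");
        File styleSheet = new File("1.xsl");
        Result outputTarget = new StreamResult(new File("C:\\proj\\xmlparser\\result.csv"));
        Transformer trans = transFact.newTransformer(new StreamSource(styleSheet));
        // A byte stream (not a FileReader) lets the parser honour the XML encoding declaration
        try (InputStream in = new FileInputStream(xml)) {
            Source xmlSource = new SAXSource(new InputSource(in));
            trans.transform(xmlSource, outputTarget);
        }
    }
}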
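Regarding the DOM performance, here is a minimal sketch (not from the original post) that skips the DocumentBuilder entirely: the stylesheet is compiled once into a javax.xml.transform.Templates object, which is documented as thread-safe (a single shared Transformer is not), and each input is handed to the processor as a StreamSource. The class name, sample stylesheet and sample input are hypothetical. The processor still builds its own internal tree, so whether this is fast enough for 20 MB inputs is untested here. Note also that a StreamSource is parsed namespace-aware, just like the SAXSource above, so the namespace issue described in the solution applies to this path as well.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class StreamingCsvTransformer {
    // Hypothetical sample stylesheet (XSLT 1.0): comma-separated root/rootnode/name values
    static final String SAMPLE_XSLT =
            "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
            + "<xsl:output method=\"text\"/>"
            + "<xsl:template match=\"/\">"
            + "<xsl:for-each select=\"root/rootnode/name\">"
            + "<xsl:if test=\"position() &gt; 1\">,</xsl:if>"
            + "<xsl:value-of select=\".\"/>"
            + "</xsl:for-each></xsl:template></xsl:stylesheet>";

    // Hypothetical sample input in no namespace
    static final String SAMPLE_XML =
            "<root><rootnode><name>a</name></rootnode><rootnode><name>b</name></rootnode></root>";

    // Compiled once; Templates is thread-safe, while each Transformer must stay on one thread
    private final Templates templates;

    public StreamingCsvTransformer(String xslt) throws TransformerException {
        templates = TransformerFactory.newInstance()
                .newTemplates(new StreamSource(new StringReader(xslt)));
    }

    public String transform(String inputXml) throws TransformerException {
        StringWriter writer = new StringWriter();
        // No DocumentBuilder: the processor parses the StreamSource itself (namespace-aware)
        templates.newTransformer().transform(
                new StreamSource(new StringReader(inputXml)), new StreamResult(writer));
        return writer.toString();
    }

    public static void main(String[] args) throws TransformerException {
        StreamingCsvTransformer t = new StreamingCsvTransformer(SAMPLE_XSLT);
        System.out.println(t.transform(SAMPLE_XML));
    }
}
```

Each call creates a fresh Transformer from the shared Templates, which is cheap compared with recompiling the stylesheet and avoids sharing one Transformer across threads.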

Solution

  • I think you have run into a variant of the most frequently asked question about XSLT: how to select elements in a namespace. The default Java DocumentBuilder is not namespace-aware, so in your DOM approach the XSLT code probably sees elements in the default namespace as being in no namespace, which is why paths like root/rootnode/name work. With SAX, on the other hand, the XSLT processor parses the input namespace-aware and sees the elements in the default namespace you say your input uses, so your paths stop working: they select elements in no namespace.

    There are two ways to fix this. If you can switch to XSLT 2.0 or 3.0, put Saxon 9 HE on the classpath (the latest version at the time of writing was 9.9) and then use, for example, xpath-default-namespace="http://example.com/ns" as an attribute on the xsl:stylesheet or xsl:transform root element; your unprefixed paths will then select elements in that namespace.

    If you are stuck with XSLT 1.0, the only fix is to declare a prefix (e.g. pf1) for that namespace (e.g. http://example.com/ns) in the stylesheet, e.g. xmlns:pf1="http://example.com/ns", and then change all XPath expressions and match patterns to use the prefix, so root/rootnode/name becomes pf1:root/pf1:rootnode/pf1:name.
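A small self-contained sketch of the points above, using only the JDK's built-in XSLT 1.0 processor. The sample document, the class name and the http://example.com/ns URI are hypothetical. It shows that a DocumentBuilderFactory is not namespace-aware by default (the root element then reports no namespace), that an unprefixed path selects nothing once the processor parses the input namespace-aware, and that the pf1-prefixed path selects the elements.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;

public class DefaultNamespaceDemo {
    // Hypothetical input: the root/rootnode/name shape, in the default namespace http://example.com/ns
    static final String XML = "<root xmlns=\"http://example.com/ns\">"
            + "<rootnode><name>a</name></rootnode><rootnode><name>b</name></rootnode></root>";

    // Namespace URI a DOM parser reports for the root element (null means "no namespace")
    static String rootNamespace(boolean namespaceAware) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(namespaceAware);
        return factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)))
                .getDocumentElement().getNamespaceURI();
    }

    // XSLT 1.0 stylesheet declaring pf1 for the namespace; emits the selected nodes comma-separated
    static String stylesheet(String path) {
        return "<xsl:stylesheet version=\"1.0\""
                + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\""
                + " xmlns:pf1=\"http://example.com/ns\">"
                + "<xsl:output method=\"text\"/>"
                + "<xsl:template match=\"/\">"
                + "<xsl:for-each select=\"" + path + "\">"
                + "<xsl:if test=\"position() &gt; 1\">,</xsl:if>"
                + "<xsl:value-of select=\".\"/>"
                + "</xsl:for-each></xsl:template></xsl:stylesheet>";
    }

    // Transforms the input via a StreamSource, which the JDK's processor parses namespace-aware
    static String run(String path) throws Exception {
        StringWriter out = new StringWriter();
        TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(stylesheet(path))))
                .transform(new StreamSource(new StringReader(XML)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("DOM default isNamespaceAware: " + DocumentBuilderFactory.newInstance().isNamespaceAware());
        System.out.println("root namespace, not aware: " + rootNamespace(false));
        System.out.println("root namespace, aware:     " + rootNamespace(true));
        System.out.println("unprefixed path: [" + run("root/rootnode/name") + "]");
        System.out.println("prefixed path:   [" + run("pf1:root/pf1:rootnode/pf1:name") + "]");
    }
}
```

Running main should print false, null, http://example.com/ns, an empty result for the unprefixed path and a,b for the prefixed one. The XSLT 2/3 alternative is not exercised here because the JDK's built-in processor only implements XSLT 1.0; with Saxon HE on the classpath you would keep the unprefixed paths and add xpath-default-namespace="http://example.com/ns" to xsl:stylesheet instead.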