Search code examples
javaxmlpostgresqljdbclibxml2

Working with large XML files in Postgresql


I am using Java JDBC to insert a rather large XML file (about 32MB) into an "xml" type column in Postgres using this particular approach / method:

public void setDataXML(PreparedStatement ps, Connection conn, int index, byte[] bytes) throws SQLException {
    SQLXML sqlxml = conn.createSQLXML();
    OutputStream os = sqlxml.setBinaryStream();
    try {
        os.write(bytes);
    } catch (IOException e) {
        throw new SQLException(e);
    }
    ps.setSQLXML(index, sqlxml);
}

I am currently using the postgresql-9.4.1208.jar JDBC driver for this operation. Previously this was working just fine when running against a Postgres 9.0.4 database. I have since upgraded to a Postgres 9.4 database, and now this operation is failing with a:

    PreparedStatementCallback; SQL []; ERROR: invalid XML content
Detail: line 418061: internal error: Huge input lookup
nested exception is org.postgresql.util.PSQLException: ERROR: invalid XML content

I have tested this on Windows as well as an Amazon Linux instance, and the results are the same. I know it has something to do with the size of the XML file, as severely reducing the sheer size of the XML file allows it to be inserted as desired.

What settings can I use on the driver / database, or how to configure as such to allow me to insert a large XML file with Postgres 9.4?


Solution

  • Apparently this 'error' stems from the underlying xml library used to handle the XML parsing and processing (e.g. libxml/libxml2) and this kind of error can be found for a number of languages using the library - e.g. Python, R, ... (Postgres)

    The code does not show how Java is building the prepared statement, but, there is two ways of parsing XML in Postgres (PostgresDoc):

    XMLPARSE (DOCUMENT '<?xml version="1.0"?><book><title>Manual</title><chpt>...</chpt></book>')
    XMLPARSE (CONTENT 'abc<foo>bar</foo><bar>foo</bar>')
    

    The second one is the default and might expect less data than the second one (also used when doing casting with :: syntax like so '<a>1</a>'::xml ).

    • Option 1: Explicitly construct your statement and explicitly use XMLPARSE (DOCUMENT ...)
    • Option 2: Make the DOCUMENT the default with
      • SET XML OPTION DOCUMENT; or
      • SET xmloption TO DOCUMENT;