HSQLDB: weird "unique constraint or index violation" with data read from CSV


I have a tool which reads a CSV file, selects from it using HSQLDB, and saves the result as another CSV file. More here: http://ondra.zizka.cz/stranky/programovani/java/apps/CsvCruncher-csv-manipulation-sql.texy

Now, when I used it for a task, I got:

java -jar CsvCruncher-1.0.jar result.csv foo.csv 'SELECT * FROM indata'

INFO:   SQL: CREATE TEXT TABLE indata ( xrelease VARCHAR(255), xtype VARCHAR(255), xartifact VARCHAR(255), xversion VARCHAR(255) )
INFO:   SQL: CREATE TEXT TABLE output ( XRELEASE VARCHAR(255), XTYPE VARCHAR(255), XARTIFACT VARCHAR(255), XVERSION VARCHAR(255) )
INFO:   User's SQL: INSERT INTO output (SELECT * FROM indata)
INFO: Database closed
Exception in thread "main" java.sql.SQLException: integrity constraint violation: unique constraint or index violation: SYS_IDX_10027
    at org.hsqldb.jdbc.Util.sqlException(Unknown Source)
    at org.hsqldb.jdbc.JDBCPreparedStatement.fetchResult(Unknown Source)
    at org.hsqldb.jdbc.JDBCPreparedStatement.executeUpdate(Unknown Source)
    at org.jboss.qa.cvscruncher.Cruncher.crunch(Cruncher.java:187)
    at org.jboss.qa.cvscruncher.App.main(App.java:26)
    at Crunch.main(Crunch.java:9)
Caused by: org.hsqldb.HsqlException: integrity constraint violation: unique constraint or index violation: SYS_IDX_10027
    at org.hsqldb.error.Error.error(Unknown Source)
    at org.hsqldb.index.IndexAVL.insert(Unknown Source)
    at org.hsqldb.persist.RowStoreAVLDiskData.indexRow(Unknown Source)
    at org.hsqldb.Table.insertSingleRow(Unknown Source)
    at org.hsqldb.StatementDML.insertRowSet(Unknown Source)
    at org.hsqldb.StatementInsert.getResult(Unknown Source)
    at org.hsqldb.StatementDMQL.execute(Unknown Source)
    at org.hsqldb.Session.executeCompiledStatement(Unknown Source)
    at org.hsqldb.Session.execute(Unknown Source)
    ... 5 more

As the log shows, no indexes are created, at least not explicitly. I looked through HSQLDB's manual for some auto-created constraint, but didn't find any.

When I run only SELECT 1 FROM indata, it's fine. So I guess it's something in the data itself. In case it helps, here it is: http://pastebin.com/8QiY2HXx (the x prefix on the column names is there to prevent keyword clashes).

Update:

When I dump the data read from the CSV, it's a bit weird:

 -------
 XRELEASE: 5.1.0-SNAPSHOT
 XTYPE: DEP
 XARTIFACT: org.apache.maven:maven-ant-tasks
5XVERSION: 2.0.9
 -------
 XRELEASE: 5.1.0-SNAPSHOT
 XTYPE: DEP
 XARTIFACT: org.jboss.seam.integration:jboss-seam-int-microcontainer
5XVERSION: 5.1.0.CR1
 -------
 XRELEASE: 5.1.0-SNAPSHOT
 XTYPE: DEP
 XARTIFACT: org.jboss.seam.integration:jboss-seam-int-jbossas
5XVERSION: 5.1.0.CR1
 -------
...

It looks like the xversion column is modified somehow. The code is simply System.out.println(" "+ metaData.getColumnLabel(i) + ": "+ rs.getObject(i) );
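Output that appears shifted or overwritten like this can be a sign of invisible control characters in the values. A minimal sketch of a helper that makes such characters visible before printing follows; the class name ControlChars and the method escape are hypothetical, not part of the tool or of HSQLDB.

```java
final class ControlChars {

    /** Returns the text with control characters replaced by visible escape sequences. */
    static String escape(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '\r': sb.append("\\r"); break;   // carriage return, 0x0D
                case '\n': sb.append("\\n"); break;   // line feed, 0x0A
                case '\t': sb.append("\\t"); break;   // tab, 0x09
                default:
                    if (c < 0x20 || c == 0x7F) {
                        // Any other control character is shown as a four-digit hex code.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }
}
```

With this in place, the dump line could become System.out.println(" " + metaData.getColumnLabel(i) + ": " + ControlChars.escape(String.valueOf(rs.getObject(i))));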

Any idea what can cause this?

Thanks, Ondra


Solution

  • The problem was in the input file: it used 0x0D (carriage return) characters as newlines, and that somehow broke HSQLDB. I'll report it so they can check. HSQLDB should at least reject such input or, better, normalize the newlines.
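
As a workaround until that is handled inside HSQLDB, the input could be normalized before it is loaded. The following is only a sketch under a few assumptions: the file is UTF-8, it fits in memory, and no quoted CSV field is meant to contain a carriage return. The class name LineEndings and its methods are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class LineEndings {

    /** Converts CRLF pairs and bare CR characters to LF. */
    static String normalize(String text) {
        // Replace CRLF first so a pair does not turn into two line feeds.
        return text.replace("\r\n", "\n").replace('\r', '\n');
    }

    /** Reads a file as UTF-8 (an assumption) and writes a copy with LF-only line endings. */
    static void normalizeFile(Path in, Path out) throws IOException {
        String text = new String(Files.readAllBytes(in), StandardCharsets.UTF_8);
        Files.write(out, normalize(text).getBytes(StandardCharsets.UTF_8));
    }
}
```

The normalized copy, rather than the original file, would then be passed to the tool as the input CSV.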