
Why do I get 'ORA-12899: value too large for column' error when copying data between columns of the same type/width?


I have a small Java program that copies data from one database to another. Both databases have the same columns, and the destination was created with SQL Developer's "copy database" option so that all the objects match. Both databases are Oracle 12.2.0 Enterprise Edition, but the destination runs in a Docker image and will be used for development. I'm getting a strange error when copying just one table:

ORA-12899: value too large for column "MY_SCHEMA"."MY_TABLE"."MY_COLUMN" (actual: 101, maximum: 100)

Both databases have the same width for this column, so I'm not sure why it's complaining. Both are VARCHAR2(100). Maybe it's something wrong with my Java code? Essentially it does this:

  List<Column> cols = getCols("MY_SCHEMA", "MY_TABLE", sourceConn.getMetaData());
  String sourceTableQuery = ...;            // select * from my_table
  String destinationTableInsertQuery = ...; // insert into my_table(...) values(...)

  // Read every row from the source table
  PreparedStatement queryStmt = sourceConn.prepareStatement(sourceTableQuery);
  ResultSet data = queryStmt.executeQuery();
  PreparedStatement insertStmt = destinationConn.prepareStatement(destinationTableInsertQuery);

  // Copy each row into a batched insert against the destination
  while (data.next()) {
      for (int i = 0; i < cols.size(); i++) {
          insertStmt.setObject(i + 1, data.getObject(cols.get(i).getName()));
      }
      insertStmt.addBatch();
  }
  int[] results = insertStmt.executeBatch();

Edit: the exact data type is VARCHAR2(100 BYTE) on both columns. Sorry if that caused any confusion.


Solution

  • By default, if you declare a column as varchar2(100), you are allocating 100 bytes of storage. Depending on the data and the database character set, 1 character may take 1 byte of storage, 2 bytes, 3 bytes, or 4 bytes. So that varchar2(100) column can store somewhere between 25 and 100 characters depending on the data and database character set.
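
    For example, in a variable-width character set you can see the difference directly (a quick illustration; the accented character and the AL32UTF8 assumption are mine, not from the question):

    -- In an AL32UTF8 database, 'é' is 1 character but occupies 2 bytes
    select length('é')  as char_count,
           lengthb('é') as byte_count
      from dual;
    -- CHAR_COUNT = 1, BYTE_COUNT = 2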

    I'd wager that the database character set (nls_characterset in v$nls_parameters) is different in the two databases. My guess is that the destination database uses a UTF-8 character set, which is variable-width, while the source database uses a fixed-width character set. In the row that fails, at least one character requires 2 bytes of storage in the UTF-8 character set but only 1 byte in the source database's character set. (The trade-off is that the UTF-8 character set supports a much wider range of characters than the source database's character set does.)
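
    You can check this by running the same query against both databases (using the view mentioned above):

    select value
      from v$nls_parameters
     where parameter = 'NLS_CHARACTERSET';

    If, say, the source reports WE8ISO8859P1 (single-byte) and the destination reports AL32UTF8 (variable-width, 1-4 bytes per character), that would produce exactly this error.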

    You could declare the column as varchar2(100 CHAR) in both databases or at least in the destination database (using character length semantics rather than byte length semantics). That would allocate space for 100 characters regardless of the number of bytes that data requires (note that limits on the length of varchar2 columns are still expressed in bytes so this doesn't work with, say, varchar2(4000) columns unless you've enabled extended string sizes).
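
    For an already-created table, the same change can be made in place; a sketch using the identifiers from the error message:

    alter table my_schema.my_table
      modify (my_column varchar2(100 char));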

    If you have all the DDL in one place and that DDL doesn't specify character or byte semantics, you could do an

    alter session set nls_length_semantics = CHAR
    

    before running the DDL to create your objects in the destination database, if you don't want to go through and edit it.
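
    A sketch of that workflow, again using the identifiers from the error message:

    alter session set nls_length_semantics = CHAR;

    -- Plain VARCHAR2(n) in any DDL run in this session is now interpreted
    -- as n characters rather than n bytes.
    create table my_schema.my_table (
        my_column varchar2(100)  -- equivalent to varchar2(100 char)
    );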