Search code examples
kettlepentaho-spoon

Spoon Kettle doesn't manage NULL values correctly


I'm using Spoon Kettle PDI to insert data from a csv file to a MariaDb database.

I'm doing something very simple but apparently, when in the csv there is a NULL value, PDI interpret it as a String and this create problems in the final query:

018/04/25 14:31:23 - Workstation.0 - ERROR (version 7.1.0.0-12, build 1 from 2017-05-16 17.18.02 by buildguy) : Because of an error, this step can't continue:
2018/04/25 14:31:23 - Workstation.0 - ERROR (version 7.1.0.0-12, build 1 from 2017-05-16 17.18.02 by buildguy) : org.pentaho.di.core.exception.KettleValueException: 
2018/04/25 14:31:23 - Workstation.0 - Unexpected conversion error while converting value [checkPoint_id String] to an Integer
2018/04/25 14:31:23 - Workstation.0 - 
2018/04/25 14:31:23 - Workstation.0 - checkPoint_id String : couldn't convert String to Integer
2018/04/25 14:31:23 - Workstation.0 - 
2018/04/25 14:31:23 - Workstation.0 - checkPoint_id String : couldn't convert String to number : non-numeric character found at position 1 for value [NULL]
2018/04/25 14:31:23 - Workstation.0 - 
2018/04/25 14:31:23 - Workstation.0 - 
2018/04/25 14:31:23 - Workstation.0 - 
2018/04/25 14:31:23 - Workstation.0 -   at org.pentaho.di.core.row.value.ValueMetaBase.getInteger(ValueMetaBase.java:2081)
2018/04/25 14:31:23 - Workstation.0 -   at org.pentaho.di.core.row.value.ValueMetaBase.convertData(ValueMetaBase.java:3785)
2018/04/25 14:31:23 - Workstation.0 -   at org.pentaho.di.core.row.value.ValueMetaBase.convertBinaryStringToNativeType(ValueMetaBase.java:1579)
2018/04/25 14:31:23 - Workstation.0 -   at org.pentaho.di.core.row.value.ValueMetaBase.getString(ValueMetaBase.java:1799)
2018/04/25 14:31:23 - Workstation.0 -   at org.pentaho.di.core.row.RowMeta.getString(RowMeta.java:319)
2018/04/25 14:31:23 - Workstation.0 -   at org.pentaho.di.core.row.RowMeta.getString(RowMeta.java:828)
2018/04/25 14:31:23 - Workstation.0 -   at org.pentaho.di.trans.steps.tableoutput.TableOutput.writeToTable(TableOutput.java:385)
2018/04/25 14:31:23 - Workstation.0 -   at org.pentaho.di.trans.steps.tableoutput.TableOutput.processRow(TableOutput.java:125)
2018/04/25 14:31:23 - Workstation.0 -   at org.pentaho.di.trans.step.RunThread.run(RunThread.java:62)
2018/04/25 14:31:23 - Workstation.0 -   at java.lang.Thread.run(Thread.java:748)
2018/04/25 14:31:23 - Workstation.0 - Caused by: org.pentaho.di.core.exception.KettleValueException: 
2018/04/25 14:31:23 - Workstation.0 - checkPoint_id String : couldn't convert String to Integer
2018/04/25 14:31:23 - Workstation.0 - 
2018/04/25 14:31:23 - Workstation.0 - checkPoint_id String : couldn't convert String to number : non-numeric character found at position 1 for value [NULL]
2018/04/25 14:31:23 - Workstation.0 - 
2018/04/25 14:31:23 - Workstation.0 - 
2018/04/25 14:31:23 - Workstation.0 -   at org.pentaho.di.core.row.value.ValueMetaBase.convertStringToInteger(ValueMetaBase.java:1323)
2018/04/25 14:31:23 - Workstation.0 -   at org.pentaho.di.core.row.value.ValueMetaBase.getInteger(ValueMetaBase.java:2019)
2018/04/25 14:31:23 - Workstation.0 -   ... 9 more
2018/04/25 14:31:23 - Workstation.0 - Caused by: org.pentaho.di.core.exception.KettleValueException: 
2018/04/25 14:31:23 - Workstation.0 - checkPoint_id String : couldn't convert String to number : non-numeric character found at position 1 for value [NULL]
2018/04/25 14:31:23 - Workstation.0 - 
2018/04/25 14:31:23 - Workstation.0 -   at org.pentaho.di.core.row.value.ValueMetaBase.convertStringToInteger(ValueMetaBase.java:1317)
2018/04/25 14:31:23 - Workstation.0 -   ... 10 more

In the image you can see the import from csv. I have to specify the type of each column. The exception is related to the column checkPoint_id that is a number but can be null.

enter image description here

Is there a way to overcome this problem? It seems a quite basic operation and but I don't see any option I could turn on to fix this behaviour.


Solution

  • Uncheck the Lasy conversion.

    If the problem persist, there is no standard for nulls in CSV, and it may well be that in your case, it uses "null" (a String).

    If it's for one go, open the CSV file in an editor and make a global search&replace "null" with "".

    If you have to automate or have a lot of CSV files, read all the field as String; then use the Null if... step to convert "null" to NULL; then change the data type with a Select value step on the Metadata tab.