I need to use the Kettle/PDI Community Edition to read big fixed-length data files and do some ETL work on them. During development I ran into the following issue:
Kettle plugin "Fixed File Input" allows multiple data types with remark they are actually Strings or byte arrays.
My input contained both: Strings, and byte arrays holding the little-endian representations of long, int, and short values (Intel-style byte order). An example of a record structure to be read: Column1(char:8), Column2(long:8 hex), Column3(char:2), Column4(int:4 hex).
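(For example, the 8-byte long 0x0102030405060708 is stored in little-endian order as the byte sequence 08 07 06 05 04 03 02 01, i.e. the least significant byte comes first.)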
I tried to use "Select Values" plugin and change Binary type of column to Integer but such method is not implemented. Finaly I ended with following solution:
As you can see I used a formula to obtain long value.
public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException
{
    Object[] r = getRow();
    if (r == null) {
        setOutputDone();
        return false;
    }

    // It is always safest to call createOutputRow() to ensure that your output row's
    // Object[] is large enough to handle any new fields you are creating in this step.
    r = createOutputRow(r, data.outputRowMeta.size());

    byte[] buf;
    long longValue;

    // BAN - 8 bytes, little-endian
    buf = get(Fields.In, "BAN").getBinary(r);
    longValue = ((buf[0] & 0xFFL) << 0)  | ((buf[1] & 0xFFL) << 8)
              | ((buf[2] & 0xFFL) << 16) | ((buf[3] & 0xFFL) << 24)
              | ((buf[4] & 0xFFL) << 32) | ((buf[5] & 0xFFL) << 40)
              | ((buf[6] & 0xFFL) << 48) | ((buf[7] & 0xFFL) << 56);
    get(Fields.Out, "BAN_L").setValue(r, longValue);

    // DEPOSIT_PAID_AMT - 4 bytes, little-endian
    buf = get(Fields.In, "DEPOSIT_PAID_AMT").getBinary(r);
    longValue = ((buf[0] & 0xFFL) << 0)  | ((buf[1] & 0xFFL) << 8)
              | ((buf[2] & 0xFFL) << 16) | ((buf[3] & 0xFFL) << 24);
    get(Fields.Out, "DEPOSIT_PAID_AMT_L").setValue(r, longValue);

    // BILL_SEQ_NO - 2 bytes, little-endian
    buf = get(Fields.In, "BILL_SEQ_NO").getBinary(r);
    longValue = ((buf[0] & 0xFFL) << 0) | ((buf[1] & 0xFFL) << 8);
    get(Fields.Out, "BILL_SEQ_NO_L").setValue(r, longValue);

    // Send the row on to the next step.
    putRow(data.outputRowMeta, r);
    return true;
}
The problem arises when a single data extract contains 8-20 binary fields. Is there any alternative to this approach, so that I could call something like:
getNumberFromLE(byte[] buff, buff.length);
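A minimal sketch of what I have in mind, generalizing the bit-shifting formula above to any unsigned little-endian field of up to 8 bytes (the method name and its placement in the UDJC class body are just my idea, not an existing Kettle API):

private long getNumberFromLE(byte[] buf, int length)
{
    // Assemble an unsigned little-endian value: byte i contributes bits 8*i .. 8*i+7.
    long value = 0L;
    for (int i = 0; i < length; i++) {
        value |= (buf[i] & 0xFFL) << (8 * i);
    }
    return value;
}

(java.nio.ByteBuffer with ByteOrder.LITTLE_ENDIAN would also work, but its getLong/getInt/getShort calls are fixed-width, while the loop handles any width up to 8 bytes.)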
Is there any other plugin in development that could be used to transform byte[] into the Pentaho Kettle "Number" data type? (BigNumber and Integer would also be fine.)
I found the following possibilities:
1) It is possible to add additional types to the ValueMetaInterface class (org.pentaho.di.core.row.ValueMetaInterface) and add the conversion functions into org.pentaho.di.core.row.ValueMeta.
2) Add a getNumberFromLE implementation to the "Common use" code snippets of the "User Defined Java Class" step (see the sketch after this list).
3) Add new data types as a plugin, as described in the two links below: Jira pluggable types, GitHub pdi-valuemeta-map AddingDataTypes.
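For option 2, a sketch of how processRow could then shrink to a data-driven loop over the binary fields (the field list reuses the names from my example, the "_L" output naming is just my convention, and I am assuming UDJC accepts the member declarations):

private final String[] leFields = { "BAN", "DEPOSIT_PAID_AMT", "BILL_SEQ_NO" };

public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException
{
    Object[] r = getRow();
    if (r == null) {
        setOutputDone();
        return false;
    }
    r = createOutputRow(r, data.outputRowMeta.size());

    // Convert every little-endian binary field with one helper call;
    // the field width is simply the length of the binary value.
    for (int i = 0; i < leFields.length; i++) {
        byte[] buf = get(Fields.In, leFields[i]).getBinary(r);
        get(Fields.Out, leFields[i] + "_L").setValue(r, getNumberFromLE(buf, buf.length));
    }

    putRow(data.outputRowMeta, r);
    return true;
}

This way adding a new binary field costs one entry in leFields instead of four lines of shifting code.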