Search code examples
transformationetlpentahokettle

How to split values of an column in Pentaho Spoon?


I want to create a Spoon transformation which will work on multiple values of an column. Input to my transformation is an CSV file. In that CSV file there's one column named 'Technology' which contains 0 or more values seperated by semi colon as follows.

+------------------------------------------------------+

 row_id |   Technology
+------------------------------------------------------+

1       | Cobol ; Db2 ; Jcl ; Vsam ; Cics ; Changeman ;

2       | Oracle ; Sql ; Db2 ; Oracle 9i ;

3       | Windows 2000 ; SQL ;
+------------------------------------------------------+

I have one table in database named 'Technologies' and its schema is as follows :

+----------------------+

Technologies

+----------------------+
 id   | technology_name

+----------------------+

where id column is set to auto increment.

I want to insert values of technology column only if that value is not present in Technologies table.

Can anyone please tell me

1) Which type of step to be used to split values of technology column? 2) How to insert value only once? For example in row 1 and row 2, Db2 is repeated but I want to insert Db2 only once.

Thanks in advance !


Solution

  • Use the "Split Fields" (Under "Transform") to split the contents.

    CSV file input --> Split Fields --> rest of transformation

    Set the "Field to Split" to "Technology" and set the "Delimiter" to a semi-colon.

    Regarding the non-repeating field, my suggestion would be you make the name itself the key to the table. Shift it to lower-case and replace any spaces / special characters with database safe equiv's and then make that the primary key. You should end up with a table full of only unique instances.

    hth