Pentaho Data Integration transformation, loading fields from csv file (Java API)


I am trying to create a simple transformation using the Kettle Java API.

It has just two steps: one to read from a CSV file and the other to write the rows to a text file.

Code:

        PluginRegistry.addPluginType(SpoonPluginType.getInstance());
        PluginRegistry.addPluginType(StepPluginType.getInstance());
        PluginRegistry.init();

        TransMeta transMeta = new TransMeta();
        transMeta.setName("testTrans");


        String csvStep = "read from file ";
        CsvInputMeta csvInputMeta = new CsvInputMeta();
        csvInputMeta.setDefault();
        csvInputMeta.setFilename(INPUT_FILE);
        csvInputMeta.setDelimiter(";");


        String csvId = PluginRegistry.getInstance().getPluginId(csvInputMeta);
        StepMeta stepMeta = new StepMeta(csvId, csvStep, csvInputMeta);
        transMeta.addStep(stepMeta);


        TextFileOutputMeta textFileOutputMeta = new TextFileOutputMeta();
        textFileOutputMeta.setDefault();
        textFileOutputMeta.setFilename(OUTPUT_FILE);
        textFileOutputMeta.setFileFormat("txt");

        String outPutStep = "Output step";
        String outputId = PluginRegistry.getInstance().getPluginId(textFileOutputMeta);
        StepMeta stepMeta2 = new StepMeta(outputId, outPutStep, textFileOutputMeta);
        transMeta.addStep(stepMeta2);

        transMeta.addTransHop(new TransHopMeta(stepMeta, stepMeta2));
        transMeta.setName("testTrans");

        String xml = transMeta.getXML();
        DataOutputStream dos = new DataOutputStream(new FileOutputStream(new File("trans.xml")));
        dos.write(xml.getBytes("UTF-8"));
        dos.close();

        Trans trans = new Trans(transMeta);
        trans.execute(null);
        trans.waitUntilFinished();

When I run the code above, the output is:

INFO  18-09 17:32:08,700 - read from file  - Line number : 50000
INFO  18-09 17:32:08,703 - Output step - linenr 50000
INFO  18-09 17:32:09,147 - read from file  - Line number : 100000
INFO  18-09 17:32:09,149 - Output step - linenr 100000
INFO  18-09 17:32:09,491 - read from file  - Line number : 150000
INFO  18-09 17:32:09,492 - Output step - linenr 150000
INFO  18-09 17:32:09,786 - read from file  - Line number : 200000
INFO  18-09 17:32:09,788 - Output step - linenr 200000

and so on. But my CSV file actually contains only 4 lines (a header plus 3 data rows), which look like this:

id;val
1;10
2;15
3;20

The problem is that the transformation "doesn't know" what the fields are. When I exported the transformation to an XML file, loaded it into Pentaho Spoon, and pressed the "Get fields" button, everything worked correctly (only 3 rows were read).

I know I can create these fields manually and set them on csvInputMeta, but is there a way to do this automatically, the way the "Get fields" button in Spoon does?


Solution

  • If anyone is curious, I found a solution.

    You have to use your own CSV reader.

    You can get some help from the class CsvInputDialog (it is a GUI class). It has methods such as getCsv and getInfo. They are private, so you can't call them directly, but you can use them as a model for your own method. Then, as @Dirk said, use the setInputFields method.

    Alternatively, you can use an existing CSV parser.
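
As a rough illustration of the "own CSV reader" idea, here is a minimal sketch that only extracts the field names from the header line. The class CsvHeaderSketch and its methods parseHeader and readHeader are hypothetical names, not part of Kettle, and the sketch assumes the header contains no quoted names with the delimiter inside them. It deliberately avoids Kettle classes so it runs on its own; turning the names into the field objects that setInputFields expects is left as a comment, because the exact field class and package depend on the Kettle version.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper: derives field names from the header line of a CSV file. */
final class CsvHeaderSketch {

    /**
     * Splits a header line on a literal delimiter and trims each name.
     * Assumption: no quoted names that contain the delimiter.
     */
    static List<String> parseHeader(String headerLine, String delimiter) {
        List<String> names = new ArrayList<>();
        // Pattern.quote keeps delimiters such as "|" from being read as regex syntax.
        for (String raw : headerLine.split(Pattern.quote(delimiter), -1)) {
            names.add(raw.trim());
        }
        return names;
    }

    /** Reads only the first line of the file and parses it as the header. */
    static List<String> readHeader(Path csvFile, String delimiter) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(csvFile, StandardCharsets.UTF_8)) {
            String first = reader.readLine();
            // An empty file has no header, so report no fields.
            return first == null ? Collections.<String>emptyList() : parseHeader(first, delimiter);
        }
    }

    public static void main(String[] args) {
        // Header line taken from the sample file in the question.
        List<String> names = parseHeader("id;val", ";");
        System.out.println(names); // prints [id, val]
        // With Kettle on the classpath, each name would then be wrapped in an
        // input field object (assumption: one per column, typed as String) and
        // the resulting array passed to csvInputMeta.setInputFields(...).
    }
}
```

This only recovers the names. Guessing types and lengths, which is what "Get fields" in Spoon also does, would need a sampling pass over the data rows.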