pentaho, business-intelligence, pentaho-cde, pentaho-spoon, pentaho-data-integration

How to import the entire contents of a text file into one field using Pentaho Kettle?


I want to parse the data in an unstructured text file. Before that, I want to store the entire contents of the file in a single field, so that I can parse the data by reading it back from that field.

I am planning to use the Modified Java Script Value step for the parsing.

Note:
The file I am talking about is not a normal text or CSV file. It is received directly from a Tandem server.
Example (contents of the text file):
'|08-Jul-16|1| 5996|W2266001|BODHAN ROADNIZAMABAD|FNFA|5211080013438979|*****************|0220|01|7|07-Jul-16|08-Jul-16|23:14:23|1043|000|00|541100|30000|0000|PRO1|FNFA|00000403362|356|356|0|NIZ-220|NIZAMABAD|TS|IN||08-Jul-16|1|'


Solution

  • You can use the "Load file content in memory" step for that.

    It does exactly what you want: it reads each file into a single field, one row per file, without parsing. Instead of specifying a delimiter, you define the output field manually. Choose "file content" as the element and "string" as the type.

    I have seen your other question. If your file is 1.7 GB in size, it will probably not load into a single row without memory issues.
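
For a file that large, one alternative outside Kettle is to stream it in chunks and emit one record at a time, so the whole content never sits in memory at once. The following is only a minimal Python sketch of that idea, not something the "Load file content in memory" step does. The function name `iter_records` and the chunk size are hypothetical, and the `|DR|` record separator is assumed from the test described further down.

```python
import io
from typing import Iterator, TextIO


def iter_records(stream: TextIO, delimiter: str = "|DR|",
                 chunk_size: int = 1 << 20) -> Iterator[str]:
    """Yield delimiter-separated records without reading the whole stream."""
    buffer = ""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buffer += chunk
        parts = buffer.split(delimiter)
        # The last piece may be an incomplete record (or half a delimiter),
        # so keep it in the buffer until more data arrives.
        buffer = parts.pop()
        yield from parts
    if buffer:
        yield buffer  # trailing record after the last delimiter


# Small in-memory stand-in for a large file (made-up data).
sample = io.StringIO("rec1|a|DR|rec2|b|DR|rec3|c")
records = list(iter_records(sample, chunk_size=4))
```

With a real file, `stream` would be the object returned by `open(path, encoding=...)`; the correct encoding for a Tandem export is not stated in the question and would have to be checked.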

    I tested using:

    [Image: transformation steps to split unstructured file]

    1. Load file content in memory, as described above
    2. Split field to rows, using "\|DR\|" as a regex delimiter
    3. Select values, to get rid of the original huge field
    4. Split fields, using "|" as the delimiter, into a list of string fields (as many as your longest record can have).
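
To make the data flow of the four steps above concrete, here is a rough Python equivalent. It is only an illustration of what each step does to the data, not Kettle code. The function names are hypothetical, the UTF-8 encoding is an assumption, and the sample content is made up.

```python
import re
from pathlib import Path
from typing import List


def load_file_content(path: str, encoding: str = "utf-8") -> str:
    """Step 1: read the whole file into a single string (one 'field')."""
    return Path(path).read_text(encoding=encoding)


def split_field_to_rows(content: str, pattern: str = r"\|DR\|") -> List[str]:
    """Step 2: split the single field into rows on the regex delimiter."""
    return [row for row in re.split(pattern, content) if row.strip()]


def split_fields(row: str, delimiter: str = "|") -> List[str]:
    """Step 4: split one row into its string fields."""
    return row.split(delimiter)


def parse_content(content: str) -> List[List[str]]:
    rows = split_field_to_rows(content)
    # Step 3 (Select values) corresponds to no longer carrying `content`
    # along: only the per-record rows are passed on from here.
    return [split_fields(row) for row in rows]


# Made-up content with three records separated by |DR|.
parsed = parse_content("A|1|x|DR|B|2|y|DR|C|3")
```

For a file on disk, `parse_content(load_file_content(path))` chains all four steps. Note that the last record in the sample has fewer fields than the others, which is the kind of irregularity that needs handling afterwards.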

    This gives somewhat usable records, but you may have to do more processing to identify missing columns in some records and straighten them out.
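
As a sketch of that follow-up processing, the Python snippet below flags records with fewer fields than expected and pads them to a fixed width. It assumes the missing columns are always at the end of a record, which may not hold for your data; deciding which column is actually missing in the middle of a record requires knowledge of the file layout. The name `straighten` and the expected field count are hypothetical.

```python
from typing import List, Tuple


def straighten(records: List[List[str]], expected_count: int,
               fill: str = "") -> Tuple[List[List[str]], List[int]]:
    """Pad short records to expected_count and report which rows were short."""
    fixed: List[List[str]] = []
    short_rows: List[int] = []
    for index, fields in enumerate(records):
        missing = expected_count - len(fields)
        if missing > 0:
            short_rows.append(index)
            # Assumption: missing columns are trailing, so pad on the right.
            fields = fields + [fill] * missing
        fixed.append(fields)
    return fixed, short_rows


fixed, short_rows = straighten([["a", "b", "c"], ["d"]], expected_count=3)
```

The returned `short_rows` indexes let you review the affected records by hand rather than trusting the padding blindly. Records longer than `expected_count` are left unchanged.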