Search code examples
jsonfieldextracttalend

Talend extract JSON with a strange format


I have a problem with Talend: I have to extract a very strange JSON format, it looks like:

{"results":[{"id":0,"series":[{"name":"table1","columns":["column1","column2","column3","column4"],"values":[["Value1","Value2","Value3","Value4"],["Value1","Value2","Value3","Value4"],["Value1","Value2","Value3","Value4"],["Value1","Value2","Value3","Value4"],["Value1","Value2","Value3","Value4"]]}]}]}

in practice we have that inside the "series" object we have the "columns" object with the name of the various columns and the "values" object with the values of the various rows. The desired output would be a table/csv/json that has a more normal format, so field and value. Does anyone know how I could do this? So far I have tried extracting the various JSON fields but the output is as follows:

Columns
Column1
Column2
Column3
Column4
values
["Value1","Value2","Value3","Value4"]
["Value1","Value2","Value3","Value4"]
["Value1","Value2","Value3","Value4"]
["Value1","Value2","Value3","Value4"]

(For this one probably I have to extract another JSON field, I thought).

Thanks to all

PS. I added Talend in the post


Solution

  • Here's a solution to get the results as a csv file.
    enter image description here

    I used tFixedFlowInput_1 and tFixedFlowInput_3 as an input with the json from your example.
    tExtractJSONFields_1 extracts the individual columns from the columns array, then it's denormalized into a file.
    enter image description here

    tExtractJSONFields_2 extracts the values as arrays, then for each one, we extract the individual values using tExtractJSONFields_3, and we denormalize each set of values to get a csv row in tFileOutputDelimited_3 (writing to the previous file in append mode).
    enter image description here enter image description here

    The final result looks like this:

    column1,column2,column3,column4
    Value1,Value2,Value3,Value4
    Value1,Value2,Value3,Value4
    Value1,Value2,Value3,Value4
    Value1,Value2,Value3,Value4
    Value1,Value2,Value3,Value4
    

    I used the comma as a separator, can be changed in tDenormalize_1 and tDenormalize_2