Search code examples
javajsontalend

talend: Merge multiple (complex) flat files into single JSON file


I'm testing talend for its potential use in a project - basic tasks are completed easily, however I'm struggling with the following sitution:

We have multiple flat files, all of which combine to describe various items. For my testing, I would simply like to merge two of these files (for now) into a JSON format. The catch here is that one of the files contains 1 or more rows per item;

For example:

File 1: id, category

1, A   
2, A  
3, B

File 2: id, language, colour

1, en_GB, Red  
1, de_DE, Rot  
2, en_GB, Blue  
3, en_GB, Green  
3, de_DE, Grün  
3, es_ES, Verde 

The result should look something like this:

{  
  items[{  
    "id": 1,  
    "category": "A",  
    "colours": [{  
      "language": "en_GB",  
      "colour": "Red"  
     }, {  
      "language": "de_DE",  
      "colour": "Rot"  
    }],  
  },  
...  
}

What I have tried so far is:

tMap to merge the files/rows together, then tAggregate to group by the id's. This does not quite work, as it results in the language and colour attributes being formatted individually as comma separated lists:

ie.

"language": "en_GB, de_DE",  
"colour": "Red, Rot"

This is not what we require.

Is it possible to achieve what we need in talend? If so, how?


Solution

  • Here's a solution I put together, using java json library, since json components do not handle such complex structure.

    enter image description here

    tAggregateRow settings:

    enter image description here

    First, load the json-java.jar using a tLibraryLoad. Then join data using a tMap (on the id column, returning all matches), then aggregate it using the id, and output a list of objects for language and colour. Then in tJavaFlex, loop over the rows to construct the final json (here's the java code).

    This gives the below formatted output, based on your example :

    {
        items: [{
                "id": 1,
                "category": "A",
                "colours": [{
                        "colour": "Red",
                        "language": "en_GB"
                    }, {
                        "colour": "Rot",
                        "language": "de_DE"
                    }
                ]
            }, {
                "id": 2,
                "category": "A",
                "colours": [{
                        "colour": "Blue",
                        "language": "en_GB"
                    }
                ]
            }, {
                "id": 3,
                "category": "B",
                "colours": [{
                        "colour": "Green",
                        "language": "en_GB"
                    }, {
                        "colour": "Grün",
                        "language": "de_DE"
                    }, {
                        "colour": "Verde",
                        "language": "es_ES"
                    }
                ]
            }
        ]
    }