Search code examples
jsonetlpentahokettle

RDB to JSON in Kettle?


I'm currently trying to get a handle on how Kettle 4.4 handles data transformations by trying to port something I'm currently doing via Python to a Kettle job.

I have a relational database with four tables that I need to import into my data pipeline. Here's a simplified version of the model...

Widgets

+-----------+-------------+----------------+
| WIDGET_ID |    Name     |     Notes      |
+-----------+-------------+----------------+
|         1 | Gizmo       | Red paint job  |
|         2 | Large Gizmo | Blue paint job |
+-----------+-------------+----------------+

Customers

+-----------+------------+----------------------------------+
| WIDGET_ID |    Name    |         Mailing_Address          |
+-----------+------------+----------------------------------+
|         1 | Acme, Inc. | 123 Fake Street, Springfield, IL |
|         2 | Fake Corp. | 555 Main Street, Small Town, IN  |
|         2 | Acme, Inc. | 123 Fake Street, Springfield, IL |
+-----------+------------+----------------------------------+

Inventory

+-----------+--------+------------+
| WIDGET_ID | Amount |    Date    |
+-----------+--------+------------+
|         2 |  11000 | 2012-01-15 |
|         1 |  13000 | 2012-02-05 |
|         1 |    900 | 2013-01-01 |
+-----------+--------+------------+

I'd like to be able to take the above and produce JSON output like this:

{
    "id": 1,
    "Name": "Gizmo",
    "Notes": "Red Paint Job",
    "Customers": [
        {
            "Name": "Acme, Inc.",
            "Address": "123 Fake Street..."
        }
    ],
    "Inventory": [
        {
            "Amount": 13000,
            "Date": "2012-02-05"
        },
        {
            "Amount": 900,
            "Date": "2013-01-01"
        }
    ]
}

My attempts to use Kettle's joins, JS transforms and JSON output have not been very successful, and I find the documentation to be quite lacking. Can anyone help me out, or point me in the right direction?

Thanks!


Solution

  • you can use 3 (well 6 in total) kettle steps for this transformation:

    1) add 3 table input steps one for each table.

    2) add next a Multiway Merge Join step, group the 3 table input step arrow flows onto this, choose widget_id as key field, choose inner join type.

    3) add 1 json ouput step to the output flow the multiway join step.

    to make the final json format you have to use the JSONPath notation:

    http://goessner.net/articles/JsonPath/

    hope it helps.

    (if you are new in kettle i recommend to go trough the samples folder included in kettle spoon)