Search code examples
jsongroovyapache-nifi

Complex JSON transformation in Groovy


This question kinda based on my old question.

Source JSON:

{
  "reports": [
    {
      "columnHeader": {
        "dimensions": [
          "ga:date"
        ],
        "metricHeader": {
          "metricHeaderEntries": [
            {
              "name": "ga:sessions",
              "type": "INTEGER"
            },
            {
              "name": "ga:bounces",
              "type": "INTEGER"
            }
          ]
        }
      },
      "data": {
        "rows": [
          {
            "dimensions": [
              "20210110"
            ],
            "metrics": [
              {
                "values": [
                  "49",
                  "22"
                ]
              }
            ]
          },
          {
            "dimensions": [
              "20210111"
            ],
            "metrics": [
              {
                "values": [
                  "40",
                  "17"
                ]
              }
            ]
          }
        ],
        "totals": [
          {
            "values": [
              "149",
              "64"
            ]
          }
        ],
        "rowCount": 2,
        "minimums": [
          {
            "values": [
              "40",
              "17"
            ]
          }
        ],
        "maximums": [
          {
            "values": [
              "60",
              "25"
            ]
          }
        ],
        "isDataGolden": true
      }
    }
  ]
}

I need to create JSON with metrics array from JSON above. I should loop through data.rows.dimensions and map data.rows.metrics.values to columnHeader.metricHeader.metricHeaderEntries.name (order is saved correctly). Or loop through metricHeaderEntries.name and add to them values and report_date. I don't know which way is right. So, I need an array metrics with fields name, value, report_date as I described above. Also, report_date from source JSON is in invalid format. Value like 20210110 should be converted to 2021-01-10

Expected output (or something like this, but grouped by report_date):

{
  "metrics": [
            {
              "name": "ga:sessions",
              "value": "49",
              "report_date": "2021-01-10"
            },
            {
              "name": "ga:sessions",
              "type": "40",
              "report_date": "2021-01-11"
            },
            {
              "name": "ga:bounces",
              "type": "22",
              "report_date": "2021-01-10"
            },
            {
              "name": "ga:bounces",
              "type": "17",
              "report_date": "2021-01-11"
            }
          ],
    "isDataGolden": true      
}

UPDATE

I tried with this script:


def data = new JsonSlurper().parseText(content)

def report = data.reports[0]
def rows = report.data.rows


def result = [ 
    metrics : rows.indexed().collect{index, value ->
            [ 
                name : report.columnHeader.metricHeader.metricHeaderEntries.name[index],
                value: value.metrics[0].values[index],
                report_date: value.dimensions[0]
                
            ]
        }, 
    isDataGolden : report.isDataGolden 
]

It returns:

{
   "metrics":[
      {
         "name":"ga:sessions",
         "value":"49",
         "report_date":"20210110"
      },
      {
         "name":"ga:bounces",
         "value":"25",
         "report_date":"20210111"
      }
   ],
   "isDataGolden": true
}

So it doesn't loop through all date+metrics combinations. I guess i need a nested loop? But what kind of?


Solution

  • collect is always a 1:1 mapping between input and output. So you can not use to to create multiple outputs per input; you have to use collectMany, which does that.

    So use collectMany to iterate all rows and then collect over the header/values tuples.

    import groovy.json.JsonSlurper
    def data = new JsonSlurper().parse("data.json" as File)
    def report = data.reports[0]
    
    def headers = report.columnHeader.metricHeader.metricHeaderEntries
    println report.data.rows.collectMany{
        [headers, it.metrics[0].values].transpose().collect{ h, v ->
            [name: h.name, value: v, report_date: it.dimensions[0]]
        }
    }
    // [[name:ga:sessions, 
    //   value:49, 
    //   report_date:20210110], 
    //  [name:ga:bounces, 
    //   value:22, 
    //   report_date:20210110], 
    //  [name:ga:sessions, 
    //   value:40, 
    //   report_date:20210111], 
    //  [name:ga:bounces, 
    //   value:17, 
    //   report_date:20210111]]