Search code examples
amazon-redshiftamazon-redshift-spectrum

Redshift JSONPaths file for dynamic json file


Given the below json object

{
  "player": {
    "francesco totti": {
      "position": "forward"
    },
    "andrea pirlo": {
      "position": "midfielder"
    }
  }
}

I would like to import the above file into Redshift as the below rows

name, position
"franceso totti", "forward"
"andrea pirlo", "midfielder"

The thing is the 'player' object has a dynamic number of objects each hour(cadence of when I import into Redshift). For example, the next hour run may look like the following.

{
  "player": {
    "fabio cannavaro": {
      "position": "defender"
    }
  }
}

Is it possible to use a JSONPaths file to import this file every hour or does it require preprocessing?


Solution

  • You can reuse the jsonpath file as much as you like. You will just need to rerun the COPY statement but remember this will add rows to the table - not replace them. If you are replacing then you will want to clear the table out first (delete, drop/recreate, truncate - each with its own performance and limitations).

    Now your json format isn't going to work for Redshift AFAIK. You have the player name as the field identifier and want to set this as the value of a column. You will want something like this (sorry these aren't tested):

    {
      "player": {
        "name": "francesco totti",
        "position": "forward"
        }
    },
    {
      "player": {
        "name": "andrea pirlo",
        "position": "midfielder"
        }
    }
    

    And a jsonpath like this:

    {
        "jsonpaths": [
           "$.player.name",
           "$.player.position"
        ]
    }