Search code examples
ibm-cloudcloudantdashdb

Cloudant JSON data into dashdb table joins


I have successfully imported some JSON data into cloudant, the JSON data has three levels. Then created the dashdb warehouse from cloudant to put the data into relational tables. It appears that dashdb has created three tables for each of the levels in the JSON data but has not provided me with a Key to join back to the top level. Is there a customisation that is done somewhere that tells dashdb how to join the tables. A sample JSON doc is below:

 {
  "_id": "579b56388aa56fd03a4fd0a9",
  "_rev": "1-698183d4326352785f213b823749b9f8",
  "v": 0,
  "startTime": "2016-07-29T12:48:04.204Z",
  "endTime": "2016-07-29T13:11:48.962Z",
  "userId": "Ranger1",
  "uuid": "497568578283117a",
  "modes": [
    {
      "startTime": "2016-07-29T12:54:22.565Z",
      "endTime": "2016-07-29T12:54:49.894Z",
      "name": "bicycle",
      "_id": "579b56388aa56fd03a4fd0b1",
      "locations": []
    },
    {
      "startTime": "2016-07-29T12:48:02.477Z",
      "endTime": "2016-07-29T12:53:28.503Z",
      "name": "walk",
      "_id": "579b56388aa56fd03a4fd0ad",
      "locations": [
        {
          "at": "2016-07-29T12:49:05.716Z",
          "_id": "579b56388aa56fd03a4fd0b0",
          "location": {
            "coords": {
              "latitude": -34.0418308,
              "longitude": 18.3503616,
              "accuracy": 37.5,
              "speed": 0,
              "heading": 0,
              "altitude": 0
            },
            "battery": {
              "is_charging": true,
              "level": 0.7799999713897705
            }
          }
        },
        {
          "at": "2016-07-29T12:49:48.488Z",
          "_id": "579b56388aa56fd03a4fd0af",
          "location": {
            "coords": {
              "latitude": -34.0418718,
              "longitude": 18.3503895,
              "accuracy": 33,
              "speed": 0,
              "heading": 0,
              "altitude": 0
            },
            "battery": {
              "is_charging": true,
              "level": 0.7799999713897705
            }
          }
        },
        {
          "at": "2016-07-29T12:50:20.760Z",
          "_id": "579b56388aa56fd03a4fd0ae",
          "location": {
            "coords": {
              "latitude": -34.0418788,
              "longitude": 18.3503887,
              "accuracy": 33,
              "speed": 0,
              "heading": 0,
              "altitude": 0
            },
            "battery": {
              "is_charging": true,
              "level": 0.7799999713897705
            }
          }
        }
      ]
    },
    {
      "startTime": "2016-07-29T12:53:37.137Z",
      "endTime": "2016-07-29T12:54:18.505Z",
      "name": "carshare",
      "_id": "579b56388aa56fd03a4fd0ac",
      "locations": []
    },
    {
      "startTime": "2016-07-29T12:54:54.112Z",
      "endTime": "2016-07-29T13:11:47.818Z",
      "name": "bus",
      "_id": "579b56388aa56fd03a4fd0aa",
      "locations": [
        {
          "at": "2016-07-29T13:00:08.039Z",
          "_id": "579b56388aa56fd03a4fd0ab",
          "location": {
            "coords": {
              "latitude": -34.0418319,
              "longitude": 18.3503623,
              "accuracy": 36,
              "speed": 0,
              "heading": 0,
              "altitude": 0
            },
            "battery": {
              "is_charging": false,
              "level": 0.800000011920929
            }
          }
        }
      ]
    }
  ]
}

SQL for the three tables created in dashdb showing all the fields in each table is here. Note there is no FK that I can see, the "_ID" fields are unique to each table.

SELECT ENDTIME,STARTTIME,USERID,UUID,V,"_ID","_REV" 
FROM <schemaname>.RANGER_DATA
where "_ID" = '579b56388aa56fd03a4fd0a9'

SELECT ARRAY_INDEX,ENDTIME,NAME,STARTTIME,TOTALPAUSEDMS,"_ID"
FROM <schemaname>.RANGER_DATA_MODES
where "_ID" = '579b56388aa56fd03a4fd0b1'

SELECT ARRAY_INDEX,AT,LOCATION_BATTERY_IS_CHARGING,LOCATION_BATTERY_LEVEL,LOCATION_COORDS_ACCURACY,LOCATION_COORDS_ALTITUDE,LOCATION_COORDS_HEADING,LOCATION_COORDS_LATITUDE,LOCATION_COORDS_LONGITUDE,LOCATION_COORDS_SPEED,RANGER_DATA_MODES,"_ID" 
FROM <schemaname>.RANGER_DATA_MODES_LOCATIONS
where "_ID" = '579b56388aa56fd03a4fd0b0'

Solution

  • Cloudant uses _id for its UID for each document. It seems that the warehousing task iterates over these documents and assumes that there is a new document every time it sees a new _id.

    Because you're using _id in your modes and locations this will produce an undesired result in the SQL DB.

    Renaming your _id in modes and locations to something else should fix the problem.