Search code examples
indexingcouchdbibm-cloudcloudantnosql

How to omit fields in nested objects in Cloudant Query result?


I've googled and tried the hell out of this, and it seems like it's just not possible. Maybe (hopefully) someone knows better :D

So I have a Cloudant database running on Bluemix, which I'm both very new to. Queries, indexes, views... struggling a bit here, but so far I can successfully retrieve documents – in my case filtered by timestamp. Now I just want to make the output a bit handier.

In my database I have documents structured like so:

{
  "_id": "0048160a463a73faaa6c90f5af027772",
  "_rev": "1-ff6255309f1b873a4e482310843a8a15",
  "timestamp": 1496275536932.6602,
  "results": {
    "lines": {
      "S1": [
        {
          "needed_key": "foo",
          "not_needed_key": 1
        }
      ],
      "S2": [
        {
          "needed_key": "bar",
          "not_needed_key": 1
        },
        {
          "needed_key": "foo_bar",
          "not_needed_key": 1
        }
      ],
      ...
    }
  },
  "station": "5002270",
  "another_not_needed_key": "something"
}

Shortened, my Cloudant selector looks somewhat like this:

{
  "selector": {
    "$and": [{
      "timestamp": {
        "$gte": from,
        "$lt": to
      },       
      "results.lines": {
        "$ne": {}
      }
    ]},
  "fields": [
    "_id",
    "timestamp",
    "station",
    "results"
  ],
  ...
}

See how "another_not_needed_key" is not in fields, because, well, I don't need that info. Now I want to do the same thing for the unneeded fields within the objects of the lines array.

I've read somewhere that, for arrays, something like

"results.lines.S1.[].needed_key"

as a selector is possible, although I'm not even sure if I got any results testing this. Anyway:

Questions:

  1. Will/should the above work for fields, too? I.e. should it only output the "needed_key" of any in the "S1" array nested objects? Haven't had success with it yet.
  2. Can I generalize the "S1" somehow? Like with the [] for all objects of the array, I want to address all keys within lines. Because: Some may contain "S1" as a key, others not. Overall, there are seven possible keys here, variably combined.

If anything is unclear, I'm happy to provide more info. Thanks in advance!


Solution

    1. No :( Using the [] shorthand with a text-type index will output the entire contents of the array when you query.

    2. Not in Cloudant Query, no :( The best you could do is maybe change your data structure to where you have nested arrays, but this wouldn't help with filtering the projected values.

    3. Your query is sufficiently complex that it might warrant creating a programmatic search index (i.e., in a design document, not via Cloudant Query): https://console.ng.bluemix.net/docs/services/Cloudant/api/search.html#search (if this sounds confusing, there's more context here below).

    At a high level, it depends on how you've indexed your data via Cloudant Query. There are two ways: "type": "json" and "type": "text". text-type can use the [] notation, while json-type can't. I'm going to explain the current json-type approach, and will update with the text-type info. This is also a good thread: Cloudant/Mango selector for deeply nested JSONs

    To back up a bit: In Cloudant, you can't really do any ad hoc querying, which you pretty much know since you've gone this far. In Cloudant/CouchDB, an index must exist before you can query anything. (You get the primary index on "_id" by default.) Plus, there are also multiple indexing engines: 1. traditional Map-based views for creating secondary indexes, 2. Lucene search indexes, and 3. geospatial indexes.

    Cloudant Query abstracts some of this index-then-query system, but you still need to tell it what & how to index. It can create two types of indexes: "json" (corresponding to #1 above) and "text" (#2 above).

    Based on the behavior you're observing, it looks like you've created a json-type index in Cloudant query, and are issuing selector statements against that. Here's the closest you can get with that approach:

    json-type index:

    {
      "index": {
        "fields": [
          "_id",
          "timestamp",
          "station",
          "results"
        ]
      },
      "type": "json"
    }
    

    json-type selector

    {
      "selector": {
        "_id": {
          "$gt": 0
        }
      },
      "fields": [
        "_id",
        "timestamp",
        "station",
        "results.lines.S1.0.needed_key",
        "results.lines.S2.0.needed_key",
        "results.lines.S2.1.needed_key"
      ],
      "sort": [
        {
          "_id": "asc"
        }
      ]
    }
    

    output:

    {
     "_id": "fd298368a7a344b217698677f3f5a07d",
     "timestamp": 1496275536932.6602,
     "station": "5002270",
     "results": {
      "lines": {
       "S1": {
        "0": {
         "needed_key": "foo"
        }
       },
       "S2": {
        "0": {
         "needed_key": "bar"
        },
        "1": {
         "needed_key": "foo_bar"
        }
       }
      }
     }
    }
    

    Unfortunately, the json-type approach requires you to know how things are nested.

    The text-type CQ approach isn't going to work for your current needs, but here's the closest I believe you can get:

    text-type index

    {
      "index": {
        "fields": [
          {"name": "_id", "type": "string"},
          {"name": "timestamp", "type": "number"},
          {"name": "station", "type": "string"},
          {"name":"results.lines.S1.[].needed_key", "type": "string"}
        ]
      },
      "type": "text"
    }
    

    text-type selector (updated to show a more interesting query)

    {
      "selector": {
        "results.lines.S1": {
          "$elemMatch": {"needed_key": "foo"}
        }
      },
      "fields": [
        "_id",
        "timestamp",
        "station",
        "results.lines.S1"
      ]
    }
    

    output

    {
     "_id": "fd298368a7a344b217698677f3f5a07d",
     "timestamp": 1496275536932.6602,
     "station": "5002270",
     "results": {
      "lines": {
       "S1": [
        {
         "needed_key": "foo",
         "not_needed_key": 1
        }
       ]
      }
     }
    }
    

    Hope that helps.