How to count elements by cardinality in Elasticsearch?


Suppose I have a mapping that stores physical attributes of people, and a field in that mapping that is the user id. For instance:

    "attributes": {
      "hair_color": {
        "type": "string"
      },
      "eyes_color": {
        "type": "string"
      },
      "height": {
        "type": "float"
      },
      "user_id": {
        "type": "integer"
      }
    }

I'm trying to write a query that returns how many people have a given eye color. For example, it would return something like "green": 962.

I think what I need is a terms aggregation on the eyes_color field with a cardinality sub-aggregation on user_id, but I haven't been successful so far. This is what I have:

    {
      "aggs": {
        "eyes_color_bucket": {
          "terms": {
            "field": "eyes_color"
          }
        },
        "aggs": {
          "count": {
            "cardinality": {
              "field": "eyes_color_bucket"
            }
          }
        }
      }
    }

This of course fails. Any help is appreciated.


Solution

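  • You're almost there. The cardinality sub-aggregation has to be nested inside eyes_color_bucket (next to terms, not next to the bucket itself), and it should count the user_id field rather than refer to the bucket name. Try it like this: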

    {
      "size": 0,
      "aggs": {
        "eyes_color_bucket": {
          "terms": {
            "field": "eyes_color"
          },
          "aggs": {
            "count": {
              "cardinality": {
                "field": "user_id"
              }
            }
          }
        }
      }
    }
    

    UPDATE

    Following up on Richa's comment below: if you assume that each user has only one eye color (i.e. no colored lenses or the like), you can simplify your aggregation query like this:

    {
      "size": 0,
      "aggs": {
        "eyes_color_bucket": {
          "terms": {
            "field": "eyes_color"
          }
        }
      }
    }
    

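    The doc_count of each bucket is the number of documents with that eye color, so as long as each user is stored in exactly one document, it is also the number of users having that eye color. Kudos to @Richa for bringing this up.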
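
To make the difference between the two queries concrete, here is a minimal Python sketch that builds either request body and turns the aggregation part of a response into a plain color-to-count mapping. It does not talk to a cluster or use any client library. The helper names (build_query, counts_by_color) and the sample response are hypothetical and hand-written for illustration, assuming the usual response layout where buckets sit under aggregations.eyes_color_bucket.buckets.

```python
from typing import Any, Dict


def build_query(distinct_users: bool = True) -> Dict[str, Any]:
    """Build the terms aggregation on eyes_color, optionally with a
    cardinality sub-aggregation on user_id nested inside it."""
    terms_agg: Dict[str, Any] = {"terms": {"field": "eyes_color"}}
    if distinct_users:
        # The sub-aggregation lives inside the bucket aggregation,
        # next to "terms", not next to "eyes_color_bucket".
        terms_agg["aggs"] = {"count": {"cardinality": {"field": "user_id"}}}
    return {"size": 0, "aggs": {"eyes_color_bucket": terms_agg}}


def counts_by_color(response: Dict[str, Any]) -> Dict[str, int]:
    """Map each eye color to its count from a search response body."""
    buckets = response["aggregations"]["eyes_color_bucket"]["buckets"]
    result: Dict[str, int] = {}
    for bucket in buckets:
        if "count" in bucket:
            # Distinct user_id values, from the cardinality sub-aggregation.
            result[bucket["key"]] = bucket["count"]["value"]
        else:
            # Simplified query: fall back to the number of documents.
            result[bucket["key"]] = bucket["doc_count"]
    return result


# Hypothetical response body (assumption: not taken from a real cluster).
sample_response = {
    "aggregations": {
        "eyes_color_bucket": {
            "buckets": [
                {"key": "green", "doc_count": 970, "count": {"value": 962}},
                {"key": "blue", "doc_count": 415, "count": {"value": 410}},
            ]
        }
    }
}

print(counts_by_color(sample_response))  # {'green': 962, 'blue': 410}
```

In the sample, doc_count is deliberately larger than the cardinality value to show the case where some users appear in more than one document. When every user has exactly one document, the two numbers agree and the simplified query is enough.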