Search code examples
phpelasticsearchfoselasticabundle

Elasticsearch - Distinct Values, Not Counts


I am trying to do something similar to this SQL query:

SELECT * FROM table WHERE fileContent LIKE '%keyword%' AND company_id = '1' GROUP BY email

Having read posts similar to this I have this:

{
    "query": {
        "bool": {
            "must": [{
                "match": {
                    "fileContent": {
                        "query": "keyword"
                    }
                }
            }],
            "filter": [{
                "terms": {
                    "company_id": [1]
                }
            }]
        }
    },
    "aggs": {
        "group_by_email": {
            "terms": {
                "field": "email",
                "size": 1000
            }
        }
    },
    "size": 0
}

Field mappings are:

{
  "cvs" : {
    "mappings" : {
      "application" : {
        "_meta" : {
          "model" : "Acme\\AppBundle\\Entity\\Application"
        },
        "dynamic_date_formats" : [ ],
        "properties" : {
          "email" : {
            "type" : "keyword"
          },
          "fileContent" : {
            "type" : "text"
          },
          "company_id" : {
            "type" : "text"
          }
        }
      }
    }
  }
}

... which are generated from Symfony config.yml:

fos_elastica:
    clients:
        default:
            host: "%elastica.host%"
            port: "%elastica.port%"
    indexes:
        cvs:
            client: default
            types:
               application:
                    properties:
                        fileContent: ~
                        email:
                          index: not_analyzed
                        company_id: ~
                    persistence:
                        driver: orm
                        model: Acme\AppBundle\Entity\Application
                        provider: ~
                        finder: ~

The filter works fine, but I am finding that hits:hits returns no items (or all results matching the search if I remove size:0) and aggregations:group_by_email:buckets has a count of the groups but not the records themselves. The records that were grouped aren't returned and it's these that I need.

I have also tried with FOSElasticBundle using the query builder if this is your preferred flavour (this works but doesn't have the grouping/aggregation):

$boolQuery = new \Elastica\Query\BoolQuery();

$filterKeywords = new \Elastica\Query\Match();
$filterKeywords->setFieldQuery('fileContent', 'keyword');
$boolQuery->addMust($filterKeywords);

$filterUser = new \Elastica\Query\Terms();
$filterUser->setTerms('company_id', array('1'));
$boolQuery->addFilter($filterUser);

$finder = $this->get('fos_elastica.finder.cvs.application');

Thanks.


Solution

  • top_hits:{size:1} appears to be what I need, having played around with Andrei's answer. This will return one record for each bucket in the aggregation

      "aggs": {
        "group_by_email": {
          "terms": {
            "field": "email",
            "size": 1000
          },
          "aggs": {
            "sample_docs": {
              "top_hits": {
                "size": 1
              }
            }
          }
        }
      }
    

    Ref: top_hits