I'm trying to get the count of all rows and the count of distinct values in a column using the elasticsearch-dsl package in Python.
I'm fairly new to Elasticsearch, so apologies if this is a dumb question, but I've read all the available documentation and couldn't figure it out.
Any help on this would be appreciated!
To get the count of all rows I'm using .aggs.metric(), which works fine. To get the count of distinct values I've tried .bucket('terms') and .bucket('cardinality'), neither of which returns what I want.
For total count of rows:
s = Search(using=client, index="<index_name>")
s.aggs.metric('total', 'sum', field = '<column>')
s = s.execute()
s.aggregations.total.value
For count of distinct values in a column:
s = Search(using=client, index="brandcleanerv2")
s.aggs.metric('by_cluster', 'cardinality', field='cluster')
s = s.execute()
The second code snippet is returning 10 rows. I've also tried the 'terms' aggregation inside .bucket(), but it returned the number of occurrences of each distinct value in the column, and only for 10 values.
You have to access s.aggregations.by_cluster.value (after running execute()) to get the result of the cardinality aggregation, which does what you want.
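Here is a rough sketch of getting both counts in one request. The helper names add_count_aggs and read_counts and the aggregation names total and distinct are made up for illustration, and I'm assuming the field is aggregatable (for example a keyword field). It uses value_count for the row count, since sum adds up the field's values rather than counting them, and it sets size=0 so the 10 hits returned by default are left out of the response.

```python
def add_count_aggs(search, field):
    """Return a copy of an elasticsearch_dsl Search with two metric aggregations.

    `search` is assumed to be an elasticsearch_dsl.Search; the aggregation
    names "total" and "distinct" are arbitrary labels chosen for this sketch.
    """
    # size=0 skips the hits (10 by default); only aggregations come back.
    search = search.extra(size=0)
    # value_count: number of values stored in `field`.
    search.aggs.metric("total", "value_count", field=field)
    # cardinality: approximate number of distinct values in `field`.
    search.aggs.metric("distinct", "cardinality", field=field)
    return search


def read_counts(response):
    """Pull both numbers out of the response returned by Search.execute()."""
    aggs = response.aggregations
    return aggs.total.value, aggs.distinct.value
```

Calling s = add_count_aggs(Search(using=client, index="<index_name>"), "cluster") and then total, distinct = read_counts(s.execute()) should give both numbers. Keep in mind that cardinality is an approximate count, and that value_count counts the values of the field, so documents missing the field are not included. If you want the number of matching documents instead, s.count() returns that.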