I'm very new to ElasticSearch, and I'm trying to make an aggregation, but can't seem to get it right.
I have some data in an ElasticSearch index that looks like this:
{
  "customerId": "example_customer",
  "request": {
    "referer": "https://example.org"
  },
  "@timestamp": "2020-09-29T14:14:00.000Z"
}
My mapping:
{
"mappings": {
"properties": {
"customerId": { "type": "keyword" },
"request": {
"properties": {
"referer": { "type": "keyword" }
}
}
}
}
}
And I'm trying to get the referers that appear the most frequently for a specific customer in a date range. I could make the filter for the customer like this:
var result = await _client.SearchAsync<InsightRecord>(s =>
s.Aggregations(
a => a
.Filter("customer", customer =>
customer.Filter(q => q.Term(ir => ir.CustomerId, customerId)))
.Terms("top_referer", ts => ts.Field("request.referer"))
)
);
return result.Aggregations.Terms("top_referer").Buckets
    .Select(bucket => new TopReferer { Url = bucket.Key, Count = bucket.DocCount ?? 0 });
Now I want to narrow this down to a specific time range. This is what I have so far:
var searchDescriptor = s.Aggregations(a =>
a.Filter("customer", customer =>
customer.Filter(q =>
q.Bool(b =>
b.Must(
f2 => f2.DateRange(date => date.GreaterThanOrEquals(from).LessThanOrEquals(to)),
f1 => f1.Term(ir => ir.CustomerId, customerId)
)
)
)
)
.Terms("top_referers", ts => ts.Field("request.referer"))
);
The problem is that the date filter doesn't get included in the query. It translates to this JSON:
{
"aggs": {
"customer": {
"filter": {
"bool": {
"filter": [{
"term": {
"customerId": {
"value": "example_customer"
}
}
}
]
}
}
},
"top_referers": {
"terms": {
"field": "request.referer"
}
}
}
}
I tried ordering them differently, but it didn't help. It's always the customer filter that will appear in the JSON, and the date range is skipped. I also saw some people using a query combined with an aggregation, but I feel like I should be able to do this using the aggregation alone. Is this possible? What am I doing wrong in my query that the range doesn't show up in the JSON?
The issue is that the date range query does not specify a field, so NEST treats it as conditionless and omits it from the serialized request:
f2 => f2.DateRange(date => date.GreaterThanOrEquals(from).LessThanOrEquals(to)),
Add a field to this query
f2 => f2.DateRange(date => date
.Field("@timestamp")
.GreaterThanOrEquals(from)
.LessThanOrEquals(to)
),
In addition, for the filter aggregation to apply to the terms aggregation, the terms aggregation needs to be a sub aggregation of the filter aggregation. So it would be something like
var customerId = "foo";
var from = "now-365d";
var to = "now";
var result = await _client.SearchAsync<InsightRecord>(s => s
.Aggregations(a => a
.Filter("customer", customer => customer
.Filter(q => q
.Bool(b => b
.Must(
f2 => f2.DateRange(date => date
.Field("@timestamp")
.GreaterThanOrEquals(from)
.LessThanOrEquals(to)
),
f1 => f1.Term(ir => ir.CustomerId, customerId)
)
)
)
.Aggregations(aa => aa
.Terms("top_referers", ts => ts.Field("request.referer"))
)
)
)
);
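As a rough sketch of what this nested version should send to Elasticsearch (assuming the `from`, `to` and `customerId` values above; the exact serialization NEST produces may differ slightly), note that `top_referers` now sits inside the `customer` filter aggregation rather than next to it:

```json
{
  "aggs": {
    "customer": {
      "filter": {
        "bool": {
          "must": [
            { "range": { "@timestamp": { "gte": "now-365d", "lte": "now" } } },
            { "term": { "customerId": { "value": "foo" } } }
          ]
        }
      },
      "aggs": {
        "top_referers": {
          "terms": { "field": "request.referer" }
        }
      }
    }
  }
}
```

In the JSON from your question, `top_referers` is a sibling of `customer`, so the filter has no effect on the terms buckets.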
Rather than specifying the date range and term queries using a filter aggregation though, I'd be inclined to specify them as the query to the search request. The query is taken into account when calculating aggregations. A filter aggregation is useful when you want to query on a dataset but run an aggregation only on a subset of the dataset e.g. if you were searching across all customers but then wanted to run an aggregation only on a subset of the customers. In practice, for this particular example, the outcome should be the same whether the query is specified as the query part of a search request, or as a filter aggregation with the terms aggregation as a sub aggregation, but the former is perhaps a little easier to get the results from.
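To illustrate the case where a filter aggregation does earn its place, here is a hypothetical request body (the `all_referers` name and the choice of filters are assumptions for illustration, not something from your code). The query selects documents for all customers in the date range, one terms aggregation runs over all of them, and a second runs only over the subset for a single customer:

```json
{
  "size": 0,
  "query": {
    "range": { "@timestamp": { "gte": "now-365d", "lte": "now" } }
  },
  "aggs": {
    "all_referers": {
      "terms": { "field": "request.referer" }
    },
    "customer": {
      "filter": { "term": { "customerId": "foo" } },
      "aggs": {
        "top_referers": {
          "terms": { "field": "request.referer" }
        }
      }
    }
  }
}
```

If you only ever need the buckets for one customer, as in your question, the extra nesting buys nothing.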
Specifying the date range and term queries as the query of the search request would look something like
var result = await _client.SearchAsync<InsightRecord>(s => s
.Query(q => q
.Bool(b => b
.Must(
f2 => f2.DateRange(date => date
.Field("@timestamp")
.GreaterThanOrEquals(from)
.LessThanOrEquals(to)
),
f1 => f1.Term(ir => ir.CustomerId, customerId)
)
)
)
.Aggregations(aa => aa
.Terms("top_referers", ts => ts.Field("request.referer"))
)
);
Further, there are a couple more things we can do:
Specify .Size(0) to not bother returning search hits on the response.
The bool query can be more succinctly expressed by combining queries with operator overloading, and since both queries are predicates (a document either matches the query or it doesn't), we can specify both in a filter clause to omit scoring.
The final query then is something like
var result = await _client.SearchAsync<InsightRecord>(s => s
.Size(0)
.Query(q => +q
.DateRange(date => date
.Field("@timestamp")
.GreaterThanOrEquals(from)
.LessThanOrEquals(to)
) && +q
.Term(ir => ir.CustomerId, customerId)
)
.Aggregations(aa => aa
.Terms("top_referers", ts => ts.Field("request.referer"))
)
);
which generates the query
{
"aggs": {
"top_referers": {
"terms": {
"field": "request.referer"
}
}
},
"query": {
"bool": {
"filter": [{
"range": {
"@timestamp": {
"gte": "now-365d",
"lte": "now"
}
}
}, {
"term": {
"customerId": {
"value": "foo"
}
}
}]
}
},
"size": 0
}
The terms aggregation buckets can be accessed as in your question, using the aggregation name given here: result.Aggregations.Terms("top_referers").Buckets.