I'm in the midst of exploring RediSearch and I thought I'd give the aggregations feature a shot and have hit a roadblock.
I can't seem to get a good result.
For testing purposes I created a basic index/schema like so:
FT.CREATE test SCHEMA field TEXT
FT.ADD test 1A 1 FIELDS field hello
FT.ADD test 2A 1 FIELDS field hello
FT.ADD test 3A 1 FIELDS field hello
FT.ADD test 4A 1 FIELDS field world
Next, I issued the following query:
FT.AGGREGATE test "*" GROUPBY 1 @field REDUCE COUNT 0 AS agg
My expectation was that I get a result indicating hello
occurs three times and world
occurs once... but instead I get the following result:
1) (integer) 1
2) 1) "field"
2) (nil)
3) "agg"
4) "4"
I thought it was pretty straight forward... but I'm obviously doing something wrong.
Also, the following is the output from the MODULE LIST
command:
1) 1) "name"
2) "ft"
3) "ver"
4) (integer) 10300
2) 1) "name"
2) "ReJSON"
3) "ver"
4) (integer) 10001
Any help would be super.
Thanks!
It turns out that I should have read the documentation better.
From the section in the aggregations documentation where they describe the FT.AGGREGATE
command parameters they mention LOAD {nargs} {property}
, where they say:
Load document fields from the document HASH objects. This should be avoided as a general rule of thumb. Fields needed for aggregations should be stored as SORTABLE, where they are available to the aggregation pipeline with very low latency. LOAD hurts the performance of aggregate queries considerably since every processed record needs to execute the equivalent of HMGET against a redis key, which when executed over millions of keys, amounts to very high processing times.
From the query example in the original question I had:
FT.AGGREGATE test "*" GROUPBY 1 @field REDUCE COUNT 0 AS agg
Since the schema definition didn't have field
defined as SORTABLE
I would have to LOAD
"field" in order to perform an aggregation on it.
FT.AGGREGATE test "*" LOAD 1 @field GROUPBY 1 @field REDUCE COUNT 0 AS agg
However, since according to the documentation LOAD
hurts performance I should have instead defined the field I want to aggregate as SORTABLE
.
FT.CREATE test SCHEMA field TEXT SORTABLE
With the schema properly defined my original query works.