Tags: database, indexing, redis, aggregate, full-text-search

Why does RediSearch FT.AGGREGATE put entries into the wrong groups?


I am running into an issue with the Redis RediSearch module, and I am not sure whether it's a bug or a misunderstanding on my part. I have several JSON documents with question-answer pairs, like so:

// myKey:1
{
  "answeredBy": "John",
  "answeredOn": "12 Sep 2023",
  "results": [
    { "question": "Who is a composer?", "answer": "Bach" },
    { "question": "Who is an athlete?", "answer": "Tiger Woods" }
  ]
}
// myKey:2
{
  "answeredBy": "John",
  "answeredOn": "24 Oct 2023",
  "results": [
    { "question": "Who is a composer?", "answer": "Bach" }
  ]
}
// etc.

I want to query all questions and answers for a specific person, so I create an index like this:

FT.CREATE mySearch ON JSON PREFIX 1 myKey: SCHEMA $.answeredBy AS answeredBy TAG

I then search for all answers for a person like so:

FT.AGGREGATE mySearch @answeredBy:{John} LOAD 6 $.results[*].question AS questions $.results[*].answer AS answers GROUPBY 2 @questions @answers REDUCE COUNT 0 AS numAnswers DIALECT 4

I expect a result with two distinct groups, one for ["Who is a composer?", "Who is an athlete?"] and one for ["Who is a composer?"]. However, the actual result treats the questions values of both keys as the same and puts them into a single group, like this:

1) "1"
2) 1) "questions"
   2) "[\"Who is a composer?\", \"Who is an athlete?\"]"
   3) "answers"
   4) "[\"Bach\", \"Tiger Woods\"]"
   5) "numAnswers"
   6) "2" // <---- THE PROBLEM: myKey:2 doesn't have the "Who is an athlete?" question, but it is counted as having it, so the answer count is wrong.

It seems that the GROUPBY step wrongly groups distinct array values together when their first elements are the same. Maybe that's not the case and I'm using the command incorrectly. I would appreciate any help.


Solution

  • This is a known bug with known workarounds; unfortunately, there is no indication of a fix date.

    https://github.com/RediSearch/RediSearch/issues/3398
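
    Separately from whatever workarounds the issue lists, one option (an assumption on my part, not taken from the issue) is to fetch the matching documents and do the grouping in application code, comparing the whole arrays rather than relying on GROUPBY. A minimal Python sketch of that idea follows. The two sample documents from the question are held in a hypothetical in-memory dict instead of being read from Redis, and the function name group_answers is purely illustrative:

```python
import json
from collections import Counter

# In-memory copy of the sample documents from the question.
# In a real setup these would be fetched from Redis; that part is left out here.
docs = {
    "myKey:1": {
        "answeredBy": "John",
        "answeredOn": "12 Sep 2023",
        "results": [
            {"question": "Who is a composer?", "answer": "Bach"},
            {"question": "Who is an athlete?", "answer": "Tiger Woods"},
        ],
    },
    "myKey:2": {
        "answeredBy": "John",
        "answeredOn": "24 Oct 2023",
        "results": [
            {"question": "Who is a composer?", "answer": "Bach"},
        ],
    },
}


def group_answers(documents, answered_by):
    """Count documents per distinct (questions, answers) pair.

    Whole arrays are compared (as tuples), so two documents land in the
    same group only if every question and every answer matches.
    """
    counts = Counter()
    for doc in documents.values():
        if doc["answeredBy"] != answered_by:
            continue
        questions = tuple(r["question"] for r in doc["results"])
        answers = tuple(r["answer"] for r in doc["results"])
        counts[(questions, answers)] += 1
    return counts


groups = group_answers(docs, "John")
for (questions, answers), num_answers in groups.items():
    print(json.dumps({
        "questions": questions,
        "answers": answers,
        "numAnswers": num_answers,
    }))
```

    With the sample data this yields two separate groups, each with a count of 1, which is the grouping the question expected. The trade-off is that all matching documents are transferred to the client, so this only suits result sets of modest size.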