Search code examples
databaseredistype-conversionaggregatefull-text-search

How to parse ISO 8601 times on Redis when aggreating data using FT.AGGREGATE?


I have a database with keys storing times in UTC in ISO 8601 format, like so:

(data:key1)
{ "time": "2023-12-10T10:42:50.277Z" }
(data:key2)
{ "time": "2023-12-15T19:02:40.308Z" }
...

I want to aggregate data in these keys in some way, let's say I want to count the number of keys whose times are between 2023-12-10 and 2023-12-29.

Redis includes the FT.AGGREGATE command in its RediSearch module, with an APPLY subcommand which can be used to transform data using different types of expressions. To convert string-based times to/from numeric values, two transformations listed in the docs seem good:

timefmt(x, [fmt]) Return a formatted time string based on a numeric timestamp value x. See strftime for formatting options. Not specifying fmt is equivalent to %FT%TZ. parsetime(timesharing, [fmt]) The opposite of timefmt() - parse a time format using a given format string

However, I cannot find a way to make that work. I first create a search index with FT.CREATE mySearch ON JSON PREFIX 1 data: SCHEMA $.time AS time TEXT, then to aggregate I call FT.AGGREGATE mySearch "*" APPLY "parsetime(@time, '%G-%m-%dT%H:%M:%S.%fZ')" as numericTime, to which the response is:

1) "1"
2) 1) "time"
   2) "2023-12-14T20:42:50.277Z"
   3) "numericTime"
   4) "null" // <- this should be a unix timestamp instead of null

What am I doing wrong, and what's the correct way to parse these time strings into numeric values?


Solution

  • I spent 20 minutes poking around to see how Redis responds to various strings containing dates and times. Here's what I've learned:

    • Search will return null if you give it formatting options that are invalid. I tried it with %q, which is very invalid, to confirm this.
    • Redis' Search documentation points to the Python docs for strftime. This is incorrect and needs to be rectified (I will point this out to our docs team). Search is not implemented in Python—rather in C. So, it really uses the C formatting options, which are slightly different.
    • The C formatting specifications do not support %f for microseconds while the Python ones do. In fact, it doesn't support fractions of seconds at all. So, this isn't going to work.
    • Furthermore, you have milliseconds in your string and not microseconds. So, even if Search were using the Python specifications, it would still be incorrect. Python might be smart enough to figure this out but I didn't test that since it was moot.

    In my estimation, you have two choices:

    • If milliseconds matter, you can convert this to an UNIX Epoch time before saving it to Redis. Search will gladly deal with numbers that have decimals in them. Done it personally. Totally works. Just make sure you save it as a number in the JSON and not a string that contains a number.
    • If milliseconds do not matter, remove the milliseconds from the ISO 8601 date before you save it to Redis.

    A couple of other things of note:

    • There is a bug in your string. You use %G for the year. It should be %Y. %G is the week-based year, not the actual year. ISO 8601 dates use the actual year.
    • There are some shortcuts you can use for parsing this format. %F is short for %Y-%m-%d and %T is short for %H:%M:%S.
    • You can use %Z to match the timezone. It's a little more robust than just using a hard-coded Z as you might encounter times that specify the timezone.

    All this said, I could revise your parsing string to be %FT%T%Z but I think that %Y-%m-%dT%H:%M:%S%Z is probably easier to read. Up to personal preference!

    Anyhow, that was kind of a long answer but I certainly hope it helps!