Search code examples
redisredisjsonredis-search

Finding total sum by aggregating Redis JSON


I want to find total website visits by summing up the visits of individual pages.

In my Redis, I have JSONs like:

{'page_name': 'home', 'visit_count' : 10}
{'page_name': 'add_to_cart', 'visit_count' : 7}
{'page_name': 'checkout', 'visit_count' : 5}

I want to run an aggregation logic using FT.AGGREATE so that I can get, total_visits = 10 + 7 + 5 = 22

I can add necessary indexing, if required.


Solution

  • You can use FT.AGGREGATE for that. You will have to index at least one field for that.

    Here is an example using your data.

    Add Data

    127.0.0.1:6379> JSON.SET doc:1 $ '{"page_name": "home", "visit_count" : 10}' 
    OK
    127.0.0.1:6379> JSON.SET doc:2 $ '{"page_name": "add_to_cart", "visit_count" : 7}' 
    OK
    127.0.0.1:6379> JSON.SET doc:3 $ '{"page_name": "checkout", "visit_count" : 5}' 
    OK
    

    Create an index

    At least one field needs to be indexed. Here are few possible options:

    1. Indexing visit_count:

      FT.CREATE idx ON JSON PREFIX 1 doc: SCHEMA $.visit_count AS visit_count SORTABLE 
      
    2. Indexing page_name

      FT.CREATE idx ON JSON PREFIX 1 doc: SCHEMA $.page_name AS page_name TAG 
      
    3. Indexing page_name and referring to visit_count(multiple lines for readability):

      FT.CREATE idx ON JSON PREFIX 1 doc: SCHEMA 
          $.visit_count AS visit_count SORTABLE NOINDEX 
          $.page_name AS page_name TAG 
      

      Notice that I mention visit_count with NOINDEX for quick access but without indexing (so it cannot be referred to in the query part, but can be referred to as part of the aggregations).

    Querying the index

    The trick is to use some dummy field that is equal in all the documents. You can use an existing one if you have one, but if not you can create one using APPLY (documentation here):

    APPLY 1 AS __dummy 
    

    will add to all the documents a field named __dummy with the value 1. This is only for the query context and it will not be added to the JSON object itself. Of course, any name and value are possible.

    For cases 1 and 3:

    FT.AGGREGATE idx * APPLY 1 AS __dummy GROUPBY 1 @__dummy REDUCE SUM 1 @visit_count AS count 
    

    For case 2:

    FT.AGGREGATE idx * APPLY 1 AS __dummy LOAD 3 $.visit_count AS visit_count GROUPBY 1 @__dummy REDUCE SUM 1 @visit_count AS count 
    

    Output (for all cases):

    1) (integer) 1
    2) 1) "__dummy"
       2) "1"
       3) "count"
       4) "22"
    

    EDIT

    Apparently, you can GROUPBY 0 field, so the APPLY trick is redundant.

    Use

    FT.AGGREGATE idx * GROUPBY 0 REDUCE SUM 1 @visit_count AS count 
    

    for cases 1 and 3, or, for case 2:

    FT.AGGREGATE idx * LOAD 3 $.visit_count AS visit_count GROUPBY 0 REDUCE SUM 1 @visit_count AS count 
    

    to get the output:

    1) (integer) 1
    2) 1) "count"
       2) "22"
    

    Using redis-py:

    import redis
    import redis.commands.search
    import redis.commands.search.reducers as reducers
    from redis.commands.search.aggregation import AggregateRequest
    from redis.commands.json.path import Path
    from redis.commands.search.field import (
        NumericField,
        TagField,
    )
    from redis.commands.search.indexDefinition import IndexDefinition, IndexType
    
    
    conn = redis.Redis(host="localhost", port=6379, db=0)
    
    definition = IndexDefinition(
        prefix=["doc:"],
        index_type=IndexType.JSON,
    )
    
    conn.ft().create_index((
            TagField("$.page_name", as_name="page_name"),
            NumericField("$.visit_count", as_name="visit_count", sortable=True, no_index=True),
        ),
        definition=definition,
    )
    
    conn.json().set("doc:1", Path.root_path(), {"page_name": "home", "visit_count": 10})
    conn.json().set("doc:2", Path.root_path(), {"page_name": "add_to_cart", "visit_count": 7})
    conn.json().set("doc:3", Path.root_path(), {"page_name": "checkout", "visit_count": 5})
    
    total_sum = conn.ft().aggregate(AggregateRequest().group_by([], reducers.sum("visit_count"))).rows[0][1]
    print(int(total_sum)) # 22
    
    

    ALTERNATIVE APPROACH

    If you don't need to index the documents for other reasons, you can use a LUA script to scan all the keys and sum up this value. Scanning the entire key space might be slow if you have a lot of keys, but it will not require additional memory overhead and indexing time when modifying or adding new keys