Search code examples
indexingredisluajedisamazon-elasticache

Efficient Redis SCAN of multiple key patterns


I'm trying to power some multi-selection query & filter operations with SCAN operations on my data and I'm not sure if I'm heading in the right direction.

I am using AWS ElastiCache (Redis 5.0.6).

Key design: <recipe id>:<recipe name>:<recipe type>:<country of origin>

Example:

13434:Guacamole:Dip:Mexico
34244:Gazpacho:Soup:Spain
42344:Paella:Dish:Spain
23444:HotDog:StreetFood:USA
78687:CustardPie:Dessert:Portugal
75453:Churritos:Dessert:Spain

If I want to power queries with complex multi-selection filters (example to return all keys matching five recipe types from two different countries) which the SCAN glob-style match pattern can't handle, what is the common way to go about it for a production scenario?

Assuming the I will calculate all possible patterns by doing a cartesian product of all field alternating patterns and multi-field filters:

[[Guacamole, Gazpacho], [Soup, Dish, Dessert], [Portugal]]
*:Guacamole:Soup:Portugal
*:Guacamole:Dish:Portugal
*:Guacamole:Dessert:Portugal
*:Gazpacho:Soup:Portugal
*:Gazpacho:Dish:Portugal
*:Gazpacho:Dessert:Portugal

What mechanism should I use to implement this sort of pattern matching in Redis?

  1. Do multiple SCAN for each scannable pattern sequentially and merge the results?
  2. LUA script to use improved pattern matching for each pattern while scanning keys and get all matching keys in a single SCAN?
  3. An index built on top of sorted sets supporting fast lookups of keys matching single fields and solve matching alternation in the same field with ZUNIONSTORE and solve intersection of different fields with ZINTERSTORE?

<recipe name>:: => key1, key2, keyN
:<recipe type>: => key1, key2, keyN
::<country of origin> => key1, key2, keyN

  1. An index built on top of sorted sets supporting fast lookups of keys matching all dimensional combinations and therefore avoiding Unions and Intersecions but wasting more storage and extend my index keyspace footprint?

<recipe name>:: => key1, key2, keyN
<recipe name>:<recipe type>: => key1, key2, keyN
<recipe name>::<country of origin> => key1, key2, keyN
:<recipe type>: => key1, key2, keyN
:<recipe type>:<country of origin> => key1, key2, keyN
::<country of origin> => key1, key2, keyN

  1. Leverage RedisSearch? (while impossible for my use case, see Tug Grall answer which appears to be very nice solution.)
  2. Other?

I've implemented 1) and performance is awful.

private static HashSet<String> redisScan(Jedis jedis, String pattern, int scanLimitSize) {

    ScanParams params = new ScanParams().count(scanLimitSize).match(pattern);

    ScanResult<String> scanResult;
    List<String> keys;
    String nextCursor = "0";
    HashSet<String> allMatchedKeys = new HashSet<>();

    do {
        scanResult = jedis.scan(nextCursor, params);
        keys = scanResult.getResult();
        allMatchedKeys.addAll(keys);
        nextCursor = scanResult.getCursor();
    } while (!nextCursor.equals("0"));

    return allMatchedKeys;

}

private static HashSet<String> redisMultiScan(Jedis jedis, ArrayList<String> patternList, int scanLimitSize) {

    HashSet<String> mergedHashSet = new HashSet<>();
    for (String pattern : patternList)
        mergedHashSet.addAll(redisScan(jedis, pattern, scanLimitSize));

    return mergedHashSet;
}

For 2) I've created a Lua Script to help with the server-side SCAN and the performance is not brilliant but is much faster than 1) even taking into consideration that Lua doesn't support alternation matching patterns and I have to loop each key through a pattern list for validation:

local function MatchAny( str, pats )
    for pat in string.gmatch(pats, '([^|]+)') do
        local w = string.match( str, pat )
        if w then return w end
    end
end

-- ARGV[1]: Scan Count
-- ARGV[2]: Scan Match Glob-Pattern
-- ARGV[3]: Patterns

local cur = 0
local rep = {}
local tmp

repeat
  tmp = redis.call("SCAN", cur, "MATCH", ARGV[2], "count", ARGV[1])
  cur = tonumber(tmp[1])
  if tmp[2] then
    for k, v in pairs(tmp[2]) do
      local fi = MatchAny(v, ARGV[3])
      if (fi) then
        rep[#rep+1] = v
      end
    end
  end
until cur == 0
return rep

Called in such a fashion:

private static ArrayList<String> redisLuaMultiScan(Jedis jedis, String luaSha, List<String> KEYS, List<String> ARGV) {
    Object response = jedis.evalsha(luaSha, KEYS, ARGV);
    if(response instanceof List<?>)
        return (ArrayList<String>) response;
    else
        return new ArrayList<>();
}  

For 3) I've implemented and maintained a secondary Index updated for each of the 3 fields using Sorted Sets and implemented querying with alternating matching patterns on single fields and multi-field matching patterns like this:

private static Set<String> redisIndexedMultiPatternQuery(Jedis jedis, ArrayList<ArrayList<String>> patternList) {

    ArrayList<String> unionedSets = new ArrayList<>();
    String keyName;
    Pipeline pipeline = jedis.pipelined();

    for (ArrayList<String> subPatternList : patternList) {
        if (subPatternList.isEmpty()) continue;
        keyName = "un:" + RandomStringUtils.random(KEY_CHAR_COUNT, true, true);
        pipeline.zunionstore(keyName, subPatternList.toArray(new String[0]));
        unionedSets.add(keyName);
    }

    String[] unionArray = unionedSets.toArray(new String[0]);
    keyName = "in:" + RandomStringUtils.random(KEY_CHAR_COUNT, true, true);
    pipeline.zinterstore(keyName, unionArray);
    Response<Set<String>> response = pipeline.zrange(keyName, 0, -1);
    pipeline.del(unionArray);
    pipeline.del(keyName);
    pipeline.sync();

    return response.get();
}

The results of my stress test cases clearly favor 3) in terms of request latency:

enter image description here


Solution

  • I ended up using a simple strategy to update each secondary index for each field when the key is created:

    protected static void setKeyAndUpdateIndexes(Jedis jedis, String key, String value, int idxDimSize) {
        String[] key_arr = key.split(":");
        Pipeline pipeline = jedis.pipelined();
    
        pipeline.set(key, value);
        for (int y = 0; y < key_arr.length; y++)
            pipeline.zadd(
                    "idx:" +
                        StringUtils.repeat(":", y) +
                        key_arr[y] +
                        StringUtils.repeat(":", idxDimSize-y),
                    java.time.Instant.now().getEpochSecond(),
                    key);
    
        pipeline.sync();
    }
    

    The search strategy to find multiple keys that match a pattern including alternating patterns and multi-field filters was achieved like:

    private static Set<String> redisIndexedMultiPatternQuery(Jedis jedis, ArrayList<ArrayList<String>> patternList) {
    
        ArrayList<String> unionedSets = new ArrayList<>();
        String keyName;
        Pipeline pipeline = jedis.pipelined();
    
        for (ArrayList<String> subPatternList : patternList) {
            if (subPatternList.isEmpty()) continue;
            keyName = "un:" + RandomStringUtils.random(KEY_CHAR_COUNT, true, true);
            pipeline.zunionstore(keyName, subPatternList.toArray(new String[0]));
            unionedSets.add(keyName);
        }
    
        String[] unionArray = unionedSets.toArray(new String[0]);
        keyName = "in:" + RandomStringUtils.random(KEY_CHAR_COUNT, true, true);
        pipeline.zinterstore(keyName, unionArray);
        Response<Set<String>> response = pipeline.zrange(keyName, 0, -1);
        pipeline.del(unionArray);
        pipeline.del(keyName);
        pipeline.sync();
    
        return response.get();
    }