I'm trying to power some multi-selection query & filter operations with SCAN
operations on my data and I'm not sure if I'm heading in the right direction.
I am using AWS ElastiCache (Redis 5.0.6).
Key design: <recipe id>:<recipe name>:<recipe type>:<country of origin>
Example:
13434:Guacamole:Dip:Mexico
34244:Gazpacho:Soup:Spain
42344:Paella:Dish:Spain
23444:HotDog:StreetFood:USA
78687:CustardPie:Dessert:Portugal
75453:Churritos:Dessert:Spain
If I want to power queries with complex multi-selection filters (for example: return all keys matching any of five recipe types from two different countries), which a single SCAN glob-style MATCH pattern can't express, what is the common way to go about it in a production scenario?
Assuming I calculate all possible patterns by taking the Cartesian product of the per-field alternatives and multi-field filters:
[[Guacamole, Gazpacho], [Soup, Dish, Dessert], [Portugal]]
*:Guacamole:Soup:Portugal
*:Guacamole:Dish:Portugal
*:Guacamole:Dessert:Portugal
*:Gazpacho:Soup:Portugal
*:Gazpacho:Dish:Portugal
*:Gazpacho:Dessert:Portugal
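For illustration, a minimal sketch of that expansion step (the expandPatterns helper is hypothetical; an empty alternative list stands for a wildcard field, and java.util.List/ArrayList are assumed imported):

static List<String> expandPatterns(List<List<String>> fields) {
    // Start from the id wildcard and append one field per round:
    // either every alternative, or "*" when no filter is given for that field.
    List<String> patterns = new ArrayList<>();
    patterns.add("*");
    for (List<String> alternatives : fields) {
        List<String> next = new ArrayList<>();
        for (String prefix : patterns) {
            if (alternatives.isEmpty())
                next.add(prefix + ":*");
            else
                for (String alt : alternatives)
                    next.add(prefix + ":" + alt);
        }
        patterns = next;
    }
    return patterns;
}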
What mechanism should I use to implement this sort of pattern matching in Redis?

1) SCAN for each scannable pattern sequentially and merge the results?
2) A server-side Lua script that runs the SCAN?
3) ZUNIONSTORE to solve the union of alternatives within a field and ZINTERSTORE to solve the intersection across different fields, backed by secondary indexes such as:

<recipe name>:: => key1, key2, keyN
:<recipe type>: => key1, key2, keyN
::<country of origin> => key1, key2, keyN

or indexes covering every field combination:

<recipe name>:: => key1, key2, keyN
<recipe name>:<recipe type>: => key1, key2, keyN
<recipe name>::<country of origin> => key1, key2, keyN
:<recipe type>: => key1, key2, keyN
:<recipe type>:<country of origin> => key1, key2, keyN
::<country of origin> => key1, key2, keyN
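To make 3) concrete before the implementations below, a hedged Jedis sketch over the sample data, assuming per-field sorted-set indexes named exactly as above (the tmp:* destination keys are illustrative):

// ":Dessert:" holds CustardPie and Churritos; "::Portugal" holds CustardPie, etc.
jedis.zunionstore("tmp:type", ":Soup:", ":Dessert:");        // Soup OR Dessert
jedis.zunionstore("tmp:country", "::Portugal", "::Spain");   // Portugal OR Spain
jedis.zinterstore("tmp:result", "tmp:type", "tmp:country");  // AND across fields
// -> 34244:Gazpacho:Soup:Spain, 78687:CustardPie:Dessert:Portugal, 75453:Churritos:Dessert:Spain
Set<String> matched = jedis.zrange("tmp:result", 0, -1);
jedis.del("tmp:type", "tmp:country", "tmp:result");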
I've implemented 1) and its performance is awful:
private static HashSet<String> redisScan(Jedis jedis, String pattern, int scanLimitSize) {
    // Iterate the whole keyspace with SCAN MATCH until the cursor wraps back to "0".
    ScanParams params = new ScanParams().count(scanLimitSize).match(pattern);
    ScanResult<String> scanResult;
    List<String> keys;
    String nextCursor = "0";
    HashSet<String> allMatchedKeys = new HashSet<>();
    do {
        scanResult = jedis.scan(nextCursor, params);
        keys = scanResult.getResult();
        allMatchedKeys.addAll(keys);
        nextCursor = scanResult.getCursor();
    } while (!nextCursor.equals("0"));
    return allMatchedKeys;
}
private static HashSet<String> redisMultiScan(Jedis jedis, ArrayList<String> patternList, int scanLimitSize) {
    // Run one full SCAN pass per pattern and merge the results.
    HashSet<String> mergedHashSet = new HashSet<>();
    for (String pattern : patternList)
        mergedHashSet.addAll(redisScan(jedis, pattern, scanLimitSize));
    return mergedHashSet;
}
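A usage sketch, given an open Jedis connection and the expanded pattern list from the example (the COUNT of 1000 is an arbitrary choice):

ArrayList<String> patterns = new ArrayList<>(Arrays.asList(
        "*:Guacamole:Soup:Portugal",
        "*:Guacamole:Dish:Portugal",
        "*:Guacamole:Dessert:Portugal",
        "*:Gazpacho:Soup:Portugal",
        "*:Gazpacho:Dish:Portugal",
        "*:Gazpacho:Dessert:Portugal"));
// Each pattern triggers a full keyspace iteration, so the cost is roughly
// (number of patterns) x (keyspace size / COUNT) round trips - hence the latency.
HashSet<String> matches = redisMultiScan(jedis, patterns, 1000);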
For 2) I've created a Lua script to run the SCAN server-side. The performance is not brilliant, but it is much faster than 1), even taking into account that Lua doesn't support alternation in match patterns, so I have to loop each key through a pattern list for validation. (Note that the whole SCAN loop runs inside a single script call, and Redis blocks other commands while a script executes.)
local function MatchAny( str, pats )
    -- Try each '|'-separated Lua pattern until one matches.
    for pat in string.gmatch(pats, '([^|]+)') do
        local w = string.match( str, pat )
        if w then return w end
    end
end

-- ARGV[1]: Scan Count
-- ARGV[2]: Scan Match Glob-Pattern
-- ARGV[3]: '|'-separated list of Lua patterns
local cur = 0
local rep = {}
local tmp
repeat
    tmp = redis.call("SCAN", cur, "MATCH", ARGV[2], "COUNT", ARGV[1])
    cur = tonumber(tmp[1])
    if tmp[2] then
        for k, v in pairs(tmp[2]) do
            local fi = MatchAny(v, ARGV[3])
            if (fi) then
                rep[#rep+1] = v
            end
        end
    end
until cur == 0
return rep
Called in such a fashion:
private static ArrayList<String> redisLuaMultiScan(Jedis jedis, String luaSha, List<String> KEYS, List<String> ARGV) {
    Object response = jedis.evalsha(luaSha, KEYS, ARGV);
    if (response instanceof List<?>)
        return (ArrayList<String>) response;
    else
        return new ArrayList<>();
}
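And a sketch of how the script gets registered and invoked (luaScriptSource holding the script text above, and the glob/pattern values, are illustrative):

String luaSha = jedis.scriptLoad(luaScriptSource); // register once, reuse the SHA
List<String> argv = Arrays.asList(
        "1000",            // ARGV[1]: SCAN COUNT hint
        "*:*:*:Portugal",  // ARGV[2]: coarse glob pushed down into SCAN MATCH
        ":Guacamole:.*:Portugal|:Gazpacho:.*:Portugal"); // ARGV[3]: '|'-separated Lua patterns
ArrayList<String> keys = redisLuaMultiScan(jedis, luaSha, Collections.emptyList(), argv);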
For 3) I've implemented and maintained a secondary index for each of the 3 fields using Sorted Sets, and implemented the querying with alternation patterns on single fields and multi-field filters like this:
private static Set<String> redisIndexedMultiPatternQuery(Jedis jedis, ArrayList<ArrayList<String>> patternList) {
    ArrayList<String> unionedSets = new ArrayList<>();
    String keyName;
    Pipeline pipeline = jedis.pipelined();
    // One ZUNIONSTORE per field: union of the alternatives within that field.
    for (ArrayList<String> subPatternList : patternList) {
        if (subPatternList.isEmpty()) continue;
        keyName = "un:" + RandomStringUtils.random(KEY_CHAR_COUNT, true, true);
        pipeline.zunionstore(keyName, subPatternList.toArray(new String[0]));
        unionedSets.add(keyName);
    }
    // One ZINTERSTORE across all per-field unions, then read and clean up.
    String[] unionArray = unionedSets.toArray(new String[0]);
    keyName = "in:" + RandomStringUtils.random(KEY_CHAR_COUNT, true, true);
    pipeline.zinterstore(keyName, unionArray);
    Response<Set<String>> response = pipeline.zrange(keyName, 0, -1);
    pipeline.del(unionArray);
    pipeline.del(keyName);
    pipeline.sync();
    return response.get();
}
The results of my stress test cases clearly favor 3) in terms of request latency.
I ended up using a simple strategy to update each secondary index for each field when the key is created:
protected static void setKeyAndUpdateIndexes(Jedis jedis, String key, String value, int idxDimSize) {
    String[] key_arr = key.split(":");
    Pipeline pipeline = jedis.pipelined();
    pipeline.set(key, value);
    // For each field, add the full key to that field's sorted-set index, e.g.
    // field 1 of "13434:Guacamole:Dip:Mexico" lands in "idx::Guacamole::".
    // The insertion timestamp serves as the member score.
    for (int y = 0; y < key_arr.length; y++)
        pipeline.zadd(
                "idx:" +
                StringUtils.repeat(":", y) +
                key_arr[y] +
                StringUtils.repeat(":", idxDimSize - y),
                java.time.Instant.now().getEpochSecond(),
                key);
    pipeline.sync();
}
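For example, loading the sample recipes (idxDimSize is the number of fields after the id, i.e. 3 here, so every index key carries the same number of ':' placeholders; the value payloads are illustrative):

// Produces indexes such as idx:13434:::, idx::Guacamole::, idx:::Dip:, idx::::Mexico
setKeyAndUpdateIndexes(jedis, "13434:Guacamole:Dip:Mexico", "guacamole-payload", 3);
setKeyAndUpdateIndexes(jedis, "78687:CustardPie:Dessert:Portugal", "custardpie-payload", 3);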
The search strategy to find multiple keys matching a pattern, including alternation and multi-field filters, is the redisIndexedMultiPatternQuery method already shown above.
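A usage sketch for the earlier filter [[Guacamole, Gazpacho], [Soup, Dish, Dessert], [Portugal]], translated into the index key names produced by setKeyAndUpdateIndexes (assuming idxDimSize = 3):

ArrayList<ArrayList<String>> patternList = new ArrayList<>();
patternList.add(new ArrayList<>(Arrays.asList("idx::Guacamole::", "idx::Gazpacho::")));
patternList.add(new ArrayList<>(Arrays.asList("idx:::Soup:", "idx:::Dish:", "idx:::Dessert:")));
patternList.add(new ArrayList<>(Arrays.asList("idx::::Portugal")));
Set<String> keys = redisIndexedMultiPatternQuery(jedis, patternList);
// Empty on the sample data above: neither Guacamole nor Gazpacho originates from Portugal.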