json, redis, redisjson

Redis / Rejson nested document hierarchy performance


I'm storing records as json documents, organized into a hierarchy based on record type. The traditional way to store these in Redis is to have something like:

customer:walmart   = {...}
customer:target    = {...}
order:po123        = {...}
person:bob         = {...}
person:tom         = {...}

However, Rejson (aka RedisJSON) lets us query subdocuments from a path efficiently, as it stores the JSON as actual hashtables of hashtables. So I could instead organize my records in an actual hierarchy, which would also help accommodate the distinct key limits in Redis at the root.

["customer"]    = {"walmart": {...}, "target": {...}, "amazon": {...}}
["order"]       = {"po1": {...}, "po2": {...}, "po3": {...}}
["transaction"] = {"uuid1": {...}, "uuid2": {...}, "uuid3": {...}}
["person"]      = {"bob": {...}, "tom": {...}, "dave": {...}}
["widget"]      = {"this": {...}, "that": {...}, "other": {...}}

I regularly retrieve one or more records of the same type. For example, I may want to retrieve target (a customer) or both bob and tom (persons). I rarely retrieve all records from a single type.

What is the performance difference between these two different approaches? Does Rejson make retrieving a subdocument based on a json path ('record' above) roughly as efficient as retrieving the document from the root Redis store?

Rejson doesn't appear to have a way to retrieve bob and tom above in a single command/fetch. JSON.MGET fetches a common path across multiple root Redis keys. That's the opposite of what I want, and is a sign I'm abusing Redis.

Even with Rejson, should deliberate data hierarchies used in this way be considered bad practice due to performance penalties?


Solution

  • What is the performance difference between these two different approaches? Does Rejson make retrieving a subdocument based on a json path ('record' above) roughly as efficient as retrieving the document from the root Redis store?

    RedisJSON retrieves subdocuments by evaluating a JsonPath, so the more complex the JsonPath is, the more overhead it adds. Simple paths like the ones you mention above shouldn't impose much overhead.

    Rejson doesn't appear to have a way to retrieve bob and tom above in a single command/fetch.

    RedisJSON's JSON.GET supports multiple paths, so you can call JSON.GET person .bob .tom. Also, the upcoming RedisJSON 2.0 includes full support for JsonPath, so you should be able to run something like JSON.GET person '$["bob","tom"]'.
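
    As a rough sketch of the multi-path call, the helper below builds a JSON.GET command with one legacy dot path per record name and unpacks the reply. The function names are hypothetical, and `client` is assumed to be any object with an `execute_command` method (redis-py's `Redis` client has one). It also assumes record names are simple identifiers such as bob or tom.

```python
import json


def build_multi_get(key, names):
    """Build the JSON.GET argument list: one legacy dot path per record name."""
    # Legacy dot paths (".bob") only suit simple, identifier-like names.
    return ["JSON.GET", key] + ["." + name for name in names]


def fetch_records(client, key, names):
    """Fetch several records stored under one RedisJSON key in a single round trip.

    `client` is assumed to expose execute_command(*args), as redis-py does.
    """
    raw = client.execute_command(*build_multi_get(key, names))
    if raw is None:  # the root key does not exist
        return {}
    reply = json.loads(raw)
    if len(names) == 1:
        # With a single path, JSON.GET returns the value itself.
        return {names[0]: reply}
    # With several paths, the reply is an object keyed by the requested path.
    return {path[1:]: doc for path, doc in reply.items()}
```

    With redis-py this would be called as something like fetch_records(redis.Redis(), "person", ["bob", "tom"]). The server may reply with an error if one of the requested paths is missing, so callers should be prepared to handle that.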

    Even with Rejson, should deliberate data hierarchies used in this way be considered bad practice due to performance penalties?

    The thing to consider here is whether you need all the "parts" to sit on the same shard if you're using Redis Cluster, and whether you need to update those parts atomically or in a transaction.
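
    To illustrate the shard question, here is a minimal sketch of how Redis Cluster maps a key to one of its 16384 hash slots (CRC16 of the key, or of the hash tag between the first { and the following } when that tag is non-empty). The function name key_slot and the {person}:bob key naming are assumptions made for this example and are not part of the question.

```python
import binascii

CLUSTER_SLOTS = 16384


def key_slot(key: str) -> int:
    """Compute the Redis Cluster hash slot for a key, honoring hash tags."""
    data = key.encode()
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        # Only a non-empty tag between the first '{' and the next '}' is hashed.
        if end > start + 1:
            data = data[start + 1:end]
    # binascii.crc_hqx with an initial value of 0 is the CRC16 (XMODEM) variant.
    return binascii.crc_hqx(data, 0) % CLUSTER_SLOTS


# Flat keys such as person:bob and person:tom hash independently and may
# land on different shards; keys sharing a hash tag always share a slot.
same_slot = key_slot("{person}:bob") == key_slot("{person}:tom")
```

    A single RedisJSON document such as person always lives on one shard, so records nested inside it can be updated together. Flat keys need a shared hash tag to get the same guarantee.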