Search code examples
jsondiff

Is there an established representation for the difference between two JSONs?


Are there any established or existing formats or conventions for representing the diff between two JSON documents?

Lets say that two remote nodes (or a server/client) both have some data represented as a potentially complex JSON, the structure of which is not known before runtime. One wants to send an update to the other, but without sending the whole state as one large JSON. Instead, just the delta. What would be a good way to represent the delta (or diff) between any two JSON documents? They will likely be very similar (one small change), but might not be.


Solution

  • JSON documents are essentially trees, with leaves containing name/value pairs.

    What you want to do is transmit a minimal tree delta: the smallest set of edits that converts one tree into another.

    Computing tree deltas is a bit of an art, partly because it depends on the kinds of deltas you allow (just insert/delete leaf? swap subtrees? move subtree? duplicate subtree? rename names? replace values?). You also need to take into account semantic equivalences; if you commute the position of two subtrees, is the result semantically different? (Your delta detector may see such a tree swap; the semantic identity check may eliminate it as not interesting). If you duplicate a subtree, is the answer semantically different? (I think for JSON the effective answer is "no").

    You need something like a dynamic programming algorithm to determine such a minimal delta; you can take inspiration from Levenshtein distance on strings.

    This is a common issue of interest to programmers regarding source code. Think of a JSON document as source code, and see the answers at https://stackoverflow.com/q/5779160/120163 for further discussion.