
JSONata performance / efficiency assumptions


I'm really enjoying JSONata -- awesome library, thanks!

One of the things I have been toying with is a join of two arrays of objects on a particular key value.

Consider the library example (a bit simplified):

library.loans@$L.books@$B[$L.isbn=$B.isbn].{
  'customer': $L.customer,
  'book': $B.title
}

I get the impression (from your docs etc) that this iterates the books array for every entry in the loans array and then filters the output.

Question 1: Is this the right way of thinking about it, from a performance standpoint? Does it matter whether one starts with .loans or .books? A quick test shows the code can be written either way... Intuitively, I would imagine that if I had a prefilter (e.g., just looking at 'overdue' loans), I could increase performance by starting with loans and filtering it first. Does that sound reasonable, or is this jumping to conclusions?

Question 2: Could it be more efficient (for larger datasets, obviously) to re-write this query using a map, for O(m+n) instead of O(m*n)? I mean something like:

(
  $booksById := library.books{$.isbn: $};

  library.loans.{
    'customer': $.customer,
    'book': $lookup( $booksById, $.isbn ).title
  }
)

Thanks again for the library -- easy to learn and super useful.


Solution

  • Q1: In this case, it won't make any difference which way round you write it. Internally, it creates a tuple stream containing every combination of a loan and a book (the full cross product), which then gets filtered down by the predicate expression.

    Q2: This might be more efficient (hint: look at the $distinct() function), but it will give different results. The original expression is doing an inner join (in SQL terms), whereas your alternative is a left outer join, i.e. it'll produce an object for the loan even if the book doesn't exist.

    Thanks for the feedback, BTW.
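
To make the difference between the two approaches concrete, here is a minimal sketch in Python rather than JSONata. It only models the join semantics described above and is not how JSONata is implemented. The sample data and the function names (nested_loop_join, lookup_join, keep_unmatched) are made up for illustration, and it assumes each isbn appears at most once in books.

```python
# Hypothetical sample data; field names follow the library example in the question.
loans = [
    {"customer": "alice", "isbn": "111"},
    {"customer": "bob", "isbn": "222"},
    {"customer": "carol", "isbn": "999"},  # no book with this isbn
]
books = [
    {"isbn": "111", "title": "Book One"},
    {"isbn": "222", "title": "Book Two"},
]


def nested_loop_join(loans, books):
    """Inner join that tests every (loan, book) pair: O(m*n) comparisons."""
    return [
        {"customer": loan["customer"], "book": book["title"]}
        for loan in loans
        for book in books
        if loan["isbn"] == book["isbn"]
    ]


def lookup_join(loans, books, keep_unmatched):
    """Join through a dict keyed by isbn: O(m+n) on average.

    Assumes isbn values are unique in books; duplicates would overwrite
    each other in the dict.
    """
    books_by_isbn = {book["isbn"]: book for book in books}
    rows = []
    for loan in loans:
        book = books_by_isbn.get(loan["isbn"])
        if book is not None:
            rows.append({"customer": loan["customer"], "book": book["title"]})
        elif keep_unmatched:
            # Outer-join behaviour: the loan still yields an object,
            # just without a 'book' field.
            rows.append({"customer": loan["customer"]})
    return rows


inner = nested_loop_join(loans, books)
outer = lookup_join(loans, books, keep_unmatched=True)
filtered = lookup_join(loans, books, keep_unmatched=False)
```

With this data, inner and filtered contain the same two rows, while outer has a third row for carol with no book. So the lookup-based version only matches the original query if loans without a matching book are dropped.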