Search code examples
logicbloxlogiql

How to avoid materialization of recursive logical predicates in logicblox/logiQL?


I am trying to play around with the features of LogicBlox as a Datalog solver. I have performance issues and I think I am using LB wrong because it behaves as if it is materializing all relations (which a Datalog solver with e.g. magic sets would not do).

As I said, I have probably not used LB as it is supposed to, but here is my test. I want to create the transitive closure c(x,y) of some binary relation e(x,y). For the purpose of testing, I am playing with e created as a simple loop, i.e. I added e(i,(i+1)%1000) to LB for 0 ≤ i < 1000.

When I am only interested in from0(x) <- c(0,x) then there is no need to actually materialize c and the magic set method will create a predicate c_{bound,free}(x,y) and compute from0(0) then derive from0(1), etc. the whole operation taking roughly 1000 steps.

If I do my example with the program:

e(x,y) -> int(x), int(y).
// fill e with data
c(x,y) -> int(x), int(y).
c(x,y) <- e(x,y) ; c(x,z),e(z,y).
from0(x) <- c(0,x). 

Then, obviously, I am producing a materialized version of c and c will contain all pairs of elements; therefore the total operation takes a time roughly of 1000^2 (and when I run the query, I see that it actually takes some time to compute).

From the documentation, LogicBlox allows predicates to be defined as "Derived" but for c it does not seem to be possible as c recurse on itself.

Now, I have also tried to define this transitive closure with a "local predicate" in a query or an exec block, but without success. Example of what I tried:

query '
   _c(x,y)->int(x),int(y). 
   _c(x,y) <- e(x,y) ; e(x,z),_c(z,y). 
   _(x) <- _c(0,x).'

Obviously on this example I could manually optimize the query and define a block :

f0(x)->int(x). 
f0(y)<- e(0,y) ; f0(x),e(x,y).

but if I understand LB correctly there should be a way to leave the optimization to LB.


Solution

  • I am not sure how you define a "datalog solver", but it is perhaps better to understand LogicBlox as a database that uses a datalog-derived language as its query language.

    As you note, LogicBlox does not automatically apply any kind of optimization similar to magic sets. Additionally, the unfortunately named "Derived" derivation-type does not work in your case as it avoids materialization by inlining. However, it is not possible to inline away recursive predicates.

    If you are using a version of the platform older than 4.4.9, then yes, unfortunately your only option would to manually perform some manner of magic sets transformation on your logic.

    If you are working with LogicBlox 4.4.9 or newer, we have added a new derivation-type "OnDemand" that will do what you want. It will rewrite the rules for a predicate internally so that the only the "demanded" tuples are computed. This is not quite the same as the classical magic sets rewrite, and is similar to what has been called "demand transformation" in more recent literature (see http://doi.acm.org/10.1145/1836089.1836094).

    However, this will still require a small change in your example. It is necessary to supply a "demand" for all key (as opposed to value) arguments of a predicate. So you would need to rewrite your example into

    e(x,y) -> int(x), int(y).
    e(x, int:mod[x + 1,1000]) <- int:range(0,1000,1,x).
    nodes(x) <- e(x, _); e(_, x).
    
    c(x,y) -> int(x), int(y).
    c(x,y) <- e(x,y); c(x,z), e(z,y).
    //lang:derivationType[`c]="OnDemand".
    
    from0(x) <- c(0,x), nodes(x).
    

    Uncommenting the above line will apply the transformation. This yields about a 7x speedup when run on my laptop.

    As another example, here is the Fibonacci function

    fib[i]=f -> int(i), int(f).
    lang:derivationType[`fib]="OnDemand".
    
    fib[0]=0.
    fib[1]=1.
    fib[i]=f1+f2 <- i >= 2, fib[i-1]=f1, fib[i-2]=f2.
    

    We have also used "OnDemand" predicates to implement more elaborate relations and functions for things like CYK parsing or evaluating the lambda-calculus.