I am dealing with streaming data, and I need to apply SPARQL-style queries to it. For example, suppose I have a query like:
SELECT ?x ?z
FROM <http://dummyURI>
WHERE {
  ?x p1 ?y .    # (t1)
  ?x p2 ?z .    # (t2)
  ?z p3 o3 .    # (t3)
}
As shown, the query has three triple patterns (t1, t2, and t3). The shared variables impose constraints: ?x in (t1) must equal ?x in (t2), and ?z in (t2) must equal ?z in (t3). In my code I can find the triples that match each individual triple pattern, but how do I ensure that these constraints are satisfied?
I tried to understand how a SPARQL engine handles this, but it is not covered in the following standard resources (res1,res2,res3). Can anyone help me understand how I should handle this?
Note: I have asked a related question at the link. This question is much more concise than the previous one.
Put the streaming issue to one side -- there are streaming SPARQL engines around that deal with that and also a W3C Community Group. A Google search will find them.
Consider the pattern: { ?x p1 ?y . ?x p2 ?z }. This is a database join with the constraint that ?x has the same value in both triple patterns.
Any join algorithm will work. Let's take an index join as a reasonably efficient algorithm:
Step 1: Find all matches of ?x p1 ?y.
Step 2: For each match, take its value of ?x and look for ?x p2 ?z with that value of ?x. This is a loop over the values of ?x from step 1, and it is a single pass, so it streams on pattern one and probes on pattern two.
The output is every solution that passes step 2.
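The two steps can be sketched in Python as follows. This is only an illustration under assumptions: the sample triples, the helper names `build_index` and `index_join`, and the choice of a dictionary keyed by (subject, predicate) as the index are all made up for the example and are not how any particular SPARQL engine is implemented.

```python
from collections import defaultdict

# Hypothetical sample data: (subject, predicate, object) triples.
triples = [
    ("a", "p1", "b"),
    ("a", "p2", "c"),
    ("d", "p1", "e"),
    ("a", "p2", "f"),
    ("g", "p2", "h"),
]

def build_index(triples):
    """Index objects by (subject, predicate) so a probe avoids a full scan."""
    index = defaultdict(list)
    for s, p, o in triples:
        index[(s, p)].append(o)
    return index

def index_join(triples, index):
    """Join { ?x p1 ?y . ?x p2 ?z } on the shared variable ?x."""
    for s, p, o in triples:
        if p != "p1":
            continue                      # Step 1: keep only matches of ?x p1 ?y
        x, y = s, o
        # Step 2: probe the index for ?x p2 ?z using this value of ?x.
        for z in index.get((x, "p2"), []):
            yield {"x": x, "y": y, "z": z}

results = list(index_join(triples, build_index(triples)))
for row in results:
    print(row)
```

Because the probe key contains the value of ?x taken from step 1, the constraint "same ?x in both patterns" is enforced by construction rather than checked afterwards.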
There are many join algorithms, from simple nested loop joins through to parallel hash joins, and many ways to be more efficient. In the above, it is better to start with the triple pattern that is expected to generate the fewest matches.
For your example, extend to three patterns by taking each solution from step 2 and using its value of ?z to look for ?z p3 o3.
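One way to picture the extension to three patterns is to carry a set of variable bindings from one pattern to the next. The sketch below is a minimal illustration, not an engine's actual implementation: the helper names `match` and `evaluate` and the sample data are hypothetical, and for brevity it scans all triples for each binding (a nested loop join) instead of probing an index.

```python
def is_var(term):
    """A term is a variable if it starts with '?'."""
    return term.startswith("?")

def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if is_var(term):
            # A variable that is already bound must keep the same value.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None                   # constant term does not match
    return result

def evaluate(patterns, triples):
    """Feed the solutions of each pattern into the next one."""
    solutions = [{}]                      # start with one empty binding
    for pattern in patterns:
        next_solutions = []
        for binding in solutions:
            for triple in triples:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions

# Hypothetical sample data.
triples = [
    ("a", "p1", "b"), ("a", "p2", "c"), ("a", "p2", "f"),
    ("c", "p3", "o3"), ("d", "p1", "e"), ("d", "p2", "f"),
]
patterns = [("?x", "p1", "?y"), ("?x", "p2", "?z"), ("?z", "p3", "o3")]

solutions = evaluate(patterns, triples)
selected = [(s["?x"], s["?z"]) for s in solutions]   # SELECT ?x ?z
print(selected)
```

The check inside `match` is where the constraints live: once ?x or ?z is bound by an earlier pattern, a later pattern can only match triples that agree with that value.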
If all the data is strictly streaming, see the published work on streaming SPARQL, or work in micro-batches. A parallel hash join can stream on both sides, though it needs a significant amount of working space.
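To give a feel for streaming on both sides, here is a single-threaded symmetric hash join, which is a simpler relative of the parallel hash join mentioned above and not the same algorithm. It is a sketch under assumptions: the function name `symmetric_hash_join` and the sample stream are hypothetical, and the two hash tables grow without bound, which is the working-space cost; a real streaming system would typically expire old entries with a window.

```python
from collections import defaultdict

def symmetric_hash_join(stream):
    """Join { ?x p1 ?y . ?x p2 ?z } over triples arriving in any order."""
    left = defaultdict(list)    # ?x -> values of ?y seen in ?x p1 ?y
    right = defaultdict(list)   # ?x -> values of ?z seen in ?x p2 ?z
    for s, p, o in stream:
        if p == "p1":
            left[s].append(o)                 # remember for later arrivals
            for z in right.get(s, []):        # probe the other side now
                yield {"x": s, "y": o, "z": z}
        elif p == "p2":
            right[s].append(o)
            for y in left.get(s, []):
                yield {"x": s, "y": y, "z": o}

# Hypothetical stream: note that a p2 triple arrives before its p1 partner.
stream = [
    ("a", "p2", "c"),
    ("a", "p1", "b"),
    ("d", "p1", "e"),
    ("a", "p2", "f"),
]

results = list(symmetric_hash_join(stream))
for row in results:
    print(row)
```

Each arriving triple is inserted into its own side's table and immediately probed against the other side, so a result is emitted as soon as both halves have been seen, regardless of arrival order.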