Is there priority in a SPARQL OR filter instruction?


I'd like to know whether the order of the operands determines or changes their priority in a SPARQL FILTER that combines conditions with OR (||). For example:

FILTER( regex(STR(?keywords), "test1", "i")
        || regex(STR(?keywords), "test2", "i")
        || regex(STR(?keywords), "test3", "i")
        || regex(STR(?keywords), "test4", "i")
        || regex(STR(?keywords), "test5", "i") )

Does this query indicate that test1 has a higher priority than test2 when the results are filtered? In other words, does it affect the order of the results? If I limit the results to 20, for example, which is fewer than the total (say 60), will I get the matches for test1 first and then those for test2?

If not, is there a way to establish such a priority?


Solution

  • Does this query indicate that test1 has a higher priority than test2 when the results are filtered? In other words, does it affect the order of the results? If I limit the results to 20, for example, which is fewer than the total (say 60), will I get the matches for test1 first and then those for test2?

    When there's no order by at the end of the query, the order of the results is unspecified. An implementation could certainly change the ordering based on the filter, but it doesn't seem particularly likely, at least not in any obvious way.

    When you ask, "will I get the results of test1 first and then test2," it suggests a bit of a misunderstanding about what filter does. Filter is a way of specifying conditions that a result must satisfy in order to be included in the result set. Conceptually, before the filter there is a list of results in a working result set; the filter test is applied to each one, and the ones that satisfy it are kept. There are no "results" of test1 or test2 in any meaningful sense. Different ways of applying the test could affect the ordering, e.g., depending on whether the implementation does this:

    final results = {}
    for (result in working results)
      for (test in tests)
        if result passes test
          add result to final results
    

    or

    final results = {}
    for (test in tests)
      for (result in working results)
        if result passes test
          add result to final results
    

    Since final results is a set, you'd end up with the same set, but since the results are eventually provided as a list, it's conceivable that this could affect the order you see the results in. But as I said above, it's unlikely that you'd get any reliable, meaningful difference this way.

    If not, is there a way to establish such a priority?

    If you want to order the results in some way, you'll need to figure out how to get that priority, and then you can order by it. For instance, in your case, you could do something like:

    select ?keywords {
      values ?test { "test1" "test2" "test3" }
      #-- get a binding for ?keywords...
      filter regex(str(?keywords), ?test, "i")
    }
    order by ?test
    

    In this case, since the values of ?test have a reliable ordering, you can order the results based on which values of ?test matched. Of course, if ?keywords matches more than one value of ?test, you'll see it multiple times in the results.

    If the priority can't be read off the sort order of the strings themselves, you can still use values to assign one explicitly:

    select ?keywords {
      values (?test ?priority) { ("test1" 3) ("test2" 0) ("test3" 5) }
      #-- get a binding for ?keywords...
      filter regex(str(?keywords), ?test, "i")
    }
    order by ?priority
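
    As a rough illustration of what this kind of query computes, the Python sketch below models the values rows as (pattern, priority) pairs, keeps each matching keyword once with its lowest priority number, sorts ascending, and then applies a limit. The sample keywords, the function name ranked, and the choice to collapse duplicates are assumptions made for the example; the matching is case-insensitive, as in the filter from the question.

```python
import re

# Hypothetical (pattern, priority) pairs, standing in for the values rows.
tests = [("test1", 3), ("test2", 0), ("test3", 5)]
# Hypothetical bindings for ?keywords.
keywords = ["has test3 only", "has test1 and test2", "has Test1", "no match"]


def ranked(keywords, tests, limit=None):
    """Return the matching keywords ordered by their lowest priority number."""
    best = {}
    for kw in keywords:
        for pattern, priority in tests:
            # Same idea as regex(str(?keywords), ?test, "i").
            if re.search(pattern, kw, re.IGNORECASE):
                # Keep one entry per keyword: the smallest priority seen so far.
                best[kw] = min(priority, best.get(kw, priority))
    # sorted() is stable, so ties keep the input order of `keywords`.
    ordered = sorted(best, key=best.get)
    # A limit of None keeps everything, like a query without LIMIT.
    return ordered[:limit]


top = ranked(keywords, tests, limit=2)
print(top)  # ['has test1 and test2', 'has Test1']
```

    In SPARQL itself, the same collapsing of duplicates could be approached by grouping on ?keywords and ordering by min(?priority), with limit applied after the ordering.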