
What flavor of regex should be used with DBpedia/Virtuoso SPARQL?


I'm trying to match only whole words, not substrings, in SPARQL using FILTER.

I am querying DBpedia (which is hosted on Virtuoso).

I'm not sure whether SPARQL supports word boundaries, since something like

FILTER(regex(?name, "V", "i"))

matches every name that contains a V anywhere: V, IV, VI, VII, and so forth.

Now, I've tried using

FILTER(regex(?name, "\<V\>", "i"))

which produces a compile error on the endpoint:

Virtuoso 37000 Error SP030: SPARQL compiler, line 0: Bad escape sequence in a short double-quoted string at '"\'

I've also tried

FILTER(regex(?name, "\bV\b", "i"))

This query is accepted, but it returns no results, which I guess is because the string literal turns \b into a backspace character instead of a word boundary.
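That guess is consistent with the SPARQL grammar: a string literal only accepts the escapes \t, \b, \n, \r, \f, \", \' and \\ (plus \uXXXX/\UXXXXXXXX codepoint escapes), so "\<" is rejected and "\b" becomes a backspace before REGEX ever sees the pattern. The Python sketch below imitates that decoding step. `unescape_sparql_literal` and `SPARQL_ESCAPES` are hypothetical names, and the codepoint escapes are left out for brevity (this sketch would reject them).

```python
# Hypothetical helper: decode the escapes allowed in a SPARQL string literal
# (the ECHAR rule). \uXXXX codepoint escapes are deliberately not handled.
SPARQL_ESCAPES = {
    "t": "\t",
    "b": "\b",   # backspace, U+0008, not a regex word boundary
    "n": "\n",
    "r": "\r",
    "f": "\f",
    '"': '"',
    "'": "'",
    "\\": "\\",  # \\ in the literal becomes one backslash
}


def unescape_sparql_literal(body: str) -> str:
    """Return the value of a SPARQL literal, given its source text between the quotes."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\":
            if i + 1 >= len(body) or body[i + 1] not in SPARQL_ESCAPES:
                # Roughly what Virtuoso reports as "Bad escape sequence"
                raise ValueError(f"bad escape sequence at position {i}")
            out.append(SPARQL_ESCAPES[body[i + 1]])
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


print(repr(unescape_sparql_literal(r"\bV\b")))    # '\x08V\x08'
print(repr(unescape_sparql_literal(r"\\bV\\b")))  # '\\bV\\b'
```

Calling `unescape_sparql_literal(r"\<V\>")` raises `ValueError`, mirroring the compile error above.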

I've tried looking up which regex flavor it uses; the only thing I found is that SPARQL's REGEX follows XQuery 1.0 and XPath 2.0 Functions and Operators.

Thanks for your time!


Solution

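  • Word boundaries work if you write the backslash twice, as \\b. The string literal turns \\ into a single backslash, so the regex engine receives \b. For example: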

    SELECT DISTINCT ?s ?l WHERE {
      ?s a <http://dbpedia.org/ontology/SoccerClub> ;
         <http://www.w3.org/2000/01/rdf-schema#label> ?l
      FILTER(LANGMATCHES(LANG(?l),'en'))
      FILTER(REGEX(STR(?l), "\\bD", "i"))
    } LIMIT 100
    

    This returns soccer clubs whose English label contains a word beginning with "d" (case-insensitive).

    ETA: Virtuoso developers report that Virtuoso uses Perl Compatible Regular Expressions (PCRE) for regex matching.
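To see the regex layer on its own, here is a rough local stand-in for the FILTER above, written in Python. It assumes the SPARQL literal "\\bD" has already been decoded to the pattern \bD, and it uses Python's re module in place of PCRE (the two agree on \b for plain ASCII text). The helper `regex_filter` and the sample labels are made up for illustration; they are not DBpedia data.

```python
import re


def regex_filter(labels, pattern):
    """Mimic FILTER(REGEX(STR(?l), pattern, "i")): keep labels the pattern matches anywhere."""
    return [label for label in labels if re.search(pattern, label, re.IGNORECASE)]


# Made-up labels, for illustration only
clubs = ["Dundee United", "FC Dynamo", "Real Madrid", "Leeds"]
names = ["Henry V", "Henry IV", "Louis VII", "Victor"]

# The SPARQL literal "\\bD" reaches the regex engine as \bD
print(regex_filter(clubs, r"\bD"))    # ['Dundee United', 'FC Dynamo']

# Without word boundaries, "V" matches any name containing a v
print(regex_filter(names, "V"))       # ['Henry V', 'Henry IV', 'Louis VII', 'Victor']

# With the literal "\\bV\\b", only a standalone V matches
print(regex_filter(names, r"\bV\b"))  # ['Henry V']
```

Assuming Virtuoso behaves the same way, the filter from the question would presumably become FILTER(regex(?name, "\\bV\\b", "i")).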