Tags: regex, sparql, rdf, semantic-web, apache-jena

How to do matching with FILTER and regex in SPARQL?


I'd like to match a string entered by the user against the string value of a specific node in an RDF file.

For exact matching (input = NodeValue), I used the following:

 ...
 FILTER regex (?NodeValue,"userinput$","i").

For matching where the input is a substring of the node value (input < NodeValue), I used the following:

...       
 FILTER regex (?NodeValue,".*userinput.*","i").

So my question is: how do I write the regex for the opposite case (input > NodeValue)? I mean a query that returns the list of ?NodeValue values that are contained in a given user input.

E.g., if the user enters patagoniaisbeautiful, it returns patagonia.

Thank you in advance.


Solution

  • To get a match where the database value is a substring of the user input, swap the arguments of the regex function. That way, the actual value in the database is used as the regular expression, and the user input is the string it is matched against:

    FILTER(REGEX("patagoniaisbeautiful", STR(?NodeValue), "i"))
    

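    This will succeed if ?NodeValue is "patagonia". Of course, it will also match if ?NodeValue is "p", "a", "t", etc. Note too that any regex metacharacters in ?NodeValue (such as . or *) are interpreted as pattern syntax rather than as literal characters.

    In fact, since you only need simple substring matching here, you can replace the (computationally expensive) REGEX operation with the CONTAINS function. CONTAINS is case-sensitive, so the value is lowercased first; the user input must be lowercase as well, as it is here: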

    FILTER(CONTAINS("patagoniaisbeautiful", LCASE(STR(?NodeValue))))
    

    As an aside: your example for the case where the user input is a substring of the database value uses ".*userinput.*". The leading and trailing .* are unnecessary: a SPARQL regex match is by definition a substring match unless the pattern is anchored with ^ or $.
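
    To try the CONTAINS approach without loading any RDF data, here is a minimal self-contained sketch. It is an illustration under assumptions, not your actual query: the VALUES block stands in for whatever graph pattern binds ?NodeValue in your data, the sample strings are made up, and the mixed-case input is wrapped in LCASE so that the comparison is case-insensitive on both sides.

    ```sparql
    SELECT ?NodeValue
    WHERE {
      # Hypothetical sample values; replace this VALUES block with the
      # triple pattern that binds ?NodeValue in your own data.
      VALUES ?NodeValue { "patagonia" "Patagonia" "beautiful" "argentina" }

      # Keep only values that occur as a substring of the user input.
      # Both sides are lowercased, so the match ignores case.
      FILTER(CONTAINS(LCASE("PatagoniaIsBeautiful"), LCASE(STR(?NodeValue))))
    }
    ```

    On a SPARQL 1.1 engine this should keep "patagonia", "Patagonia" and "beautiful" and drop "argentina". As with the regex version, very short values such as "p" would also pass the filter, so you may want an additional length check on ?NodeValue.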