Tags: regex, data-binding, sparql

Using regex in SPARQL to bind variables?


I'm working on an OWL knowledge graph for info about patients in the Covid pandemic. I've been using SPARQL to transform strings from spreadsheets into the appropriate objects and values of properties.

I have strings like "Infected by P231 and P456 and P39393". What I want is something that can bind variables to the patient IDs. I thought this shouldn't be too hard because the strings only follow a few patterns: e.g., each string has one, two, or three patient IDs and no more, so I could write a query that matches each case separately.

I thought I could use regex for this, but looking at it more carefully, I think all it can do is tell me whether such patient IDs exist. Functions such as SUBSTR actually return the part of the string I want, so I can bind it to a variable; regex, by contrast, just returns true or false depending on whether the string matches the pattern. Is that correct?

If that is correct, are there other ways to do pattern matching in SPARQL that let me bind a variable to the substring matching part of the pattern? Or do I need to resort to a full programming language like Python?
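A minimal sketch of the distinction described in the question. It uses a hard-coded literal as a stand-in for a value read from the spreadsheet (an assumption for illustration). REGEX evaluates to an xsd:boolean, while SUBSTR returns text but only by fixed character position, so neither one alone extracts IDs of varying length and position.

```sparql
# Sketch: REGEX can only test a string; SUBSTR extracts by fixed position.
# The literal below is a stand-in for a value read from the spreadsheet.
SELECT ?hasId ?fixedSlice
WHERE {
  BIND ("Infected by P231 and P456 and P39393" AS ?s)
  # REGEX yields an xsd:boolean (true here), never the matched text.
  BIND (REGEX(?s, "P[0-9]+") AS ?hasId)
  # SUBSTR is 1-based: characters 13-16 are "P231", but only because this
  # particular ID happens to start at position 13 and be 4 characters long.
  BIND (SUBSTR(?s, 13, 4) AS ?fixedSlice)
}
```

This returns one row with ?hasId = true and ?fixedSlice = "P231".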


Solution

  • REPLACE is the function that applies a regular expression with () capture groups and computes a result string from the match, where $1 refers to the text captured by the first group. Like many SPARQL functions, it is based on fn:replace from "XPath and XQuery Functions and Operators".

    SELECT ?str WHERE { BIND (REPLACE("123", "(.)..", "$1") AS ?str) }

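    will set ?str to "1": the pattern matches the whole of "123", and that match is replaced by the text captured by group 1. Any part of the input outside the match is left in place, so to extract a substring the pattern has to match the entire string.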
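Applied to the strings in the question, one REPLACE call per capture group can bind each patient ID. This is a sketch that assumes the exact "Infected by Pnnn and Pnnn" wording and IDs of the form P followed by digits. The literal again stands in for a value from the data. Because REPLACE keeps any text outside the match, the pattern is anchored with ^ and $ so that it consumes the whole string. The FILTER drops strings that do not fit the pattern at all, since REPLACE would otherwise return them unchanged.

```sparql
# Sketch: bind up to three patient IDs from strings shaped like
# "Infected by P231 and P456 and P39393" (format assumed from the question).
SELECT ?id1 ?id2 ?id3
WHERE {
  BIND ("Infected by P231 and P456 and P39393" AS ?s)
  # Group 1 = first ID; groups 3 and 5 = optional second and third IDs.
  # Groups 2 and 4 only exist to make each " and Pnnn" part optional.
  BIND ("^Infected by (P[0-9]+)( and (P[0-9]+))?( and (P[0-9]+))?$" AS ?re)
  # Skip strings that do not match; REPLACE would return them unchanged.
  FILTER (REGEX(?s, ?re))
  # Each call replaces the whole (anchored) match with a single group.
  BIND (REPLACE(?s, ?re, "$1") AS ?id1)
  BIND (REPLACE(?s, ?re, "$3") AS ?id2)
  BIND (REPLACE(?s, ?re, "$5") AS ?id3)
}
```

For the literal shown, ?id1, ?id2 and ?id3 come out as "P231", "P456" and "P39393". For a string with only one or two IDs, fn:replace substitutes a zero-length string for an optional group that did not match, so ?id2 and/or ?id3 become "". Those can be filtered out, or wrapped in IF if an unbound variable is preferred.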