I'm using Jena to query data stored in an ontology. Some of the objects are identified by a string, but the exact string is not always available: I am processing scanned documents, so there may be OCR errors. I'd therefore like to find the most similar strings. Is there a way to use SPARQL for this purpose? Can I somehow calculate the Levenshtein distance in SPARQL?
If this is not possible, I can still calculate the Levenshtein distance in Java. However, an efficient algorithm would still require filtering out irrelevant strings using SPARQL.
SPARQL can't do this directly, but you could implement the Levenshtein distance function in Java and use it in a SPARQL FILTER clause. The "Extensions in ARQ" documentation has details about using extension functions.
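As a starting point, here is a minimal sketch of the distance computation itself in plain Java. The class name `Levenshtein` and the method name `distance` are my own choices for illustration, not part of Jena. It uses the standard two-row dynamic programming approach, so memory use is proportional to the length of the second string.

```java
// Hypothetical helper class; the names are illustrative and not part of Jena.
final class Levenshtein {
    private Levenshtein() {}

    /**
     * Returns the minimum number of single-character insertions,
     * deletions, or substitutions needed to turn a into b.
     */
    static int distance(String a, String b) {
        // prev holds the distances for the previous row, curr for the current one.
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];

        // Turning the empty prefix of a into b[0..j) takes j insertions.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }

        for (int i = 1; i <= a.length(); i++) {
            // Turning a[0..i) into the empty string takes i deletions.
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                int substitution = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insertion, deletion), substitution);
            }
            // Swap the rows so the current row becomes the previous one.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }
}
```

To call this from a query, you would wrap it in an ARQ extension function and register it under a URI of your choosing, as described in the ARQ extension documentation. Assuming you bind it to a prefix such as `ex:levenshtein` (a made-up name), the filter could then look something like `FILTER(ex:levenshtein(?label, "target") < 3)`. Note that such a filter is still evaluated once per candidate binding, so it is worth restricting the candidates with other triple patterns first.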