
Extract synonyms and label from Turtle file using SPARQL


I am learning SPARQL and working with a Turtle file to extract some information. The condition is: if any exact synonym of a class contains the substring 'stroke' or 'Stroke', the query should return all of that class's synonyms and its rdfs:label.

I am using the query below but getting no output:

prefix oboInOwl: <http://www.geneontology.org/formats/oboInOwl#>
prefix obo: <http://purl.obolibrary.org/obo/>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
               
Select * where { 
  ?s ?p ?o . 
  rdfs:label <http://www.geneontology.org/formats/oboInOwl#hasExactSynonym> "stroke"^^xsd:string
}

Below is the sample Turtle file:

###  https://ontology.aaaa.com/aaaa/meddra_10008196

:meddra_10008196 
  rdf:type owl:Class ;
  <http://www.geneontology.org/formats/oboInOwl#hasDbXref> "DOID:6713" , "EFO:0000712" , "EFO:0003763" , "HE:A10008190" ;
  <http://www.geneontology.org/formats/oboInOwl#hasExactSynonym> 
    "(cva) cerebrovascular accident" , 
    "Acute Cerebrovascular Accident" , 
    "Acute Cerebrovascular Accidents" , 
    "Acute Stroke" , 
    "Acute Strokes" ;
  rdfs:label "Cerebrovascular disorder"@en ;
  :hasSocs "Nervous system disorders [meddra:10029205]" , "Vascular disorders [meddra:10047065]" ;
  :uid "6e46da69b727e4e924c31027cdf47b8a" .

I am expecting this output:

(cva) cerebrovascular accident
Acute Cerebrovascular Accident
Acute Cerebrovascular Accidents
Acute Stroke
Acute Strokes
Cerebrovascular disorder

Solution

  • With this triple pattern, you are querying for rdfs:label as subject, not as predicate:

    rdfs:label <http://www.geneontology.org/formats/oboInOwl#hasExactSynonym> "stroke"^^xsd:string
    

    What you are asking with this is: "Does the resource rdfs:label have the property oboInOwl:hasExactSynonym with the string value 'stroke'?"

    But you want to ask this about the class (e.g., :meddra_10008196), not rdfs:label:

    ?class oboInOwl:hasExactSynonym "stroke" .
    

    Finding matches

    As you want substring matches rather than only exact string matches, you can use CONTAINS:

    ?class oboInOwl:hasExactSynonym ?matchingSynonym .
    FILTER( CONTAINS(?matchingSynonym, "stroke") ) .
    

    As you want to ignore case, you can query lower-cased synonyms with LCASE:

    ?class oboInOwl:hasExactSynonym ?matchingSynonym .
    FILTER( CONTAINS(LCASE(?matchingSynonym), "stroke") ) .
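
    As an alternative sketch, the same case-insensitive substring test can be written with REGEX and its "i" flag, both part of SPARQL 1.1. The query below is only an illustration that reuses the variable names from above. Whether it performs better or worse than LCASE plus CONTAINS depends on the engine, so treat it as an equivalent formulation, not an optimization.

    ```sparql
    PREFIX oboInOwl: <http://www.geneontology.org/formats/oboInOwl#>

    SELECT ?class ?matchingSynonym
    WHERE {
      ?class oboInOwl:hasExactSynonym ?matchingSynonym .
      # The "i" flag makes the regular-expression match case-insensitive,
      # so "Stroke" and "stroke" are both found.
      FILTER( REGEX(?matchingSynonym, "stroke", "i") )
    }
    ```

    On the sample data from the question, this should match only the synonyms "Acute Stroke" and "Acute Strokes".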
    

    Displaying results

    To display the label and all synonyms in the same column, you could use a property path with | (AlternativePath):

    ?class rdfs:label|oboInOwl:hasExactSynonym ?labelOrSynonym .
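
    If the alternative path looks unfamiliar, it may help to see it spelled out. A minimal sketch of the same pattern written with UNION, which should return the same rows:

    ```sparql
    PREFIX oboInOwl: <http://www.geneontology.org/formats/oboInOwl#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

    SELECT ?class ?labelOrSynonym
    WHERE {
      # Branch 1: the label of the class
      { ?class rdfs:label ?labelOrSynonym }
      UNION
      # Branch 2: every exact synonym of the class
      { ?class oboInOwl:hasExactSynonym ?labelOrSynonym }
    }
    ```

    Both branches bind the same variable, which is what puts labels and synonyms into one column.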
    

    Full query

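    PREFIX oboInOwl: <http://www.geneontology.org/formats/oboInOwl#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>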
    
    SELECT ?class ?labelOrSynonym
    WHERE {
    
      ?class rdfs:label|oboInOwl:hasExactSynonym ?labelOrSynonym .
    
      FILTER EXISTS {
        ?class oboInOwl:hasExactSynonym ?matchingSynonym .
        FILTER( CONTAINS(LCASE(?matchingSynonym), "stroke") ) .
      }
    
    }
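
    The expected output in the question is a single column of plain strings, while the label in the sample data carries a language tag (@en). As a hedged variant, assuming you only need the text, STR() can strip the language tag and the ?class column can be dropped. The alias ?text is a name chosen here for illustration.

    ```sparql
    PREFIX oboInOwl: <http://www.geneontology.org/formats/oboInOwl#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

    # ?text is an illustrative alias; STR() returns the lexical form,
    # so "Cerebrovascular disorder"@en becomes "Cerebrovascular disorder".
    SELECT (STR(?labelOrSynonym) AS ?text)
    WHERE {
      ?class rdfs:label|oboInOwl:hasExactSynonym ?labelOrSynonym .

      # Keep only classes that have at least one synonym containing "stroke"
      FILTER EXISTS {
        ?class oboInOwl:hasExactSynonym ?matchingSynonym .
        FILTER( CONTAINS(LCASE(?matchingSynonym), "stroke") )
      }
    }
    ```

    Row order is not guaranteed without an ORDER BY clause, so the label may not appear last as it does in the expected output.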