Retrieving a DBpedia resource by its string name with SPARQL and without knowing its type


As shown in this question, which has a similar title, I would like to retrieve a DBpedia resource knowing only part of its name. I'm a beginner when it comes to SPARQL, but the example in that question helped me a lot: the author searched for "Romania", and the person answering provided a SPARQL query that does the job. That's nice, but here's the thing.

In the example, they already "knew" that Romania is a country, hence the

    ?c a dbpedia-owl:Country ;

in the WHERE clause. The complete SPARQL query is:

    SELECT ?c
    WHERE {
      ?c a dbpedia-owl:Country ;
         foaf:name "Romania"@en .
      FILTER NOT EXISTS { ?c dbpedia-owl:dissolutionYear ?y }
    }

But that question doesn't quite cover our need, which is to search for ANY resource by its name, where the "name" is the actual name of a resource, or a part of it, regardless of its rdf:type. The goal is to search for "anything", knowing only the name or a part of it.

I did some research before asking, and I already know that the "part of the name" problem could be solved with a bif: function (the bad way, since it's not SPARQL compliant) or with the CONTAINS function, but I couldn't find any example showing how to use it.

Now suppose there is a word to search for among the DBpedia resources, supplied as input by some user. Let's call it "INPUT".

I imagine the query would look something like this:

    SELECT ?something WHERE
    {
      ?something a (dbpedia Resource) .
      CONTAINS(?something, "INPUT")
    }

My question has two main parts:

  1. Is there anything that describes the type "DBpedia resource"? I don't think it's in the ontology. Knowing that, I could search among all the resources to find one matching ...
  2. ... a specific name, or some string, that I would provide. I considered the FILTER option, but that would mean getting ALL the resources and then filtering them by name after they have been retrieved, which I guess would not be optimal.

So, does anyone know the "master query" to get a resource by providing its name, or a part of it? (For example, providing "Obama" and getting results not only for Barack, but for Michelle as well.)

Thank you in advance.


Solution

  • I'm assuming that in your first question you are interested in instance resources only. I don't know whether you can explicitly ask for just instance resources in the general case, since in RDF everything is a resource. If you specifically need this for the DBpedia dataset, you can query for resources that have a dcterms:subject property (in DBpedia, only instance resources have a dcterms:subject). So you can use a query like this:

    SELECT DISTINCT ?s ?label WHERE {
      ?s rdfs:label ?label .
      FILTER (lang(?label) = 'en') .
      ?label bif:contains "Obama" .
      ?s dcterms:subject ?sub
    }
    

    Similarly for your second question: if you are using just the DBpedia dataset, you might want to use bif:contains, although it is not SPARQL compliant (it is a Virtuoso extension). I don't think there is another efficient way to do this, and, as you said, using FILTER will be sub-optimal, especially if you need to execute queries quickly. As far as I know, keyword search and indexing are handled ad hoc by each triple store; there is not yet a standardized way to do full-text search.

    So, to sum up: if you work with DBpedia only, just use the features of the store and the specifics of the dataset to solve your problem.
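
    For comparison, here is a minimal sketch of the portable alternative discussed above: the same search written with the standard SPARQL 1.1 CONTAINS function inside a FILTER instead of bif:contains. The lowercase search term "obama", the explicit PREFIX declarations, and the LIMIT of 100 are my own assumptions for illustration. Because the filter is evaluated against every English label rather than a full-text index, expect it to be much slower than the bif:contains version, and possibly to time out on a large public endpoint.

    ```sparql
    PREFIX rdfs:    <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX dcterms: <http://purl.org/dc/terms/>

    SELECT DISTINCT ?s ?label WHERE {
      # Same restriction as before: keep only resources that have a dcterms:subject
      ?s rdfs:label ?label ;
         dcterms:subject ?sub .
      # Keep English labels only
      FILTER (langMatches(lang(?label), "en"))
      # Case-insensitive substring match; STR() drops the language tag before comparing
      FILTER (CONTAINS(LCASE(STR(?label)), "obama"))
    }
    LIMIT 100  # assumed cap to keep the result set small
    ```

    This version should run on any SPARQL 1.1 store, which is the trade-off: portability in exchange for the speed of the store-specific full-text index.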