
SPARQL: Provide alternate paths with differing lengths


I'm trying to link some local data up with DBpedia to extract information about countries' economic stats. How can I compensate for alternate paths of differing lengths? The language field itself is OPTIONAL so that the query doesn't drop a country that happens to have no language listed, but I am getting blank language columns for resources that do have languages listed.

For instance, http://dbpedia.org/page/Netherlands, http://dbpedia.org/page/Ireland, and http://dbpedia.org/page/Italy record the languages spoken very differently, ranging from a plain string to different predicates that reference a resource:

Netherlands:

[Screenshot: language properties on the Netherlands page]

Ireland:

[Screenshot: language properties on the Ireland page]

Italy:

[Screenshot: language properties on the Italy page]
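To see these differing shapes directly rather than from screenshots, a query along these lines lists every value of the two language properties for the three countries and flags whether each value is a literal or a resource. This is a sketch, not part of the original post: the PREFIX lines spell out namespaces the public DBpedia endpoint already predefines, the resource IRIs are assumed to mirror the /page/ URLs above, and the actual values depend on the DBpedia release being queried.

```sparql
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbp: <http://dbpedia.org/property/>

# For each country, list every value of the two language properties
# and flag whether that value is a literal (string) or a resource (IRI).
SELECT ?country ?property ?value (isLiteral(?value) AS ?valueIsLiteral)
WHERE
  {
    VALUES ?country  { dbr:Netherlands dbr:Ireland dbr:Italy }
    VALUES ?property { dbo:language dbp:languages }
    ?country ?property ?value .
  }
ORDER BY ?country ?property
```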

Here's a (stripped-down) example query that kind of works, but is not great:

SELECT DISTINCT
?countryName
?dbEntry
(GROUP_CONCAT(DISTINCT ?dbLanguage; separator=", ") AS ?languages)

WHERE
{
    ?dbEntry a dbo:Place ;
        rdfs:label | dbo:longName ?countryName .


    # For some reason, stacking two OPTIONALs and BINDing is all that seems to work here, and still not 100%
    OPTIONAL {
        ?dbEntry dbo:language / foaf:name ?dbofLanguage .
        BIND(?dbofLanguage AS ?dbLanguage) .
    }

    OPTIONAL {
        ?dbEntry dbp:languages ?dbpLanguage .
        BIND(?dbpLanguage AS ?dbLanguage) .
    }
    FILTER (STR(?countryName) IN ("Netherlands", "Italy", "Ireland")) .
}
GROUP BY ?countryName ?dbEntry
LIMIT 3


You'll see the results come back formatted entirely differently:

[Screenshot: DBpedia query results]

I'd like to write something like

OPTIONAL {
    ?dbEntry (dbo:language / foaf:name) | (dbp:languages / rdfs:label) | dbp:languages ?language
}

but I'm thinking SPARQL doesn't support anything that complex yet? (I get zero results)


Solution

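  • Edited to correct the query, now that I've realized what your issue was. SPARQL 1.1 property paths do support alternatives (|) of sequences (/), so the path you wanted to write is valid; the main change is FILTER isLiteral, which drops the resource IRIs that the bare dbp:languages branch can return, leaving only string values to concatenate: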

    SELECT DISTINCT ?countryName
                    ?dbEntry
                    ( GROUP_CONCAT ( DISTINCT ?language ; separator=", " ) AS ?languages )
    WHERE
      {
        ?dbEntry a                         dbo:Place ;
                 rdfs:label | dbo:longName ?countryName .
        OPTIONAL
          {
            ?dbEntry ( dbo:language / foaf:name ) | ( dbp:languages / rdfs:label ) | ( dbp:languages ) ?language
            # the bare dbp:languages branch can return resource IRIs; keep only literal values
            FILTER isLiteral ( ?language )
          }
        FILTER ( STR ( ?countryName ) IN ( "Netherlands" , "Italy" , "Ireland" ) ) .
      }
    GROUP BY ?countryName ?dbEntry
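One likely reason the question's two stacked OPTIONALs only partly work is how OPTIONAL joins: once the first OPTIONAL binds ?dbLanguage (via dbo:language / foaf:name), a solution from the second OPTIONAL is only compatible if its ?dbLanguage has exactly the same value, so a differing dbp:languages value is dropped rather than added. The | operator in the path above avoids this because its alternatives are evaluated as a union. The sketch below (not from the original answer) writes the same three alternatives out as an explicit UNION inside a single OPTIONAL, which can be easier to extend or debug branch by branch.

```sparql
PREFIX dbo:  <http://dbpedia.org/ontology/>
PREFIX dbp:  <http://dbpedia.org/property/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?countryName ?dbEntry
       (GROUP_CONCAT(DISTINCT ?dbLanguage; separator=", ") AS ?languages)
WHERE
  {
    ?dbEntry a dbo:Place ;
             rdfs:label | dbo:longName ?countryName .
    # A single OPTIONAL whose body is a UNION: each branch binds ?dbLanguage
    # on its own, so one branch matching cannot block the others.
    OPTIONAL
      {
        { ?dbEntry dbo:language / foaf:name ?dbLanguage }
        UNION
        { ?dbEntry dbp:languages / rdfs:label ?dbLanguage }
        UNION
        { ?dbEntry dbp:languages ?dbLanguage }
        # as in the path version, keep only literal values
        FILTER isLiteral(?dbLanguage)
      }
    FILTER (STR(?countryName) IN ("Netherlands", "Italy", "Ireland"))
  }
GROUP BY ?countryName ?dbEntry
```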
    

    Note: these properties (and thus your query) will change drastically in the next version of DBpedia. Check out the current DBpedia Live page on Ireland, for example.


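    This appears to do what you want, with a little more property-path syntax: the ? (zero-or-one) operator on rdfs:label after dbp:languages, so the path matches the dbp:languages value itself (zero steps) as well as its rdfs:label (one step):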

    SELECT DISTINCT ?countryName
                    ?dbEntry
                    ( GROUP_CONCAT ( DISTINCT ?language ; separator=", " ) AS ?languages )
    WHERE
      {
        ?dbEntry a                         dbo:Place ;
                 rdfs:label | dbo:longName ?countryName .
        OPTIONAL
          {
            # rdfs:label? = zero or one rdfs:label step after dbp:languages
            ?dbEntry ( dbo:language / foaf:name ) | ( dbp:languages / rdfs:label? ) ?language
          }
        FILTER ( STR ( ?countryName ) IN ( "Netherlands" , "Italy" , "Ireland" ) ) .
      }
    GROUP BY ?countryName ?dbEntry
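One caveat with the rdfs:label? version: when dbp:languages points to a resource, the zero-length step also returns that resource IRI itself, and rdfs:label on DBpedia resources typically carries labels in several languages, so the concatenated list can contain IRIs and non-English names. The sketch below keeps the shorter path but restores the isLiteral filter and, assuming English output is wanted, keeps only untagged or English-tagged strings. It also applies STR() inside GROUP_CONCAT so that "Dutch" and "Dutch"@en count as one entry. The language filter is an assumption layered on top of the original answer, not part of it.

```sparql
PREFIX dbo:  <http://dbpedia.org/ontology/>
PREFIX dbp:  <http://dbpedia.org/property/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?countryName ?dbEntry
       # STR() drops language tags so "Dutch" and "Dutch"@en collapse into one entry
       (GROUP_CONCAT(DISTINCT STR(?language); separator=", ") AS ?languages)
WHERE
  {
    ?dbEntry a dbo:Place ;
             rdfs:label | dbo:longName ?countryName .
    OPTIONAL
      {
        ?dbEntry (dbo:language / foaf:name) | (dbp:languages / rdfs:label?) ?language .
        # the zero-length case of rdfs:label? also yields the resource IRI itself; drop it
        FILTER isLiteral(?language)
        # assumption: only untagged or English-tagged strings are wanted
        FILTER (lang(?language) = "" || langMatches(lang(?language), "en"))
      }
    FILTER (STR(?countryName) IN ("Netherlands", "Italy", "Ireland"))
  }
GROUP BY ?countryName ?dbEntry
```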