Getting JSON/Dictionary of all properties in DBPedia for a page/resource from Wikipedia Infobox

I'm trying to get a representation of the infobox of articles on Wikipedia in a Python project. I had tried using the Wikipedia API, but the data it outputs is dirty, so I'm trying to move to DBpedia. I need to be able to query by page name, and receive a dictionary of the property names and their values for that page. For example, for the query for London, the returned dictionary would contain:

{dbpedia-owl:PopulatedPlace/areaMetro : 8382.0,
 dbpedia-owl:PopulatedPlace/areaTotal : 1572.0,
 dbpedia-owl:populationDensity : 5285.0,
 ...}

etc., and from this I would be able to read all the keys that were in the Infobox. I did try using the SPARQL query of

describe <>

but that returned tonnes of unnecessary data (the full set of triples associated with London), which is many orders of magnitude more than I need.

How can I write a query to just get the infobox properties, as above?


  • You might be able to get what you want by selecting properties and objects where the property IRI begins with a namespace you're interested in. You could use a query like the following. (It takes advantage of the fact that a prefix by itself, e.g., dbpedia-owl:, is still a legal IRI, and you can use str on it. You could also just use the namespace string directly.)

    select ?p ?o where {
      dbpedia:London ?p ?o
      filter strstarts(str(?p),str(dbpedia-owl:))
    }

    The JSON results aren't quite in the format you're looking for, but are like this:

    { "head": { "link": [], "vars": ["p", "o"] },
      "results": { "distinct": false, "ordered": true, "bindings": [
        { "p": { "type": "uri", "value": "" }   , "o": { "type": "uri", "value": "" }},
        { "p": { "type": "uri", "value": "" }   , "o": { "type": "uri", "value": "" }},
        { "p": { "type": "uri", "value": "" }   , "o": { "type": "uri", "value": "" }},
        { "p": { "type": "uri", "value": "" }   , "o": { "type": "uri", "value": "" }},
        { "p": { "type": "uri", "value": "" }   , "o": { "type": "uri", "value": "" }},
        { "p": { "type": "uri", "value": "" }   , "o": { "type": "uri", "value": "" }},
        { "p": { "type": "uri", "value": "" }   , "o": { "type": "uri", "value": "" }},
        { "p": { "type": "uri", "value": "" }   , "o": { "type": "uri", "value": "" }},
        { "p": { "type": "uri", "value": "" }   , "o": { "type": "uri", "value": "" }},
        ...
      ] } }

    That sort of makes sense though, because there's not necessarily a unique value for each property, so a Python dict as in the question probably isn't the best result format (but it'd be easy to create one where multiple values are put into a list).
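
    As a rough sketch of that last point, something like the following would collect every value for a property into a list. The function name bindings_to_dict and the sample IRIs are made up for illustration; the only thing taken from the results above is the head/results/bindings layout.

```python
from collections import defaultdict


def bindings_to_dict(results):
    """Group SPARQL JSON bindings into {property IRI: [value, value, ...]}."""
    grouped = defaultdict(list)
    for row in results["results"]["bindings"]:
        # Each row binds ?p (the property) and ?o (the object).
        grouped[row["p"]["value"]].append(row["o"]["value"])
    return dict(grouped)


# Hypothetical sample with the same shape as the JSON results above.
# The example.org IRIs and the values are placeholders, not real DBpedia data.
sample = {
    "head": {"link": [], "vars": ["p", "o"]},
    "results": {"distinct": False, "ordered": True, "bindings": [
        {"p": {"type": "uri", "value": "http://example.org/prop/a"},
         "o": {"type": "literal", "value": "1"}},
        {"p": {"type": "uri", "value": "http://example.org/prop/a"},
         "o": {"type": "literal", "value": "2"}},
        {"p": {"type": "uri", "value": "http://example.org/prop/b"},
         "o": {"type": "uri", "value": "http://example.org/thing"}},
    ]},
}

properties = bindings_to_dict(sample)
```

    Values come back as strings here; if you need numbers you'd have to look at each binding's "type" (and "datatype", where present) and convert them yourself.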

    Also note that the properties that begin with dbpedia-owl: are actually the DBpedia Ontology properties, which have much cleaner data than the raw infobox values, for which properties beginning with dbpprop: are used. You can read more about the different datasets at 4.3. Infobox Data. A query for the raw properties would be pretty much the same though:

    select ?p ?o where {
      dbpedia:London ?p ?o
      filter strstarts(str(?p),str(dbpprop:))
    }
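
    Since the question is about a Python project, here is one way you might send either query from Python using only the standard library. Treat it as a sketch under assumptions: the endpoint URL, the helper names (build_query, build_request, fetch_properties), and the full IRIs in the usage comment are mine, and it spells the IRIs out so it doesn't rely on the endpoint's predefined prefixes.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the public DBpedia SPARQL endpoint.
ENDPOINT = "https://dbpedia.org/sparql"


def build_query(resource_iri, namespace_iri):
    """Select the properties of a resource whose IRI starts with a namespace."""
    return (
        "select ?p ?o where {\n"
        f"  <{resource_iri}> ?p ?o\n"
        f'  filter strstarts(str(?p), "{namespace_iri}")\n'
        "}"
    )


def build_request(query, endpoint=ENDPOINT):
    """Build a GET request that asks for SPARQL JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    headers = {"Accept": "application/sparql-results+json"}
    return urllib.request.Request(url, headers=headers)


def fetch_properties(resource_iri, namespace_iri):
    """Run the query and return the parsed JSON results (needs network access)."""
    request = build_request(build_query(resource_iri, namespace_iri))
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Usage (assumed IRIs for the London resource and the ontology namespace):
# results = fetch_properties("http://dbpedia.org/resource/London",
#                            "http://dbpedia.org/ontology/")
```

    Passing the namespace as an argument means the same function covers both the ontology properties and the raw infobox ones; you'd just swap in the other namespace string.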
