Search code examples
pythonsqlcsvsparqlwikidata

Python - SPARQWrapper returns me a XML file when I request a CSV file. Need help converting the data into an SQL table


I wrote this code to get names of all gen 1 pokemon from Wikidata, which I then want to turn into a SQL table.

```python
sparql = SPARQLWrapper("https://query.wikidata.org/sparql")

sparql.setQuery("""
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wikibase: <http://wikiba.se/ontology#>

SELECT DISTINCT ?pokemon ?pokemonLabel
WHERE {
  ?pokemon wdt:P361 wd:Q3245450.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}""")

sparql.setReturnFormat(CSV)

result = sparql.query().convert()

pandas.read_csv(result).to_sql('PokemonDBP', conn, if_exists = 'replace', index = False)
```

However, the code returns a ValueError, Invalid file path or buffer object type: <class 'xml.dom.minidom.Document'>. It seems like SPARQLWrapper cannot return the data from Wikidata in a CSV format?

I tried rewriting the code to return JSON, and then created a loop to check if the data was recorded properly:

```python
sparql.setReturnFormat(JSON)

result = sparql.query().convert()
for res in result["results"]["bindings"]:
        print(res["pokemonLabel"]["value"], 
              res["pokemon"]["value"])
```

However, while the results on the Wikidata query on their website look like this:
pokemon pokemonLabel
wd:Q2488577 Muk
wd:Q2739523 Golem
...

the results of this loop look like this:
Q2485997 http://www.wikidata.org/entity/Q2485997
Q2486013 http://www.wikidata.org/entity/Q2486013
...

I would appreciate any help and suggestions on how I can turn the results of this SPARQL query into an SQL table.


Solution

  • I have found a solution to my issue after browsing Stack Overflow some more.

    I changed this line:

    SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]".
    

    to this line:

    ?pokemon rdfs:label ?pokemonLabel. filter(lang(?pokemonLabel) = "en").
    

    Link to the post I found the solution in