python google-colaboratory export-to-csv wikidata

Create CSV from the result of a for loop in Google Colab


I'm using the Wikidata Query Service to obtain values, and this is the code:

pip install sparqlwrapper

import sys
from SPARQLWrapper import SPARQLWrapper, JSON

endpoint_url = "https://query.wikidata.org/sparql"

query = """#List of organizations 

SELECT ?org ?orgLabel
WHERE
{
  ?org wdt:P31 wd:Q4830453. #instance of organizations
  ?org wdt:P17 wd:Q96. #Mexico country

  SERVICE wikibase:label { bd:serviceParam wikibase:language "en"}
}"""


def get_results(endpoint_url, query):
    user_agent = "WDQS-example Python/%s.%s" % (sys.version_info[0], sys.version_info[1])
    # TODO adjust user agent; see https://w.wiki/CX6
    sparql = SPARQLWrapper(endpoint_url, agent=user_agent)
    sparql.setQuery(query)
    sparql.setReturnFormat(JSON)
    return sparql.query().convert()

results = get_results(endpoint_url, query)

for result in results["results"]["bindings"]:
    print(result)

This code gives me the data that I need, but I'm having problems exporting it with this line:

results.to_csv('results.csv', index=False)

It fails with this error:

'dict' object has no attribute 'to_csv'

I imported pandas and numpy to do this, but I'm still having problems, so I would like to know how to put these results into a format from which I can create a CSV file with the data obtained.



Solution

  • results is a dictionary, a plain Python data structure that has no to_csv method. to_csv belongs to pandas DataFrame objects, so importing pandas alone does not add it to a dict.

    To write a CSV file from a Python dictionary safely, you can use the csv module from the standard library (see also its documentation on python.org).

    The specific solution depends on exactly which (meta)data you want to export. In the following I assume that you want to store the value for org and orgLabel.
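
For orientation, here is a minimal sketch of what a single entry of results['results']['bindings'] looks like in the SPARQL JSON results format. The sample dictionary is hand-written for illustration (it is not actual query output), and the names sample_results, first, attributes and org_value are hypothetical:

```python
# Hand-written sample in the shape of the SPARQL JSON results format;
# in a real run this dictionary would come from get_results(endpoint_url, query).
sample_results = {
    "head": {"vars": ["org", "orgLabel"]},
    "results": {
        "bindings": [
            {
                "org": {"type": "uri", "value": "http://www.wikidata.org/entity/Q482267"},
                "orgLabel": {"type": "literal", "xml:lang": "en", "value": "América Móvil"},
            }
        ]
    },
}

first = sample_results["results"]["bindings"][0]

# Each variable maps to a small dict of metadata; "value" holds the actual data.
attributes = sorted(first["orgLabel"].keys())
org_value = first["org"]["value"]

print(attributes)
print(org_value)
```

This is why the export has to pick one attribute (here 'value') per variable instead of writing the binding dictionaries directly.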

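    import csv

    bindings = results['results']['bindings']
    sparqlVars = ['org', 'orgLabel']  # SPARQL variables to export as columns
    metaAttribute = 'value'           # attribute of each binding to keep

    # newline='' lets the csv module control line endings;
    # utf-8 keeps accented labels such as "América Móvil" intact
    with open('results.csv', 'w', newline='', encoding='utf-8') as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=sparqlVars)
        writer.writeheader()
        for b in bindings:
            # one CSV row per binding, one column per variable
            writer.writerow({var: b[var][metaAttribute] for var in sparqlVars})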
    

    The resulting results.csv looks like this:

    org,orgLabel
    http://www.wikidata.org/entity/Q47099,"Grupo Televisa, owner of TelevisaUnivision"
    http://www.wikidata.org/entity/Q429380,Aeropuertos y Servicios Auxiliares
    http://www.wikidata.org/entity/Q482267,América Móvil
    ...
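
Since the question tried to call to_csv, a pandas-based alternative may also be useful. The following is a minimal sketch, assuming pandas is installed (it is preinstalled on Colab): it flattens the bindings into a list of plain rows first and only then builds a DataFrame. The helper name bindings_to_dataframe and the sample data are hypothetical; in the real notebook you would pass the results dictionary returned by get_results instead of sample.

```python
import pandas as pd


def bindings_to_dataframe(results, variables):
    """Flatten SPARQL JSON bindings into a DataFrame, one column per variable."""
    rows = []
    for binding in results["results"]["bindings"]:
        # A variable can be missing from a row (e.g. with OPTIONAL patterns),
        # so fall back to None instead of raising KeyError.
        rows.append({var: binding.get(var, {}).get("value") for var in variables})
    return pd.DataFrame(rows, columns=variables)


# Hand-written sample in the same shape as the SPARQLWrapper JSON result.
sample = {
    "results": {
        "bindings": [
            {
                "org": {"type": "uri", "value": "http://www.wikidata.org/entity/Q482267"},
                "orgLabel": {"type": "literal", "xml:lang": "en", "value": "América Móvil"},
            },
            {
                # This row has no label, to show the missing-variable case.
                "org": {"type": "uri", "value": "http://www.wikidata.org/entity/Q429380"},
            },
        ]
    }
}

df = bindings_to_dataframe(sample, ["org", "orgLabel"])

# index=False omits the DataFrame's row index from the file.
df.to_csv("results.csv", index=False, encoding="utf-8")
```

Compared with csv.DictWriter, this version leaves a missing variable as an empty cell rather than failing, at the cost of an extra dependency.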