Search code examples
pythonlistcsvsparqlsparqlwrapper

How to iterate and write a list as the CSV header?


I have a use case, where I pull the data from a SPARQL endpoint as follows:

SPARQL = SPARQLWrapper(endpointURL)  
SPARQL.setQuery(queryString) 
SPARQL.setReturnFormat(JSON) 
results = SPARQL.query().convert() 

The 'results' variable holds the data that now I want to write in a CSV file. First, I created the header for the CSV as follows:

schema = ['Age', 'Sex', 'Chest_Pain_Type', 'Trestbps', 'Chol', 'Fasting_Glucose_Level', 'Resting_ECG_Type', 'ThalachD', 'Exercise_Induced_Angina', 'OldpeakD', 'CaD', 'Slope', 'Thallium_Scintigraphy', 'Diagnosis']

I then wrote the data of each row in a CSV file using the following code:

with open('DIC.csv', 'w+') as csvfile: 
        writer = csv.writer(csvfile, delimiter=',') 
        writer.writerow([g for g in schema]) 
        for result in results["results"]["bindings"]: 
                    writer.writerow([result["Age"]["value"], 
                    result["SexTypes"]["value"],
                    result["Chest_Pain_Type"]["value"], 
                    result["trestbpsD"]["value"],
                    result["cholD"]["value"], 
                    result["Fasting_Glucose_Level"]["value"],
                    result["Resting_ECG_Type"]["value"],
                    result["thalachD"]["value"],
                    result["Exercise_Induced_Angina"]["value"],
                    result["oldpeakD"]["value"],
                    result["caD"]["value"],
                    result["Slope"]["value"],
                    result["Thallium_Scintigraphy"]["value"],
                    result["Diagnosis"]["value"]])
        file_name = csvfile.name
        csvfile.close()

In the preceding code block, result["SexTypes"]["value"] is used to write the value (i.e. "value") of column "SexTypes". That means, the first index is variable but the second index is always the same.

Although, the above code works fine but this is pretty much hard-coded and fails once my SPARQL query changes(i.e. if the schema is different).

I now want to make it more flexible such that I iterate all the column from the list but the "value" is fixed. For doing this, I tried with the following code, which eventually fails:

with open('DIC.csv', 'w+') as csvfile: 
        writer = csv.writer(csvfile, delimiter=',') 
        writer.writerow([g for g in schema]) 
        r = "value"
        for result in results["results"]["bindings"]: 
            for i in range(len(schema)):
                writer.writerow([result[schema[i]][r]])
                i = i + 1       
        file_name = csvfile.name
        csvfile.close()
        return file_name

I know, I'm doing somethig wrong. Any better suggestion, please?

[Maybe the problem deserves a better title but I couldn't find any. Sorry for my bad English, though.]


Solution

  • The input of writer.writerow() in the first example code is a list, with the same length as schema, whereas in the second it is a length 1 list (and instead of calling writerow() once you call is once per element in schema.

    To generate the list you had in the first place you can do it with a list comprehension:

    row = [result[column]["value"] for column in schema]
    

    which is equivalent to:

    row = [] # create empty list
    for column in schema:
        row.append(result[column]["value"])
    

    Eventually, the code would only have to be modified inside the loop over results:

    ...
    for result in results["results"]["bindings"]: 
        row = [result[column]["value"] for column in schema]
        writer.writerow(row)
    file_name = csvfile.name
    ...