I have to directly convert a xls file to a JSON document using python3 and xlrd.
Table is here.
It's divided in three main categories (PUBLICATION, CONTENU, CONCLUSION) whose names are on column one (first column is zero) and number of rows by category can vary. Each rows has three key values (INDICATEURS, EVALUATION, PROPOSITION) on column 3, 5 and 7. There can be empty lines, or missing values
I have to convert that table to the following JSON data I have written directly has a reference. It's valid.
{
"EVALUATION": {
"PUBLICATION": [
{
"INDICATEUR": "Page de garde",
"EVALUATION": "Inexistante ou non conforme",
"PROPOSITION D'AMELIORATION": "Consulter l'example sur CANVAS"
},
{
"INDICATEUR": "Page de garde",
"EVALUATION": "Titre du TFE non conforme",
"PROPOSITION D'AMELIORATION": "Utilisez le titre avalisé par le conseil des études"
},
{
"INDICATEUR": "Orthographe et grammaire",
"EVALUATION": "Nombreuses fautes",
"PROPOSITION D'AMELIORATION": "Faire relire le document"
},
{
"INDICATEUR": "Nombre de page",
"EVALUATION": "Nombre de pages grandement différent à la norme",
"PROPOSITION D'AMELIORATION": ""
}
],
"CONTENU": [
{
"INDICATEUR": "Développement du sujet",
"EVALUATION": "Présentation de l'entreprise",
"PROPOSITION D'AMELIORATION": ""
},
{
"INDICATEUR": "Développement du sujet",
"EVALUATION": "Plan de localisation inutile",
"PROPOSITION D'AMELIORATION": "Supprimer le plan de localisation"
},
{
"INDICATEUR": "Figures et capture d'écran",
"EVALUATION": "Captures d'écran excessives",
"PROPOSITION D'AMELIORATION": "Pour chaque figure et capture d'écran se poser la question 'Qu'est-ce que cela apporte à mon sujet ?'"
},
{
"INDICATEUR": "Figures et capture d'écran",
"EVALUATION": "Captures d'écran Inutiles",
"PROPOSITION D'AMELIORATION": "Pour chaque figure et capture d'écran se poser la question 'Qu'est-ce que cela apporte à mon sujet ?'"
},
{
"INDICATEUR": "Figures et capture d'écran",
"EVALUATION": "Captures d'écran illisibles",
"PROPOSITION D'AMELIORATION": "Pour chaque figure et capture d'écran se poser la question 'Qu'est-ce que cela apporte à mon sujet ?'"
},
{
"INDICATEUR": "Conclusion",
"EVALUATION": "Conclusion inexistante",
"PROPOSITION D'AMELIORATION": ""
},
{
"INDICATEUR": "Bibliographie",
"EVALUATION": "Inexistante",
"PROPOSITION D'AMELIORATION": ""
},
{
"INDICATEUR": "Bibliographie",
"EVALUATION": "Non normalisée",
"PROPOSITION D'AMELIORATION": "Ecrire la bibliographie selon la norme APA"
}
],
"CONCLUSION": [
{
"INDICATEUR": "",
"EVALUATION": "Grave manquement sur le plan de la présentation",
"PROPOSITION D'AMELIORATION": "Lire le document 'Conseil de publication' disponible sur CANVAS"
},
{
"INDICATEUR": "",
"EVALUATION": "Risque de refus du document par le conseil des études",
"PROPOSITION D'AMELIORATION": ""
}
]
}
}
My intention is to loop through lines, check rows[1] to identify the category, and sub-loop to add data as dictionary in a list by category.
Here is my code so far :
import xlrd
file = '/home/eh/Documents/Base de Programmation/Feedback/EvaluationEI.xls'
wb = xlrd.open_workbook(file)
sheet = wb.sheet_by_index(0)
data = [[sheet.cell_value(r, c) for c in range(sheet.ncols)] for r in range(sheet.nrows)]
def readRows():
for rownum in range(2,sheet.nrows):
rows = sheet.row_values(rownum)
indicateur = rows[3]
evaluation = rows[5]
amelioration = rows[7]
publication = []
contenu = []
conclusion = []
if rows[1] == "PUBLICATION":
if rows[3] == '' and rows[5] == '' and rows[7] == '':
continue
else:
publication.append("INDICATEUR : " + indicateur , "EVALUATION : " + evaluation , "PROPOSITION D'AMELIORATION : " + amelioration)
if rows[1] == "CONTENU":
if rows[3] == '' and rows[5] == '' and rows[7] == '':
continue
else:
contenu.append("INDICATEUR : " + indicateur , "EVALUATION : " + evaluation , "PROPOSITION D'AMELIORATION : " + amelioration)
if rows[1] == "CONCLUSION":
if rows[3] == '' and rows[5] == '' and rows[7] == '':
continue
else:
conclusion.append("INDICATEUR : " + indicateur , "EVALUATION : " + evaluation , "PROPOSITION D'AMELIORATION : " + amelioration)
print (publication)
print (contenu)
print (conclusion)
readRows()
I am having a hard time figuring out how to sub-loop for the right number of rows to separate data by categories.
Any help would be welcome.
Thank you in advance
Using the json
package and the OrderedDict
(to preserve key order), I think this gets to what you're expecting, and I've modified slightly so we're not building a string literal, but rather a dict
which contains the data that we can then convert with json.dumps
.
As Ron noted above, your previous attempt was skipping the lines where rows[1]
was not equal to one of your three key values.
This should read every line, appending to the last non-empty key:
def readRows(file, s_index=0):
"""
file: path to xls file
s_index: sheet_index for the xls file
returns a dict of OrderedDict of list of OrderedDict which can be parsed to JSON
"""
d = {"EVALUATION" : OrderedDict()} # this will be the main dict for our JSON object
wb = xlrd.open_workbook(file)
sheet = wb.sheet_by_index(s_index)
# getting the data from the worksheet
data = [[sheet.cell_value(r, c) for c in range(sheet.ncols)] for r in range(sheet.nrows)]
# fill the dict with data:
for _,row in enumerate(data[3:]):
if row[1]: # if there's a value, then this is a new categorie element
categorie = row[1]
d["EVALUATION"][categorie] = []
if categorie:
i,e,a = row[3::2][:3]
if i or e or a: # as long as there's any data in this row, we write the child element
val = OrderedDict([("INDICATEUR", i),("EVALUATION", e),("PROPOSITION D'AMELIORATION", a)])
d["EVALUATION"][categorie].append(val)
return d
This returns a dict
which can be easily parsed to json. Screenshot of some output:
import io # for python 2
d = readRows(file,0)
with io.open('c:\debug\output.json','w',encoding='utf8') as out:
out.write(json.dumps(d,indent=2,ensure_ascii=False))
Note: in Python 3, I don't think you need io.open
.