I have a json file which looks like this
{
"file": "name",
"main": [{
"question_no": "Q.1",
"question": "what is ?",
"answer": [{
"user": "John",
"comment": "It is defined as",
"value": [
{
"my_value": 5,
"value_2": 10
},
{
"my_value": 24,
"value_2": 30
}
]
},
{
"user": "Sam",
"comment": "as John said above it simply means",
"value": [
{
"my_value": 9,
"value_2": 10
},
{
"my_value": 54,
"value_2": 19
}
]
}
],
"closed": "no"
}]
}
desired result:
Question_no question my_value_sum value_2_sum user comment
Q.1 what is ? 29 40 john It is defined as
Q.1 what is ? 63 29 Sam as John said above it simply means
What I have tried is data = json_normalize(file_json, "main")
and then using a for loop like
for ans, row in data.iterrows():
....
....
df = df.append(the data)
But the issue using this is that it is taking a lot of time that my client would refuse the solution. there is around 1200 items in the main
list and there are 450 json files like this to convert. So this intermediate process of conversion would take almost an hour to complete.
EDIT:
is it possible to get the sum of the my_value
and value_2
as a column? (updated the desired result also)
Select dictionary by main
with parameter record_path
and meta
:
data = pd.json_normalize(file_json["main"],
record_path='answer',
meta=['question_no', 'question'])
print (data)
user comment question_no question
0 John It is defined as Q.1 what is ?
1 Sam as John said above it simply means Q.1 what is ?
Then if order is important convert last N columns to first positions:
N = 2
data = data[data.columns[-N:].tolist() + data.columns[:-N].tolist()]
print (data)
question_no question user comment
0 Q.1 what is ? John It is defined as
1 Q.1 what is ? Sam as John said above it simply means