I have a file with sample JSON output looks as below: jsonoutput.txt file:
[{"fruit": "orange", "id":1, "countries": ["Portugal"], "color": "Orange"}
{"fruit": "apple", "id":2, "countries": ["Portugal"], "color": "red"}]
I need the output csv as (excel file):
fruit id countries color
orange 1 Portugal Orange
apple 2 Spain red
Now, I am getting as fruit id countries color orange 1 [u'Portugal'] Orange apple 2 [u'Spain'] red
How can I remove [] , u and '' from the column countries?
print (json.dumps(fruits))
--gives me in json output
This is what I have tried to convert json to xlsx:
data= tablib.Dataset(headers=('Fruit','id','Countries','Color'))
importfile = 'jsonoutput.txt'
data.json = open(importfile. 'r').read()
data_export = data.export('xlsx')
with open('output.xlsx','wb') as f:
f.write(data_export)
f.close()
you could use pandas.io.json.json_normalize
import pandas as pd
from pandas.io.json import json_normalize
d = [
{"fruit": "orange", "id":1, "countries": ["Portugal"], "color": "Orange"},
{"fruit": "apple", "id":2, "countries": ["Portugal"], "color": "red"}
]
df = pd.concat([json_normalize(d[i]) for i in range(len(d))], ignore_index=True)
df['countries'] = df['countries'].str.join(' ')
fruit id countries color
0 orange 1 Portugal Orange
1 apple 2 Portugal red
to save it as .xlsx
file, use:
df.to_excel('filename.xlsx', index=False)
EDIT:
json_normalize
is a function that normalizes semi-structured JSON data into a flat table.
I actually realised now that my code could be simplified to:
df = json_normalize(d) # no need for `pd.concat`
### Output:
# fruit id countries color
# 0 orange 1 ['Portugal'] Orange
# 1 apple 2 ['Portugal'] red
To remove []
from countries
column, I used pandas.Series.str.join
, which is pandas
' equivalent to Python's str.join
.
It's needed since initially countries
column is a list containing elements
df['countries'] = df['countries'].str.join(' ')
countries
column is no longer a list once you join items:
fruit id countries color
0 orange 1 Portugal Orange
1 apple 2 Portugal red