Search code examples
pythonjsoncsvtablib

Convert Json with sub fields to CSV in python


I have a file with sample JSON output looks as below: jsonoutput.txt file:

[{"fruit": "orange", "id":1, "countries": ["Portugal"], "color": "Orange"}

{"fruit": "apple", "id":2, "countries": ["Portugal"], "color": "red"}]

I need the output csv as (excel file):

fruit id countries color
orange 1 Portugal Orange
apple 2  Spain     red

Now, I am getting as fruit id countries color orange 1 [u'Portugal'] Orange apple 2 [u'Spain'] red

How can I remove [] , u and '' from the column countries?

print (json.dumps(fruits)) --gives me in json output

This is what I have tried to convert json to xlsx:

data= tablib.Dataset(headers=('Fruit','id','Countries','Color'))
importfile = 'jsonoutput.txt'
data.json = open(importfile. 'r').read()
data_export = data.export('xlsx')
with open('output.xlsx','wb') as f:
    f.write(data_export)
    f.close()

Solution

  • you could use pandas.io.json.json_normalize

    import pandas as pd
    from pandas.io.json import json_normalize
    
    d = [
        {"fruit": "orange", "id":1, "countries": ["Portugal"], "color": "Orange"},
        {"fruit": "apple", "id":2, "countries": ["Portugal"], "color": "red"}
    ]
    
    df = pd.concat([json_normalize(d[i]) for i in range(len(d))], ignore_index=True)
    df['countries'] = df['countries'].str.join(' ')
    

        fruit   id  countries   color
    0   orange  1   Portugal    Orange
    1   apple   2   Portugal    red
    

    to save it as .xlsx file, use:

    df.to_excel('filename.xlsx', index=False)
    

    EDIT:

    json_normalize is a function that normalizes semi-structured JSON data into a flat table.

    I actually realised now that my code could be simplified to:

    df = json_normalize(d) # no need for `pd.concat`
    
    ### Output:
    #   fruit   id  countries   color
    # 0 orange  1   ['Portugal']    Orange
    # 1 apple   2   ['Portugal']    red
    

    To remove [] from countries column, I used pandas.Series.str.join, which is pandas' equivalent to Python's str.join.

    It's needed since initially countries column is a list containing elements

    df['countries'] = df['countries'].str.join(' ')
    

    countries column is no longer a list once you join items:

        fruit   id  countries   color
    0   orange  1   Portugal    Orange
    1   apple   2   Portugal    red