Search code examples
jsonpandasmongodbcsvexport-to-csv

multiple object of an array creates different columns in the CSV file


Here is my JSON example. When I convert JSON to CSV file, it creates different columns for each object of reviews array. columns names be like - serial name.0 rating.0 _id.0 name.1 rating.1 _id.1. How can i convert to CSV file where only serial,name,rating,_id will be the column name and every object of the reviews will be put in a different row?

`

[{
    "serial":  "63708940a8d291c502be815f",
    "reviews": [
      {
        "name": "shadman",
        "rating": 4,
        "_id":"6373d4eb50cff661989f3d83"      
      },
      {
        "name": "niloy1",
        "rating": 3,
        "_id": "6373d59450cff661989f3db8"    
      }, 
    ],
  }]

`

`

I am trying to use the CSV file to pandas. If not possible, is there any way to solve the problem using pandas package in python?


Solution

  • I suggest you use pandas for the CSV export only and process the json data by flattening the data structure first so that the result can then be easily loaded in a Pandas DataFrame.

    Try:

    data_python = [{
        "serial":  "63708940a8d291c502be815f",
        "reviews": [
          {
            "name": "shadman",
            "rating": 4,
            "_id":"6373d4eb50cff661989f3d83"      
          },
          {
            "name": "niloy1",
            "rating": 3,
            "_id": "6373d59450cff661989f3db8"    
          }, 
        ],
     }]
    
    from collections import defaultdict
    from pprint import pprint
    import pandas as pd
    
    dct_flat = defaultdict(list)
    for dct in data_python:
        for dct_reviews in dct["reviews"]:
            dct_flat['serial'].append(dct['serial']) 
            for key, value in dct_reviews.items():
                dct_flat[key].append(value)
    
    #pprint(data_python)
    #pprint(dct_flat)
    df = pd.DataFrame(dct_flat)
    print(df) 
    df.to_csv("data.csv")
    
    

    which gives:

                         serial     name  rating                       _id
    0  63708940a8d291c502be815f  shadman       4  6373d4eb50cff661989f3d83
    1  63708940a8d291c502be815f   niloy1       3  6373d59450cff661989f3db8
    

    and

    ,serial,name,rating,_id
    0,63708940a8d291c502be815f,shadman,4,6373d4eb50cff661989f3d83
    1,63708940a8d291c502be815f,niloy1,3,6373d59450cff661989f3db8
    

    as CSV file content.

    Notice that the json you provided in your question can't be loaded from file or string in Python neither using Python json module nor using Pandas because it is not valid json code. See below for corrected valid json data:

    valid_json_data='''\
    [{
        "serial":  "63708940a8d291c502be815f",
        "reviews": [
          {
            "name": "shadman",
            "rating": 4,
            "_id":"6373d4eb50cff661989f3d83"      
          },
          {
            "name": "niloy1",
            "rating": 3,
            "_id": "6373d59450cff661989f3db8"    
          } 
        ]
    }]
    '''
    

    and code for loading this data from json file:

    import json
    json_file = "data.json"
    with open(json_file) as f:
        data_json = f.read()
        data_python = json.loads(data_json)