Search code examples
jsonpython-3.xpandasdataframepyspark

How convert CSV table structure to JSON using Python?


Today I have a challenge at my school, it is convert a CSV file to a JSON file.

This CSV has a table structure (which means contains the information from an oracle table in this example). So i have to convert this CSV file into a hierarchical JSON.

My CSV example contain the next information:

TABLE_NAME,TIPO_CARGA,COLUMN_NAME,DATA_TYPE
CUENTAS,Full load,PERFIL,NVARCHAR2
CUENTAS,Full load,LLAMADA_INI,DATE
CUENTAS,Full load,LLAMADA_END,DATE
CUENTAS_PLAN,Delta,REGISTRO,NVARCHAR2
CUENTAS_PLAN,Delta,ULTIMA_VISITA,DATE
CUENTAS_PLAN,Delta,FECHA,DATE
CUENTAS_TACTICAS,Delta,CIUDAD,NVARCHAR2
CUENTAS_TACTICAS,Delta,SALIDA,NVARCHAR2
CUENTAS_TACTICAS,Delta,COMPLETADO,DATE
CUENTAS_MIEMBROS,Full load,NEGOCIO,NVARCHAR2
CUENTAS_MIEMBROS,Full load,CORE,NVARCHAR2
CUENTAS_MIEMBROS,Full load,DESC,NVARCHAR2

and I start with python using some libraries like pandas and also json, the next code is this:

import pandas as pd
import json

df = pd.read_csv("list_tablas_short.csv")

print(df.head())
thisisjson = df.to_json(orient='records')

print('CSV to JSON:\n', thisisjson)

thisisjson_dict = json.loads(thisisjson)
with open('data.json', 'w') as json_file:
json.dump(thisisjson_dict, json_file)

And the result is:

[
  {
    "TABLE_NAME;\"COLUMN_NAME\";\"DATA_TYPE\"": "CUENTAS;PERFIL;NVARCHAR2"
  },
  {
    "TABLE_NAME;\"COLUMN_NAME\";\"DATA_TYPE\"": "CUENTAS;LLAMADA_INI;DATE"
  },
  {...}
]

This is nto the way i'm looking for.

The idea is get something like this:

{
  "tablas": [
    {
      "nombre": "CUENTAS",
      "load_type": "Full load"
      "fields": [
        {
          "campo": "PERFIL",
          "tipo": "NVARCHAR2"
        },
        {
          "campo": "LLAMADA_INI",
          "tipo": "DATE"
        },
        {
          "campo": "LLAMADA_END",
          "tipo": "DATE"
        }
      ]
    },
    {
      "nombre": "CUENTAS_PLAN",
      "load_type": "Delta"
      "fields": [
        {
          "campo": "REGISTRO",
          "tipo": "NVARCHAR2"
        },
        {
          "campo": "ULTIMA_VISITA",
          "tipo": "DATE"
        },
        {
          "campo": "FECHA",
          "tipo": "DATE"
        }
      ]
    }
  ]
}

Can somebody help me with this?


Solution

  • Using a dictionary comprehension with groupby and to_dict:

    import json
    
    thisisjson_dict = {'tablas':
                       [{'nombre': n, 'load_type': l,
                         'fields': g.to_dict('records')}
                        for (n, l), g in df.rename(columns={'COLUMN_NAME':'campo',
                                                       'DATA_TYPE':'tipo'})
                                      .set_index(['TABLE_NAME', 'TIPO_CARGA'])
                                      .groupby(level=[0, 1])
                       ]
                      }
    
    out = json.dumps(thisisjson_dict, indent=2)
    

    Variant:

    thisisjson_dict = {'tablas':
                       [{'nombre': n, 'load_type': l,
                         'fields': g.to_dict('records')}
                        for (n, l), g in df.rename(columns={'COLUMN_NAME':'campo',
                                                       'DATA_TYPE':'tipo'})
                                      .groupby(['TABLE_NAME', 'TIPO_CARGA'])
                                      [['campo', 'tipo']]
                       ]
                      }
    
    out = json.dumps(thisisjson_dict, indent=2)
    

    Output:

    {
      "tablas": [
        {
          "nombre": "CUENTAS",
          "load_type": "Full load",
          "fields": [
            {
              "campo": "PERFIL",
              "tipo": "NVARCHAR2"
            },
            {
              "campo": "LLAMADA_INI",
              "tipo": "DATE"
            },
            {
              "campo": "LLAMADA_END",
              "tipo": "DATE"
            }
          ]
        },
        {
          "nombre": "CUENTAS_MIEMBROS",
          "load_type": "Full load",
          "fields": [
            {
              "campo": "NEGOCIO",
              "tipo": "NVARCHAR2"
            },
            {
              "campo": "CORE",
              "tipo": "NVARCHAR2"
            },
            {
              "campo": "DESC",
              "tipo": "NVARCHAR2"
            }
          ]
        },
        {
          "nombre": "CUENTAS_PLAN",
          "load_type": "Delta",
          "fields": [
            {
              "campo": "REGISTRO",
              "tipo": "NVARCHAR2"
            },
            {
              "campo": "ULTIMA_VISITA",
              "tipo": "DATE"
            },
            {
              "campo": "FECHA",
              "tipo": "DATE"
            }
          ]
        },
        {
          "nombre": "CUENTAS_TACTICAS",
          "load_type": "Delta",
          "fields": [
            {
              "campo": "CIUDAD",
              "tipo": "NVARCHAR2"
            },
            {
              "campo": "SALIDA",
              "tipo": "NVARCHAR2"
            },
            {
              "campo": "COMPLETADO",
              "tipo": "DATE"
            }
          ]
        }
      ]
    }