python json python-3.x pandas json-normalize

How to change json data into dataframe

I need one help to convert json data into dataframe. Could you please help me how to do this?

Example:

JSON DATA

{
    "user_id": "vmani4",
    "password": "*****",
    "api_name": "KOL",
    "body": {
      "api_name": "KOL",
      "columns": [
        "kol_id",
        "jnj_id",
        "kol_full_nm",
        "thrc_cd"
      ],
      "filter": {
        "kol_id": "101152",
        "jnj_id": "7124166",
        "thrc_nm": "VIR"
        
      }
    }
}

Desirable output:

user_id     password       api_name     columns       filter     filter_value
vmani        ******         KOL          kol_id       kol_id       101152
                                         jnj_id       jnj_id       7124166
                                         kol_full_nm  thrc_nm      VIR
                                         thrc_cd

Solution

I'm not familiar with DataFrame but I tried my best to come up with the solution of you desired output in proper way.

Code

import pandas as pd
import json
import numpy as np

json_data = """ {
    "user_id": "vmani4",
    "password": "*****",
    "api_name": "KOL",
    "body": {
      "api_name": "KOL",
      "columns": [
        "kol_id",
        "jnj_id",
        "kol_full_nm",
        "thrc_cd"
      ],
      "filter": {
        "kol_id": "101152",
        "jnj_id": "7124166",
        "thrc_nm": "VIR"
        
      }
    }
}"""

python_data = json.loads(json_data)

filter = {}
list_for_filter = []
filter_value = {}
list_for_filter_value = []
first_level = {}
for_colums = {}

for x, y in python_data.items():
    if type(y) is dict:
        for j, k in y.items():
            if j == 'columns':
                for_colums[j] = k
            if type(k) is dict:
                for m, n in k.items():
                    list_for_filter.append(m)
                    list_for_filter_value.append(n)
        break
    first_level[x] = [y]

filter['filter'] = list_for_filter
filter_value['filter_value'] = list_for_filter_value

res = {**first_level, **for_colums, **filter, **filter_value}

df = pd.concat([pd.Series(v, name=k) for k, v in res.items()], axis=1)
print(df)

output

  user_id password api_name      columns   filter filter_value
0  vmani4    *****      KOL       kol_id   kol_id       101152
1     NaN      NaN      NaN       jnj_id   jnj_id      7124166
2     NaN      NaN      NaN  kol_full_nm  thrc_nm          VIR
3     NaN      NaN      NaN      thrc_cd      NaN          NaN

Let me give you short hand about my code first created a lot of lists and dicts the reason why I did so is that I saw in your desired output some columns that weren't actually in your code like filter_value.

I also loop trough the dict items in order to make another dict which will satisfy the desired output.

after of all because of the length of lists in the DataFrame where not equal that's why I used concat and series