How to convert JSON data into a pandas DataFrame


I need help converting JSON data into a pandas DataFrame. How can I do this?

Example:

JSON DATA

{
    "user_id": "vmani4",
    "password": "*****",
    "api_name": "KOL",
    "body": {
      "api_name": "KOL",
      "columns": [
        "kol_id",
        "jnj_id",
        "kol_full_nm",
        "thrc_cd"
      ],
      "filter": {
        "kol_id": "101152",
        "jnj_id": "7124166",
        "thrc_nm": "VIR"
        
      }
    }
}

Desired output:

user_id     password       api_name     columns       filter     filter_value
vmani4      *****          KOL          kol_id        kol_id     101152
                                        jnj_id        jnj_id     7124166
                                        kol_full_nm   thrc_nm    VIR
                                        thrc_cd

Solution

  • I'm not very familiar with DataFrames, but I tried my best to come up with a solution that produces your desired output.

    Code

    import json

    import pandas as pd
    
    json_data = """ {
        "user_id": "vmani4",
        "password": "*****",
        "api_name": "KOL",
        "body": {
          "api_name": "KOL",
          "columns": [
            "kol_id",
            "jnj_id",
            "kol_full_nm",
            "thrc_cd"
          ],
          "filter": {
            "kol_id": "101152",
            "jnj_id": "7124166",
            "thrc_nm": "VIR"
            
          }
        }
    }"""
    
    python_data = json.loads(json_data)
    
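    # Containers for the scalar top-level fields, the column list,
    # and the filter keys/values found inside the nested "body" object.
    first_level = {}
    columns = {}
    filter_keys = []
    filter_values = []
    
    for key, value in python_data.items():
        if isinstance(value, dict):
            # Nested object: pull out the column list and the filter pairs.
            for inner_key, inner_value in value.items():
                if inner_key == 'columns':
                    columns[inner_key] = inner_value
                elif isinstance(inner_value, dict):
                    filter_keys.extend(inner_value.keys())
                    filter_values.extend(inner_value.values())
            # Skip storing the nested dict itself, but keep scanning later keys.
            continue
        first_level[key] = [value]
    
    res = {**first_level, **columns,
           'filter': filter_keys, 'filter_value': filter_values}
    
    # The lists differ in length, so build one Series per column;
    # concat aligns them on the index and pads short columns with NaN.
    df = pd.concat([pd.Series(v, name=k) for k, v in res.items()], axis=1)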
    print(df)
    
    

    Output

      user_id password api_name      columns   filter filter_value
    0  vmani4    *****      KOL       kol_id   kol_id       101152
    1     NaN      NaN      NaN       jnj_id   jnj_id      7124166
    2     NaN      NaN      NaN  kol_full_nm  thrc_nm          VIR
    3     NaN      NaN      NaN      thrc_cd      NaN          NaN
    

    A quick walkthrough of the code: I first created several lists and dicts, because your desired output contains columns, such as filter_value, that do not exist as keys in your JSON.

    I then loop through the dict items to build a second dict whose keys match the columns of the desired output.

    Finally, because the lists going into the DataFrame are not all the same length, I build one Series per column and join them with concat, which fills the missing cells with NaN.
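
To illustrate that last point, here is a minimal sketch of how pandas treats lists of unequal length. The `res` dict below is a trimmed, hypothetical stand-in for the one built in the answer, and `direct_ok` and `blank_df` are names made up for this example. Passing the lists straight to `pd.DataFrame` fails, while one Series per column works. The final `fillna("")` step is an optional assumption about presentation: it swaps NaN for empty strings, which looks closer to the blank cells in the desired output.

```python
import pandas as pd

# Hypothetical column data with unequal lengths (a trimmed copy of `res`).
res = {
    "user_id": ["vmani4"],
    "columns": ["kol_id", "jnj_id", "kol_full_nm", "thrc_cd"],
    "filter": ["kol_id", "jnj_id", "thrc_nm"],
}

# Lists of different lengths cannot be passed to DataFrame directly.
try:
    pd.DataFrame(res)
    direct_ok = True
except ValueError:
    direct_ok = False

# One Series per column: concat aligns on the index and pads with NaN.
df = pd.concat([pd.Series(v, name=k) for k, v in res.items()], axis=1)

# Optional: display empty strings instead of NaN.
blank_df = df.fillna("")
print(blank_df)
```

Note that `fillna("")` is only suitable for display. If the frame is processed further, keeping NaN makes missing values easier to detect.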