Search code examples
pythonjsonpandasjson-normalize

json_normalize in Python


I am trying here to use json_normalize to somehow format the output of an API, but I keep getting a faulty and empty csv file. I tried to change df2 = pd.json_normalize(response, record_path=['LIST']) , but keep getting this error message:

TypeError: byte indices must be integers or slices, not str

Could you please guide me on what am I doing wrong ?

Thanks a lot !

import requests
import json
import pandas as pd

url = "https://*hidden*Results/"

payload = json.dumps({
  "id": 12345
})
headers = {
  'Authorization': 'Basic *hidden*',
  'Content-Type': 'application/json'
}

response = requests.request("POST", url, headers=headers, data=payload)


df1 = pd.DataFrame(response).iloc[:,:-2]
df2 = pd.json_normalize(response, record_path=None)

df = pd.concat([df1, df2], axis=1)
df.to_csv("test.csv", index=False)

Solution

  • You are passing the variable response in the call:

    df2 = pd.json_normalize(response, record_path=None)

    Which is an a requests.models.Response Object and you need to pass a dict, so you need to do something like pd.json_normalize(response.json(), record_path=['LIST'])

    I tried it with this example and works:

    >>> import pandas as pd
    >>> data = [
    ...     {
    ...         "state": "Florida",
    ...         "shortname": "FL",
    ...         "info": {"governor": "Rick Scott"},
    ...         "counties": [
    ...             {"name": "Dade", "population": 12345},
    ...             {"name": "Broward", "population": 40000},
    ...             {"name": "Palm Beach", "population": 60000},
    ...         ],
    ...     },
    ...     {
    ...         "state": "Ohio",
    ...         "shortname": "OH",
    ...         "info": {"governor": "John Kasich"},
    ...         "counties": [
    ...             {"name": "Summit", "population": 1234},
    ...             {"name": "Cuyahoga", "population": 1337},
    ...         ],
    ...     },
    ... ]
    >>> result = pd.json_normalize(data, ["counties"])
    >>> result
             name  population
    0        Dade       12345
    1     Broward       40000
    2  Palm Beach       60000
    3      Summit        1234
    4    Cuyahoga        1337
    

    EDIT I will try to do this:

    import requests
    import json
    import pandas as pd
    
    url = "https://*hidden*Results/"
    
    payload = json.dumps({
      "id": 12345
    })
    headers = {
      'Authorization': 'Basic *hidden*',
      'Content-Type': 'application/json'
    }
    
    response = requests.request("POST", url, headers=headers, data=payload)
    
    json_response = response.json()
    
    df1 = pd.DataFrame(json_response).iloc[:,:-2]
    df2 = pd.json_normalize(json_response, record_path=['LIST'])
    
    df = pd.concat([df1, df2], axis=1)
    df.to_csv("test.csv", index=False)