Search code examples
pythonjsonpandasxls

export certain elements from JSON to XLS


I currently read a JSON data structure from an API. This structure is kind of nested:

my_json={
"data": {
    "allSites": {
        "activeLicenses": 0,
        "totalLicenses": 1100
    },
    "sites": [
        {
            "accountId": "12345",
            "accountName": "ACME INC",
            "activeLicenses": 0,
            "createdAt": "2021-01-12T20:04:12.166693Z",
            "creator": null,
            "creatorId": null,
            "expiration": null,
            "externalId": null,
            "healthStatus": true,
            "id": "12345",
            "isDefault": true,
            "name": "Default site",
            "registrationToken": "rznwzrsnbwrn==",
            "siteType": "Paid",
            "sku": "Core",
            "state": "active",
            "suite": "Core",
            "totalLicenses": 0,
            "unlimitedExpiration": true,
            "unlimitedLicenses": true,
            "updatedAt": "2021-01-12T20:04:12.165504Z"
        },
        {
            "accountId": "67890",
            "accountName": "DE | C12345 | ACME Inc",
            "activeLicenses": 0,
            "createdAt": "2021-01-15T12:53:05.363922Z",
            "creator": "John Doe",
            "creatorId": "567837837",
            "expiration": "2021-01-31T02:00:00Z",
            "externalId": "C12345",
            "healthStatus": true,
            "id": "3268726578",
            "isDefault": true,
            "name": "Default site",
            "registrationToken": "dghmntzme6umeum==",
            "siteType": "Paid",
            "sku": "Core",
            "state": "active",
            "suite": "Core",
            "totalLicenses": 1000,
            "unlimitedExpiration": false,
            "unlimitedLicenses": false,
            "updatedAt": "2021-01-15T12:53:05.878138Z"
        },
        {
            "accountId": "769i376586256",
            "accountName": "ACME Inc 2",
            "activeLicenses": 0,
            "createdAt": "2021-01-16T10:48:55.629903Z",
            "creator": "Jon Doe",
            "creatorId": "267267",
            "expiration": null,
            "externalId": null,
            "healthStatus": true,
            "id": "467267267",
            "isDefault": false,
            "name": "IT PoC",
            "registrationToken": "sthmetuzmstmwsu==",
            "siteType": "Trial",
            "sku": "Complete",
            "state": "active",
            "suite": "Complete",
            "totalLicenses": 100,
            "unlimitedExpiration": true,
            "unlimitedLicenses": false,
            "updatedAt": "2021-01-16T10:48:55.940332Z"
        }
    ]
},
"pagination": {
    "nextCursor": null,
    "totalItems": 3
}

}

I'm only interested in the elements within "sites", everything else can be ignored for the time being. In the corresponding XLS the Column Names should be the item names like accoutID, accountName, etc.

I'm able to get everything exported to CSV or XLSX with Pandas, but how to I limit the export to "sites" only and get the column headers?

thanks

---edit while trying below suggestions, I get keyword errors.

My code looks like:

 r = requests.get("https://URL/web/api/v2.1/sites?limit=999&sortBy=name&states=active", headers={"Authorization":token})
data = r.json()

print(data)

sites = data['allSites']['sites']
result = pd.DataFrame(sites)
print(result)

the result of "data" the content mentioned above.

when I now do

sites = data['allSites']['sites']
result = pd.DataFrame(sites)
print(result)

I get back an error:

Traceback (most recent call last): File "", line 1, in File "/Applications/PyCharm CE.app/Contents/plugins/python-ce/helpers/pydev/_pydev_bundle/pydev_umd.py", line 197, in runfile pydev_imports.execfile(filename, global_vars, local_vars) # execute the script File "/Applications/PyCharm CE.app/Contents/plugins/python-ce/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile exec(compile(contents+"\n", file, 'exec'), glob, loc) File "/Users/adieball/Dropbox/Multiverse/Programming/S1MSSPProvisioning/getStats.py", line 45, in main() File "/Users/adieball/Dropbox/Multiverse/Programming/S1MSSPProvisioning/getStats.py", line 29, in main sites = data['allSites']['sites'] KeyError: 'allSites'


Solution

  • You can get the desired data by using the following:

    sites= my_json['data']['sites']
    

    and create a dataframe with:

    res=pd.DataFrame(sites)
    

    Output will be:

    >>>print(res)
    
           accountId             accountName  ...  unlimitedLicenses                    updatedAt
    0          12345                ACME INC  ...               True  2021-01-12T20:04:12.165504Z
    1          67890  DE | C12345 | ACME Inc  ...              False  2021-01-15T12:53:05.878138Z
    2  769i376586256              ACME Inc 2  ...              False  2021-01-16T10:48:55.940332Z
    
    [3 rows x 21 columns]