Search code examples
pythonjsondataframedictionarynested

how to get a cross table from nested dictionaries in a json file


I would like to get a cross-table by using pandas dataframe with Python like this:

Home Away Tot
played 19 19 38
wins 10 8 18
draws 7 5 12
loss 2 6 8
goals_for 40 26 66
goals_against 17 19 36
avg_goal_for 2.1 1.4 1.7
avg_goal_against 0.9 1.0 0.9

and the json file is :

{
"get": "teams/statistics",
"response": {
    "fixtures": {
        "played": {
            "home": 19,
            "away": 19,
            "total": 38
        },
        "wins": {
            "home": 10,
            "away": 8,
            "total": 18
        },
        "draws": {
            "home": 7,
            "away": 5,
            "total": 12
        },
        "loses": {
            "home": 2,
            "away": 6,
            "total": 8
        }
    },
    "goals": {
        "for": {
            "total": {
                "home": 40,
                "away": 26,
                "total": 66
            },
            "average": {
                "home": "2.1",
                "away": "1.4",
                "total": "1.7"
            }
        },
        "against": {
            "total": {
                "home": 17,
                "away": 19,
                "total": 36
            },
            "average": {
                "home": "0.9",
                "away": "1.0",
                "total": "0.9"
            }
        }
    }

}

}

I don't know how to handle all nested dictionaries in an easy way. Please someone can help me out. Thanks


Solution

  • One way to do this is to process both halves of your response separately, using json_normalize to read the data, then splitting the column names to a multi-level index and then using stack to move to a long format from the wide format. You can then drop the top-level index and concat the two dataframes:

    data = json.loads(jstr)
    
    fixtures = pd.json_normalize(data['response']['fixtures'])
    fixtures.columns = fixtures.columns.str.split('.', expand=True)
    fixtures = fixtures.stack(level=0)
    fixtures.index = fixtures.index.droplevel()
    fixtures
    #         away  home  total
    # draws      5     7     12
    # loses      6     2      8
    # played    19    19     38
    # wins       8    10     18
    
    goals = pd.json_normalize(data['response']['goals'])
    goals.columns = goals.columns.str.split(r'\.(?!.*\.)', expand=True)
    goals = goals.stack(level=0)
    goals.index = goals.index.droplevel()
    goals
    #                 away home total
    # against.average  1.0  0.9   0.9
    # against.total     19   17    36
    # for.average      1.4  2.1   1.7
    # for.total         26   40    66
    
    out = pd.concat([fixtures, goals])
    

    Output:

                    away home total
    draws              5    7    12
    loses              6    2     8
    played            19   19    38
    wins               8   10    18
    against.average  1.0  0.9   0.9
    against.total     19   17    36
    for.average      1.4  2.1   1.7
    for.total         26   40    66
    

    Note, if desired you can rename the index column to match your question using

    out.index = ['draws', 'loses', 'played', 'wins', 'goals_against_average', 'goals_against_total', 'goals_for_average', 'goals_for_total']
    

    Output:

                          away home total
    draws                    5    7    12
    loses                    6    2     8
    played                  19   19    38
    wins                     8   10    18
    goals_against_average  1.0  0.9   0.9
    goals_against_total     19   17    36
    goals_for_average      1.4  2.1   1.7
    goals_for_total         26   40    66
    

    Note this code can be made more generic, by creating a dict of the sub-dictionary names and the appropriate splitting string, and then iterating that dict:

    parts = { 'fixtures' : '.', 'goals' : r'\.(?!.*\.)' }
    dfs = []
    for part in parts:
        df = pd.json_normalize(data['response'][part])
        df.columns = df.columns.str.split(parts[part], expand=True)
        df = df.stack(level=0)
        df.index = df.index.droplevel()
        dfs.append(df)
    
    out = pd.concat(dfs)