Search code examples
pythonpandasdataframegroup-bydata-transform

Transforming Data from columns to Row Based on a particular Column and group in Pandas


Here is the sample Data

data = {'Game':  ['NFS', 'Forza', 'Wreckfest','Dirt Rally','Burnout','Project Cars','Grid 2','GTA','Saints Row','Persona 5','COD','Battlefield','Counter Strike'],
        'Game Tier' : ['Tier 1', 'Tier 2', 'Tier 2','Tier 2','Tier 3','Tier 3','Tier 4','Tier 1','Tier 3','Tier 2','Tier 1','Tier 1','Tier 2'],
        'Genre' : ['Racing', 'Racing', 'Racing','Racing','Racing','Racing','Racing','Open World','Open World','RPG','Shooter','Shooter','Shooter']}

df = pd.DataFrame (data, columns = ['Game', 'Game Tier', 'Genre']) 

df 

Data frame Generated with this data

    Game           Game_Tier  Genre
0   NFS            Tier 1     Racing
1   Forza          Tier 2     Racing
2   Wreckfest      Tier 2     Racing
3   Dirt Rally     Tier 2     Racing
4   Burnout        Tier 3     Racing
5   Project Cars   Tier 3     Racing
6   Grid 2         Tier 4     Racing
7   GTA            Tier 1     Open World
8   Saints Row     Tier 3     Open World
9   Persona 5      Tier 2     RPG
10  COD            Tier 1     Shooter
11  Battlefield    Tier 1     Shooter
12  Counter Strike Tier 2     Shooter

The Initial data can be in upto 20k count. With Each genre having less then 100 games.

I Want to transform this data such that Each game in the Tier is mapped to Next tier level for a particular Genre

Tier level can be upto level 5 in the initial data frame. The final Output should have Columns with 5 Tier Game Level even if there is no data.

Final Output should be like this

    Tier_1_Game  Tier_2_Game  Tier_3_Game  Tier_4_Game  Tier_5_Game  Genre
0   NFS          Forza        Burnout      Grid 2                    Racing
1   NFS          Wreckfest    Burnout      Grid 2                    Racing
2   NFS          Dirt Rally   Burnout      Grid 2                    Racing
3   NFS          Forza        Project Cars Grid 2                    Racing
4   NFS          Wreckfest    Project Cars Grid 2                    Racing
5   NFS          Dirt Rally   Project Cars Grid 2                    Racing
6   GTA                       Saints Row                             Open World
7                Persona 5                                           RPG
8   COD          Counter Strike                                      Shooter
9   Battlefield  Counter Strike                                      Shooter

Explanation On how we reach This final Data :
First We consider Genre 'Racing' And select Tier 1 games in it we get NFS we map it below

    Tier_1_Game Tier_2_Game Tier_3 Game Tier_4_Game Tier_5_Game Genre
0   NFS                                                         Racing

Now Next we have 3 Games on Tier 2 in Racing Genre We map it with NFS like below

    Tier_1_Game Tier_2_Game Tier_3_Game Tier_4_Game Tier_5_Game Genre
0   NFS         Forza                                           Racing
1   NFS         Wreckfest                                       Racing
2   NFS         Dirt Rally                                      Racing

Next At Tier 3 we have 2 games So we Create 2 entries of already Existing data

    Tier_1_Game Tier_2_Game Tier_3_Game Tier_4_Game Tier_5_Game Genre
0   NFS         Forza       Burnout                             Racing
1   NFS         Wreckfest   Burnout                             Racing
2   NFS         Dirt Rally  Burnout                             Racing
3   NFS         Forza       Project Cars                        Racing
4   NFS         Wreckfest   Project Cars                        Racing
5   NFS         Dirt Rally  Project Cars                        Racing

Now at Tier 4 we have only 1 Game Grid 2 so we put it in Existing records

    Tier_1_Game Tier_2_Game Tier_3_Game  Tier_4_Game    Tier_5_Game Genre
0   NFS         Forza       Burnout      Grid 2                     Racing
1   NFS         Wreckfest   Burnout      Grid 2                     Racing
2   NFS         Dirt Rally  Burnout      Grid 2                     Racing
3   NFS         Forza       Project Card Grid 2                     Racing
4   NFS         Wreckfest   Project Cars Grid 2                     Racing
5   NFS         Dirt Rally  Project Cars Grid 2                     Racing

Suppose In the Initial Data there is another Game
'The Crew' At Tier 4 in Racing Genre then The Final Out Would become like this.

    Tier_1_Game Tier_2_Game Tier_3_Game  Tier_4_Game    Tier_5_Game Genre
0   NFS         Forza       Burnout      Grid 2                     Racing
1   NFS         Wreckfest   Burnout      Grid 2                     Racing
2   NFS         Dirt Rally  Burnout      Grid 2                     Racing
3   NFS         Forza       Project Cars Grid 2                     Racing
4   NFS         Wreckfest   Project Cars Grid 2                     Racing
5   NFS         Dirt Rally  Project Cars Grid 2                     Racing
6   NFS         Forza       Burnout      The Crew                   Racing
7   NFS         Wreckfest   Burnout      The Crew                   Racing
8   NFS         Dirt Rally  Burnout      The Crew                   Racing
9   NFS         Forza       Project Cars The Crew                   Racing
10  NFS         Wreckfest   Project Cars The Crew                   Racing
11  NFS         Dirt Rally  Project Cars The Crew                   Racing

Now Since we have added all the existing tiers in Racing Genre We go to Next Genre and add that and so on.

Any ideas on how this can be implemented
Environment is Python 3.6 , Pandas 1.0.1


Solution

  • Code

    cols = ['Tier 1', 'Tier 2', 'Tier 3', 'Tier 4', 'Tier 5']
    def get_tier(df):
        dfs = {_: d['Game'] for _, d in df.groupby('Game Tier')}
        return (pd.MultiIndex.from_product(dfs.values(), names=dfs.keys())
                .to_frame(index=False).reindex(cols, axis=1)
                .add_suffix(' Game')
                .assign(Genre=df['Genre'].iloc[0])
               )
    
    out = (df.groupby('Genre', group_keys=False, sort=False)
           .apply(get_tier).reset_index(drop=True)
    )
    

    out

    enter image description here