Here is the sample Data
data = {'Game': ['NFS', 'Forza', 'Wreckfest','Dirt Rally','Burnout','Project Cars','Grid 2','GTA','Saints Row','Persona 5','COD','Battlefield','Counter Strike'],
'Game Tier' : ['Tier 1', 'Tier 2', 'Tier 2','Tier 2','Tier 3','Tier 3','Tier 4','Tier 1','Tier 3','Tier 2','Tier 1','Tier 1','Tier 2'],
'Genre' : ['Racing', 'Racing', 'Racing','Racing','Racing','Racing','Racing','Open World','Open World','RPG','Shooter','Shooter','Shooter']}
df = pd.DataFrame (data, columns = ['Game', 'Game Tier', 'Genre'])
df
Data frame Generated with this data
Game Game_Tier Genre
0 NFS Tier 1 Racing
1 Forza Tier 2 Racing
2 Wreckfest Tier 2 Racing
3 Dirt Rally Tier 2 Racing
4 Burnout Tier 3 Racing
5 Project Cars Tier 3 Racing
6 Grid 2 Tier 4 Racing
7 GTA Tier 1 Open World
8 Saints Row Tier 3 Open World
9 Persona 5 Tier 2 RPG
10 COD Tier 1 Shooter
11 Battlefield Tier 1 Shooter
12 Counter Strike Tier 2 Shooter
The Initial data can be in upto 20k count. With Each genre having less then 100 games.
I Want to transform this data such that Each game in the Tier is mapped to Next tier level for a particular Genre
Tier level can be upto level 5 in the initial data frame. The final Output should have Columns with 5 Tier Game Level even if there is no data.
Final Output should be like this
Tier_1_Game Tier_2_Game Tier_3_Game Tier_4_Game Tier_5_Game Genre
0 NFS Forza Burnout Grid 2 Racing
1 NFS Wreckfest Burnout Grid 2 Racing
2 NFS Dirt Rally Burnout Grid 2 Racing
3 NFS Forza Project Cars Grid 2 Racing
4 NFS Wreckfest Project Cars Grid 2 Racing
5 NFS Dirt Rally Project Cars Grid 2 Racing
6 GTA Saints Row Open World
7 Persona 5 RPG
8 COD Counter Strike Shooter
9 Battlefield Counter Strike Shooter
Explanation On how we reach This final Data :
First We consider Genre 'Racing' And select Tier 1 games in it we get NFS we map it below
Tier_1_Game Tier_2_Game Tier_3 Game Tier_4_Game Tier_5_Game Genre
0 NFS Racing
Now Next we have 3 Games on Tier 2 in Racing Genre We map it with NFS like below
Tier_1_Game Tier_2_Game Tier_3_Game Tier_4_Game Tier_5_Game Genre
0 NFS Forza Racing
1 NFS Wreckfest Racing
2 NFS Dirt Rally Racing
Next At Tier 3 we have 2 games So we Create 2 entries of already Existing data
Tier_1_Game Tier_2_Game Tier_3_Game Tier_4_Game Tier_5_Game Genre
0 NFS Forza Burnout Racing
1 NFS Wreckfest Burnout Racing
2 NFS Dirt Rally Burnout Racing
3 NFS Forza Project Cars Racing
4 NFS Wreckfest Project Cars Racing
5 NFS Dirt Rally Project Cars Racing
Now at Tier 4 we have only 1 Game Grid 2 so we put it in Existing records
Tier_1_Game Tier_2_Game Tier_3_Game Tier_4_Game Tier_5_Game Genre
0 NFS Forza Burnout Grid 2 Racing
1 NFS Wreckfest Burnout Grid 2 Racing
2 NFS Dirt Rally Burnout Grid 2 Racing
3 NFS Forza Project Card Grid 2 Racing
4 NFS Wreckfest Project Cars Grid 2 Racing
5 NFS Dirt Rally Project Cars Grid 2 Racing
Suppose In the Initial Data there is another Game
'The Crew' At Tier 4 in Racing Genre then The Final Out Would become like this.
Tier_1_Game Tier_2_Game Tier_3_Game Tier_4_Game Tier_5_Game Genre
0 NFS Forza Burnout Grid 2 Racing
1 NFS Wreckfest Burnout Grid 2 Racing
2 NFS Dirt Rally Burnout Grid 2 Racing
3 NFS Forza Project Cars Grid 2 Racing
4 NFS Wreckfest Project Cars Grid 2 Racing
5 NFS Dirt Rally Project Cars Grid 2 Racing
6 NFS Forza Burnout The Crew Racing
7 NFS Wreckfest Burnout The Crew Racing
8 NFS Dirt Rally Burnout The Crew Racing
9 NFS Forza Project Cars The Crew Racing
10 NFS Wreckfest Project Cars The Crew Racing
11 NFS Dirt Rally Project Cars The Crew Racing
Now Since we have added all the existing tiers in Racing Genre We go to Next Genre and add that and so on.
Any ideas on how this can be implemented
Environment is Python 3.6 , Pandas 1.0.1
Code
cols = ['Tier 1', 'Tier 2', 'Tier 3', 'Tier 4', 'Tier 5']
def get_tier(df):
dfs = {_: d['Game'] for _, d in df.groupby('Game Tier')}
return (pd.MultiIndex.from_product(dfs.values(), names=dfs.keys())
.to_frame(index=False).reindex(cols, axis=1)
.add_suffix(' Game')
.assign(Genre=df['Genre'].iloc[0])
)
out = (df.groupby('Genre', group_keys=False, sort=False)
.apply(get_tier).reset_index(drop=True)
)
out