Search code examples
pythonpandasdataframereindex

Is there a way I can reindex rows in a dataframe that contains duplicates, such that the duplicates would also get reindexed?


I'm currently working on a project where I need to parse a dataframe that contains data of all the shots the New York Knicks have taken in the 2013-2014 season(~7000 rows). The first column is for the game_id, which is a unique identifier for each of the 82 games played for the season. The first 72 rows have the game_id set to 0021300008. The next 85 rows are for the next game, with the identifier as 0021300018, and so on. I want to reindex all of these rows so that the first game_id will be 1 instead, and the next one 2, etc. I tried looking at reindexing options for pandas but I can't seem to find a solution. Does anyone have any advice?

Thank you.


Solution

  • Use Series.rank.

    df['game_id'] = df['game_id'].rank(method='dense').astype(int)
    

    Another option is to create a dictionary that maps each unique 'game_id' to 1, 2, 3, etc, and pass it to Series.map

    # The game ids are mapped to 1, 2, 3, etc according to their order of appearance 
    # to do a rank-based mapping, use enumerate(df['game_id'].unique().sort_values())
    idx_map = {idx: n for n, idx in enumerate(df['game_id'].unique())}
    df['game_id'] = df['game_id'].map(idx_map)