Search code examples
pythonpandasreindex

how to solve 'Reindexing only valid with uniquely valued Index objects' error


I have a dataframe which kind of looks like this:

           date        holiday  item_cnt_day    shop_id      cnt_sem    cnt_mes     cnt_year
0        2013-01-01       1         0.0           59         0.000000   0.000000    0.000000
1        2013-01-02       1         0.0           59         0.000000   0.000000    0.000000
2        2013-01-03       1         0.0           59         0.000000   0.000000    0.000000
3        2013-01-04       1         0.0           59         0.000000   0.000000    0.000000
4        2013-01-05       0         0.0           59         0.000000   0.000000    0.000000
          ......         ...        ...           ...           ...        ...         ...
1029    2015-10-27        0         4.0           36         1.142857   0.321429    0.024658
1030    2015-10-28        0         1.0           36         1.285714   0.357143    0.027397
1031    2015-10-29        0         1.0           36         1.142857   0.392857    0.030137
1032    2015-10-30        0         4.0           36         1.714286   0.535714    0.041096
1033    2015-10-31        0         1.0           36         1.857143   0.571429    0.043836

dates start from 2013-01-01 to 2015-10-31 and this range of dates is for every shop_id, this is, for every shop_id I have that range for the dates, therefore, I have duplicate dates, but what I'm trying to do is to only have those dates which comes after the first 365 days per shop_id and I'm trying to do that by using this function:

def no_todos(df, shops):
    # shops is a list of shops and there are 60 shops in this list
    # df is the dataframe to be operated in the loop

    new_df = pd.DataFrame(df)

    # Here I'm trying to only keep those observations which come after the first 365 days for each shop
    for t in shops:
        new_df['shop_id'][t] = df[365::]
    return new_df

However, I'm having this error: Reindexing only valid with uniquely valued Index objects. Does anyone knows how to solve this? Thanks in advance.


Solution

  • First sort the dataframe, then do a groupby and then do negative tail. Negative tail is not implemented in groupby tail method so you need to create your own function. This will skip the first rows of each group

    df.sort_values(['shop_id', 'date'], ascending=[True, True])
    
    def negative_tail(group, n):
        return group._selected_obj[group.cumcount(ascending=True) >= n]
    
    final_result = negative_tail(df.groupby('shop_id'), 365).copy()