python pandas dataframe running-count

Manipulation of a dataframe index on the basis of values from another column


Suppose I have a dataframe which currently has data like this:

   T week
0  T-1
1  T-1
2  T-1
3  T-1
4  T-2
5  T-2
6  T-2
7  T-3
8  T-3
9  T-3
10 T-3

I want the index to restart for each T- group I am dealing with. For example, this is the dataframe I want:

   T week
1  T-1
2  T-1
3  T-1
4  T-1
1  T-2
2  T-2
3  T-2
1  T-3
2  T-3
3  T-3
4  T-3

Note how the index starts from 1 again (instead of 0) when there is a new T-group.

I tried to code this but it didn't really work. Could use some help!

import os,xlrd,pandas as pd

df = pd.read_excel(r'dir\file.xlsx')
book = xlrd.open_workbook(r'dir\file.xlsx')
sheet = book.sheet_by_name('Sheet1')

t_value = None
next_t = None
tabcount = 0
idx = 1
i = 1

while i!=sheet.nrows:
    t_value = df['T Week'][i]
    next_t = df['T Week'][i+1]
    if t_value == next_t:
        tabcount+=1
        df.at[i,'Num'] = idx
        idx+=1
    else:
        idx = 0
        df.at[i, 'Num'] = idx
    i+=1

Solution

  • Use groupby and cumcount. We'll also use add to shift the cumcount, which starts at 0, up by 1:

    df.index = df.groupby('T week').cumcount().add(1)
    

    [out]

      T week
    1    T-1
    2    T-1
    3    T-1
    4    T-1
    1    T-2
    2    T-2
    3    T-2
    1    T-3
    2    T-3
    3    T-3
    4    T-3
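
For anyone who wants to try this without the Excel file, here is a minimal self-contained sketch. The sample frame is built by hand to mirror the data in the question, and the 'Num' column name is borrowed from the attempt in the question; storing the count in a column instead of the index is just an optional variant, not part of the original answer.

```python
import pandas as pd

# Hand-built sample data mirroring the 'T week' column from the question.
df = pd.DataFrame({"T week": ["T-1"] * 4 + ["T-2"] * 3 + ["T-3"] * 4})

# cumcount() numbers the rows of each group starting at 0; add(1) makes it 1-based.
counts = df.groupby("T week").cumcount().add(1)

# Variant (assumption): keep the running count as a regular column named 'Num'.
df["Num"] = counts

# As in the answer: use the running count as the index.
df.index = counts

print(df)
```

Note that cumcount numbers rows per group label, not per consecutive run. If the same T- value showed up again further down the frame, its numbering would continue from where it left off rather than restart at 1.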