Tags: python, pandas, dataframe, typeerror

TypeError: cannot do positional indexing on Int64Index with these indexers [Int64Index([5], dtype='int64')] of type Int64Index


I have a dataframe (small sample) like this:

import pandas as pd

data = [['A', False, 2], ['A', True, 8], ['A', False, 25], ['A', False, 30], ['B', False, 4], ['B', False, 8], ['B', True, 2], ['B', False, 3]]
df = pd.DataFrame(data = data, columns = ['group', 'indicator', 'val'])

  group  indicator  val
0     A      False    2
1     A       True    8
2     A      False   25
3     A      False   30
4     B      False    4
5     B      False    8
6     B       True    2
7     B      False    3

For each group, I would like to select the n rows above and below the row where indicator == True. For example, with n = 1, group A should return the rows with index 0, 1, 2 and group B the rows with index 5, 6, 7. I tried the following code:

# subset each group to list
dfs = [x for _, x in df.groupby('group')] 

for i in dfs:
    # select dataframe
    df_sub = dfs[1]
    # get index of row with indicator True
    idx = df_sub.index[df_sub['indicator'] == True]
    # select n rows above and below row with True
    df_sub = df_sub.iloc[idx - 1: idx + 1]
    # combine each dataframe again
    df_merged = pd.concat(df_sub)
    
    print(df_merged)

But I get the following error:

TypeError: cannot do positional indexing on Int64Index with these indexers [Int64Index([5], dtype='int64')] of type Int64Index

This is the desired output:

data = [['A', False, 2], ['A', True, 8], ['A', False, 25], ['B', False, 8], ['B', True, 2], ['B', False, 3]]
df_desired = pd.DataFrame(data = data, columns = ['group', 'indicator', 'val'])

  group  indicator  val
0     A      False    2
1     A       True    8
2     A      False   25
3     B      False    8
4     B       True    2
5     B      False    3

I don't understand why this error happens and how to solve it. Does anyone know how to fix this issue?


Solution

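  • You can use a groupby.rolling with a centered window of 2*n+1, which spans the n rows before and after each row within its group. Taking the max over that window marks every row that has a True within n rows of it, and the resulting mask is then used for boolean indexing: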

    n = 1
    
    mask = (df.groupby('group')['indicator']
              .rolling(n*2+1, center=True, min_periods=1)
              .max().droplevel(0)
              .astype(bool)
           )
    
    out = df.loc[mask]
    

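    output (the original index labels are kept; use out.reset_index(drop=True) to renumber from 0 as in the desired output):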

      group  indicator  val
    0     A      False    2
    1     A       True    8
    2     A      False   25
    5     B      False    8
    6     B       True    2
    7     B      False    3
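
As for why the original loop fails: .iloc only accepts integer positions, but idx is an Index object holding labels, so idx - 1: idx + 1 is a slice built from Index objects rather than integers. The loop also always reads dfs[1] instead of the loop variable, and pd.concat is given a single DataFrame rather than a list. A minimal sketch of a repaired loop follows, assuming the same sample data as in the question; the names parts and positions are illustrative and not part of the original code:

```python
import numpy as np
import pandas as pd

data = [['A', False, 2], ['A', True, 8], ['A', False, 25], ['A', False, 30],
        ['B', False, 4], ['B', False, 8], ['B', True, 2], ['B', False, 3]]
df = pd.DataFrame(data=data, columns=['group', 'indicator', 'val'])

n = 1
parts = []  # illustrative name: collects one slice per True row

for _, df_sub in df.groupby('group'):
    # integer positions (not index labels) of the True rows in this group
    positions = np.flatnonzero(df_sub['indicator'].to_numpy()).tolist()
    for pos in positions:
        start = max(pos - n, 0)  # clamp so a negative start cannot wrap around
        parts.append(df_sub.iloc[start:pos + n + 1])

# pd.concat expects a list (or other iterable) of DataFrames
df_merged = pd.concat(parts)

# if windows overlap, the same row is collected twice; keep the first copy
# (this step assumes the index labels of df are unique)
df_merged = df_merged[~df_merged.index.duplicated()]

print(df_merged)
```

On the sample data this should select the same rows as the rolling mask. The rolling approach is usually preferable because it avoids the Python-level loop.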