Search code examples
pythonspyder

How to use iloc to get reference from the rows above


I have a dataframe that looks like this:

print(df)
Out[1]: 
    Numbers
0         0
1         1
2         1
3         1
4         1
5         0
6         0
7         1
8         0
9         1
10        1

I want to transform it to this:

print(dfo)
Out[2]: 
    Numbers
0         0
1         1
2         2
3         3
4         4
5         0
6         0
7         1
8         0
9         1
10        2

The solution to this,I thought, it could be an iloc with 2 ifs: Check if the digit in df is 1, if1 true then check if2 the i-1 is 1, if true then in dfo see the value of i-1 and add 1,elifs just put the value of 0 in dfo.

I've tryed this:

# Import pandas library
import pandas as pd
# initialize list elements
list = [0,1,1,1,1,0,0,1,0,1,1]
# Create the pandas DataFrame with column name is provided explicitly
df = pd.DataFrame(list, columns=['Numbers'])

# print dataframe.
df

data1c = df.copy()

for j in df:
    for i in range(len(df)):
        if df.loc[i, j] == 1:
            if df.loc[i-1, j] == 1:
                data1c.loc[i, j] = data1c.loc[i-1, j]+1
            elif df.loc[i-1, j] == 0:
                data1c.loc[i, j] = 1
            
            
        elif df.loc[i, j] == 0:
            data1c.loc[i, j] = 0

print(data1c)
    Numbers
0         0
1         1
2         2
3         3
4         4
5         0
6         0
7         1
8         0
9         1
10        2

and for a dataframe of 1 column it works, but when I've tryed with a dataframe with 2 columns :

input = {'A': [0,1,1,1,1,0,0,1,0,1,1,0,1,1],
         'B':  [1,1,0,0,1,1,1,0,0,1,0,1,1,0]}

df = pd.DataFrame(input)

# Print the output.
df

data2c = df.copy()

for j in dfo:
    for i in range(len(dfo)):
        if dfo.loc[i, j] == 1:
            if dfo.loc[i-1, j] == 1:
                data2c.loc[i, j] = data2c.loc[i-1, j]+1
            elif dfo.loc[i-1, j] == 0:
                data2c.loc[i, j] = 1
        elif dfo.loc[i, j] == 0:
            data2c.loc[i, j] = 0

I get :

File "C:\Users\talls\.conda\envs\Spyder\lib\site-packages\pandas\core\indexes\range.py", line 393, in get_loc
    raise KeyError(key) from err

KeyError: -1

Why do I get this error and how do I fix it? or Is there another way to get my desired out put?


Solution

  • I know this is not the answer to the question "How to use iloc to get reference from the rows above?", but it is the answer to your proposed question.

    df = pd.DataFrame([0,1,1,1,1,0,0,1,0,1,1], columns=['Numbers'])
    df['tmp'] = (~(df['Numbers'] == df['Numbers'].shift(1))).cumsum()
    df['new'] = df.groupby('tmp')['Numbers'].cumsum()
    print(df['new'])
    
        Numbers  tmp  new
    0         0    1    0
    1         1    2    1
    2         1    2    2
    3         1    2    3
    4         1    2    4
    5         0    3    0
    6         0    3    0
    7         1    4    1
    8         0    5    0
    9         1    6    1
    10        1    6    2
    

    How does this work? The inner part ~(df['Numbers'] == df['Numbers'].shift(1)) checks whether the previous line is the same as the current line. For the first line, this works perfectly as well, because a number and a NaN always compare to False. Then I negate it mark the start of each new sequence with a True. When I then do a cumulative sum, I "group" all values with the created tmp column and do a cumulative sum over it to get the required answer in the new column.

    For the two columned version, you'd do exactly the same... for both columns A and B:

    df = pd.DataFrame(input)
    df['tmp'] = (~(df['A'] == df['A'].shift(1))).cumsum()
    df['newA'] = df.groupby('tmp')['A'].cumsum()
    #Just reusing column tmp
    df['tmp'] = (~(df['B'] == df['B'].shift(1))).cumsum()
    df['newB'] = df.groupby('tmp')['B'].cumsum()
    print(df)
    
        A  B  tmp  newA  newB
    0   0  1    1     0     1
    1   1  1    1     1     2
    2   1  0    2     2     0
    3   1  0    2     3     0
    4   1  1    3     4     1
    5   0  1    3     0     2
    6   0  1    3     0     3
    7   1  0    4     1     0
    8   0  0    4     0     0
    9   1  1    5     1     1
    10  1  0    6     2     0
    11  0  1    7     0     1
    12  1  1    7     1     2
    13  1  0    8     2     0