python pandas python-2.7 dataframe sklearn-pandas

Iterating through entire Pandas Dataframe using column and row as arguments


I have an empty pandas dataframe and a function value(x, y) that takes two arguments: the row number and the column number of a cell in the dataframe. Is there a simpler way to iterate over the entire empty dataframe using these arguments, perhaps with df.apply?

I know it is possible to go through each column and run df.apply on it separately, but is it possible to do this without writing any explicit loops?

Essentially, I am looking for something like this, which I can run on the entire dataframe:

df_copy.apply(lambda x: myfunction(x.value, x.column))

However, x.column does not exist. Is there another way to do it, or am I doing something wrong?

Thanks!


Solution

  • Yes, use name and index attributes of a series:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame(1, index=np.arange(10, 51, 10), columns=np.arange(5))
    

    Input dataframe:

        0  1  2  3  4
    10  1  1  1  1  1
    20  1  1  1  1  1
    30  1  1  1  1  1
    40  1  1  1  1  1
    50  1  1  1  1  1
    

    Let's define a custom function that uses the row labels and the column label in a calculation.

    def f(x):
        # x is one column: x.name is the column label, x.index holds the row labels
        # add the column label to every row label
        return x.name + x.index
    
    df.apply(f)
    

    Output:

         0   1   2   3   4
    10  10  11  12  13  14
    20  20  21  22  23  24
    30  30  31  32  33  34
    40  40  41  42  43  44
    50  50  51  52  53  54
    

    Note: apply passes each column of the dataframe (a pd.Series) into the function f. Each series has a name attribute, which is the column label, and an index attribute, which is the dataframe's row index. The function f therefore returns one computed result per column, and apply assembles those results back into a dataframe.
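
Applied to the original question, the same idea lets a two-argument function be called for every cell. The following is only a minimal sketch: `value` is a hypothetical stand-in for the asker's function (its real body is not shown in the question), and `fill_column` is an illustrative helper name. The result is wrapped in a pd.Series so that the row labels are kept explicitly.

```python
import numpy as np
import pandas as pd


def value(row, col):
    # Hypothetical stand-in for the asker's function; replace with the real logic.
    return row * 100 + col


df = pd.DataFrame(1, index=np.arange(10, 51, 10), columns=np.arange(5))


def fill_column(s):
    # s is one column: s.name is the column label, s.index holds the row labels.
    return pd.Series([value(row, s.name) for row in s.index], index=s.index)


result = df.apply(fill_column)
print(result)
```

Note that the list comprehension still calls `value` once per cell in Python, so this avoids writing the column loop by hand but is not vectorized.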

    To answer the question in the comments, here is the same approach with string labels:

    df = pd.DataFrame(1, index=['Ted','Bill','Ralph','John','Tim'], columns=['A','B','C','D','E'])
    
    def f(x):
        # concatenate each row label with the column label
        return x.index + '_' + x.name
    
    df.apply(f)
    

    Or use a lambda function:

    df.apply(lambda x: x.index + '_' + x.name)
    

    Output:

                 A        B        C        D        E
    Ted      Ted_A    Ted_B    Ted_C    Ted_D    Ted_E
    Bill    Bill_A   Bill_B   Bill_C   Bill_D   Bill_E
    Ralph  Ralph_A  Ralph_B  Ralph_C  Ralph_D  Ralph_E
    John    John_A   John_B   John_C   John_D   John_E
    Tim      Tim_A    Tim_B    Tim_C    Tim_D    Tim_E
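
When the calculation is plain arithmetic on the labels, as in the numeric example earlier in this answer, apply may not be needed at all. As a sketch of one alternative (not part of the original answer), NumPy's `np.add.outer` can combine the row and column labels in a single vectorized step, and the result can be wrapped back into a dataframe with the same labels:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(1, index=np.arange(10, 51, 10), columns=np.arange(5))

# Outer sum: entry [i, j] is row_label[i] + column_label[j].
sums = np.add.outer(df.index.to_numpy(), df.columns.to_numpy())

result = pd.DataFrame(sums, index=df.index, columns=df.columns)
print(result)
```

This only works when the per-cell function can be expressed with array operations; for an arbitrary Python function, the apply approach above is the more general option.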