Python Pandas iterate over rows and access column names


I am trying to iterate over the rows of a Python Pandas dataframe. Within each row of the dataframe, I am trying to refer to each value along the row by its column name.

Here is what I have:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(10,4),columns=list('ABCD'))
print(df)
          A         B         C         D
0  0.351741  0.186022  0.238705  0.081457
1  0.950817  0.665594  0.671151  0.730102
2  0.727996  0.442725  0.658816  0.003515
3  0.155604  0.567044  0.943466  0.666576
4  0.056922  0.751562  0.135624  0.597252
5  0.577770  0.995546  0.984923  0.123392
6  0.121061  0.490894  0.134702  0.358296
7  0.895856  0.617628  0.722529  0.794110
8  0.611006  0.328815  0.395859  0.507364
9  0.616169  0.527488  0.186614  0.278792

I used this approach to iterate, but it only gives me part of the solution: after selecting a row in each iteration, how do I access the row's elements by their column names?

Here is what I am trying to do:

for row in df.iterrows():
    print(row.loc[0,'A'])
    print(row.A)
    print(row.index())

My understanding is that the row is a Pandas series. But I have no way to index into the Series.

Is it possible to use column names while simultaneously iterating over rows?


Solution

  • I also like itertuples()

    for row in df.itertuples():
        print(row.A)
        print(row.Index)
    

    Since each row is a namedtuple, this should be MUCH faster if you only need to access the values in each row.

    Speed comparison:

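    import time
    import pandas as pd

    # One million rows in a single column 'A'
    df = pd.DataFrame(list(range(1000 * 1000)), columns=['A'])

    # iterrows(): builds a Series for every row
    st = time.time()
    for index, row in df.iterrows():
        row.A
    print(time.time() - st)
    45.05799984931946

    # itertuples(): yields a lightweight namedtuple per row
    st = time.time()
    for row in df.itertuples():
        row.A
    print(time.time() - st)
    0.48400020599365234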
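
For completeness, here is a minimal sketch of how the loop from the question could be written with either method. It assumes a small random frame like the one in the question. The variable names (`iterrows_values`, `itertuples_values`, `column_names`) are only illustrative and do not come from the original post. The key point is that `iterrows()` yields `(index, Series)` pairs, so the loop has to unpack both parts before indexing by column name.

```python
import numpy as np
import pandas as pd

# Small frame with the same column layout as in the question
df = pd.DataFrame(np.random.rand(3, 4), columns=list('ABCD'))

# iterrows() yields (index, Series) pairs, so unpack both parts.
iterrows_values = []
for index, row in df.iterrows():
    # The Series is labeled by column name: row['A'] and row.A both work.
    iterrows_values.append((index, row['A'], row.A))
    # index is an attribute of the Series (not a method) holding the column names.
    column_names = list(row.index)

# itertuples() yields namedtuples; the row label is exposed as .Index.
itertuples_values = [(row.Index, row.A) for row in df.itertuples()]

print(column_names)
print(iterrows_values[0])
print(itertuples_values[0])
```

With `iterrows()` the row is a Series, so `row['A']` also works for column names that are not valid Python identifiers. With `itertuples()`, pandas renames such columns to positional names, so attribute access is mostly convenient when the column names are plain identifiers.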