Edit pandas dataframe row-by-row


pandas for Python is neat. I'm trying to replace a list of dictionaries with a pandas DataFrame. However, I'm wondering if there's a way to change values row-by-row in a for-loop just as easily?

Here's the non-pandas dict-version:

trialList = [
    {'no':1, 'condition':2, 'response':''},
    {'no':2, 'condition':1, 'response':''},
    {'no':3, 'condition':1, 'response':''}
]  # ... and so on

for trial in trialList:
    # Do something and collect response
    trial['response'] = 'the answer!'

... and now trialList contains the updated values because trial refers back to the dict inside the list. Very handy! But the list of dicts is very unhandy, especially because I'd like to be able to compute stuff column-wise, which pandas excels at.

So given trialList from above, I thought I could make it even better by doing something pandas-like:

import pandas as pd
dfTrials = pd.DataFrame(trialList)  # makes a nice 3-column dataframe with 3 rows

for trial in dfTrials.iterrows():
    # do something and collect response
    trial[1]['response'] = 'the answer!'  # trial is an (index, row) tuple

... but dfTrials remains unchanged here. Is there an easy way to update values row-by-row, perhaps equivalent to the dict version? It is important that it's row-by-row, as this is for an experiment where participants are presented with a lot of trials and various data are collected on every single trial.


Solution

  • If you really want row-by-row ops, you could use iterrows and loc:

    >>> for i, trial in dfTrials.iterrows():
    ...     dfTrials.loc[i, "response"] = "answer {}".format(trial["no"])
    ...     
    >>> dfTrials
       condition  no  response
    0          2   1  answer 1
    1          1   2  answer 2
    2          1   3  answer 3
    
    [3 rows x 3 columns]
    

    It's better, though, to vectorize when you can:

    >>> dfTrials["response 2"] = dfTrials["condition"] + dfTrials["no"]
    >>> dfTrials
       condition  no  response  response 2
    0          2   1  answer 1           3
    1          1   2  answer 2           3
    2          1   3  answer 3           4
    
    [3 rows x 4 columns]
    

    And there's always apply:

    >>> def f(row):
    ...     return "c{}n{}".format(row["condition"], row["no"])
    ... 
    >>> dfTrials["r3"] = dfTrials.apply(f, axis=1)
    >>> dfTrials
       condition  no  response  response 2    r3
    0          2   1  answer 1           3  c2n1
    1          1   2  answer 2           3  c1n2
    2          1   3  answer 3           4  c1n3
    
    [3 rows x 5 columns]
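
For the experiment loop in the question, here is a minimal sketch of why the original attempt had no effect and how to write back one trial at a time. It assumes the three-trial frame from the question. `collect_response` is a hypothetical stand-in for whatever presents the trial and records the answer, and `.at` is swapped in for `.loc` only because a single cell is being set; `.loc` works the same way here.

```python
import pandas as pd

trial_list = [
    {"no": 1, "condition": 2, "response": ""},
    {"no": 2, "condition": 1, "response": ""},
    {"no": 3, "condition": 1, "response": ""},
]
df_trials = pd.DataFrame(trial_list)

# With mixed column dtypes, iterrows() builds a fresh Series per row,
# so assigning to that Series does not touch the DataFrame.
for i, row in df_trials.iterrows():
    row["response"] = "lost"
unchanged = df_trials["response"].tolist()  # still the empty strings


def collect_response(trial_no):
    """Hypothetical placeholder for running a trial and recording the answer."""
    return "answer {}".format(trial_no)


# Write back through the DataFrame itself, addressing one cell by label.
for i in df_trials.index:
    df_trials.at[i, "response"] = collect_response(df_trials.at[i, "no"])

print(df_trials)
```

The point is that the assignment has to go through the DataFrame (`.at` or `.loc`), not through the row object handed out by the iterator.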