python pandas dataframe loops isinstance

Iterate through data frame


My code pulls a dataframe object and I'd like to mask the dataframe: if a value is <= 15, change it to 1; otherwise change it to 0.

import pandas as pd
XTrain = pd.read_excel('C:\\blahblahblah.xlsx')

for each in XTrain:
  if each <= 15:
    each = 1
  else:
    each = 0

I'm coming from VBA and .NET so I know it's not very pythonic, but it seems super easy to me... The code hits an error since it iterates through the df header. So I tried to check for type:

for each in XTrain:
  if isinstance(each, str) is False:
    if each <= 15:
      each = 1
    else:
      each = 0

This time it got to the final header but did not progress into the dataframe. This makes me think I am not looping through the dataframe correctly? I've been stumped for hours; could anyone send me a little help?

Thank you!


Solution

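  • for each in XTrain only ever loops over the column names. That is how pandas defines iteration over a DataFrame, which is why your second loop skipped every header and never reached the values.

    Pandas supports comparison and arithmetic between a DataFrame and a number directly, applied element-wise. So you want: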

     # le is "less than or equal to"; astype(int) turns True/False into 1/0
     XTrain = XTrain.le(15).astype(int)
    
     # same as
     # XTrain = (XTrain <= 15).astype(int)
    

    If you really want to iterate (don't), remember that a dataframe is two-dimensional, so you have to loop over the rows and then over the cells in each row. Something like this:

    result = XTrain.copy()
    for index, row in XTrain.iterrows():
        for column, cell in row.items():
            # cell = 1 would only rebind the loop variable, not change
            # the dataframe, so write back by label with .at instead
            if cell <= 15:
                result.at[index, column] = 1
            else:
                result.at[index, column] = 0
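
To make the two points above concrete, here is a minimal sketch on a small made-up frame. The column names and values are assumptions for illustration only; they stand in for whatever the spreadsheet actually contains, and the sketch assumes every column is numeric.

```python
import pandas as pd

# Hypothetical stand-in for the asker's spreadsheet; the column names and
# values are made up for illustration.
XTrain = pd.DataFrame({"a": [3, 15, 40], "b": [16.5, 0.0, 15.0]})

# Iterating over a DataFrame yields its column labels, not its cells.
column_labels = [each for each in XTrain]

# Vectorized mask: True where the value is <= 15, then cast booleans to 1/0.
masked = (XTrain <= 15).astype(int)

print(column_labels)
print(masked)
```

If the sheet also contains text columns, comparing them with 15 will likely raise a TypeError. One option in that case is to apply the mask only to XTrain.select_dtypes(include="number").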