My code loads a dataframe and I'd like to mask it: if a value is <= 15, change it to 1; otherwise change it to 0.
import pandas as pd
XTrain = pd.read_excel('C:\\blahblahblah.xlsx')
for each in XTrain:
    if each <= 15:
        each = 1
    else:
        each = 0
I'm coming from VBA and .NET, so I know it's not very pythonic, but it seems super easy to me... The code hits an error because it iterates through the df header. So I tried to check for type:
for each in XTrain:
    if isinstance(each, str) is False:
        if each <= 15:
            each = 1
        else:
            each = 0
This time it got to the final header but did not progress into the dataframe. This makes me think I am not looping through the dataframe correctly? I've been stumped for hours; could anyone send me a little help?
Thank you!
for each in XTrain
always loops through the column names only. That is how pandas is designed: iterating over a dataframe yields its column labels, never its cell values.
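A minimal sketch of this behaviour, assuming a small made-up dataframe in place of the spreadsheet (the column names a and b are hypothetical):

```python
import pandas as pd

# Hypothetical stand-in for the data read from the Excel file
XTrain = pd.DataFrame({"a": [3, 20], "b": [15, 16]})

# Iterating over a DataFrame yields its column labels, not its cells
labels = [each for each in XTrain]
print(labels)
```

With string labels like these, each <= 15 compares a string to an integer, which is the error the first loop in the question runs into.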
pandas supports comparison and arithmetic operations between a dataframe and a number directly, applied element-wise to every cell. So you want:
# le is less than or equal to
XTrain.le(15).astype(int)
# same as
# (XTrain <= 15).astype(int)
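As a rough end-to-end sketch, assuming an all-numeric dataframe (the sample values and the name masked are made up for illustration), note that the expression returns a new dataframe, so the result has to be assigned to something:

```python
import pandas as pd

# Hypothetical numeric data standing in for the spreadsheet
XTrain = pd.DataFrame({"a": [3, 20, 15], "b": [15, 16, 0]})

# le(15) gives a boolean dataframe; astype(int) turns True/False into 1/0
masked = XTrain.le(15).astype(int)
print(masked)
```

XTrain itself is left unchanged; write XTrain = XTrain.le(15).astype(int) if you want to overwrite it.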
If you really want to iterate (don't), remember that a dataframe is two dimensional. So something like this:
for index, row in XTrain.iterrows():
    for cell in row:
        if cell <= 15:
            # do something
            # note: cell = 1 only rebinds the local name "cell";
            # it does not modify the cell in the original dataframe
            pass
        else:
            # do something else
            pass
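If you do need an explicit loop that actually changes values, one possible sketch is to write into a copy through the .at accessor instead of assigning to the loop variable. The sample data and the name result are assumptions for illustration, and this will be much slower than the vectorized version on a large dataframe:

```python
import pandas as pd

# Hypothetical numeric data standing in for the spreadsheet
XTrain = pd.DataFrame({"a": [3, 20], "b": [15, 16]})

# Write into a copy so the frame being iterated over is never modified
result = XTrain.copy()

for index, row in XTrain.iterrows():
    # row.items() yields (column label, cell value) pairs for this row
    for column, cell in row.items():
        result.at[index, column] = 1 if cell <= 15 else 0

print(result)
```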