I am trying to build a function that iterates through a column of a pandas DataFrame and changes its values of "yes", "maybe", or "no" to 1, 0, and -1 respectively. I've done this before using the exact same process, but for some reason it's giving me a key error this time. When it wasn't working, I simplified it to check whether the iterator was working properly, and found that the iterator seems to be changing my data. Using the code below:
def testing(data):
    print(data)
    for i in range(len(data)):
        print(data[i])

testing(train_x['Values'])
The function prints the following and then raises 'KeyError: 7':
137 no
84 no
27 yes
127 maybe
132 no
...
9 no
103 yes
67 no
117 maybe
47 no
Name: Value, Length: 120, dtype: object
yes
no
no
no
no
no
no
Does anyone know why this is occurring? Does it have something to do with the values being shuffled by train_test_split? The last time I did this, I did it before the train_test_split and it worked perfectly fine, but since then I've realized that data preprocessing is more effective when done after the split, in order to prevent data leakage. If the split is the problem, is there a way to solve this issue using a different iterator?
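train_test_split does shuffle the rows, and each row keeps its original index label. With an integer index, data[i] looks up the row labelled i, not the i-th row in order. That is why the loop prints the rows labelled 0 through 6 in an order that differs from print(data), and then raises KeyError: 7: the row labelled 7 is not in the training set (it presumably went to the test set).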
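To see the difference between label and position lookups in isolation, here is a minimal sketch. The Series `s` is made-up data standing in for train_x['Values'], with shuffled labels and one label deliberately missing; it is not the asker's real data.

```python
import pandas as pd

# Hypothetical stand-in for train_x['Values']: labels are shuffled and label 1 is absent,
# as if that row had been assigned to the test set.
s = pd.Series(["no", "yes", "maybe"], index=[2, 0, 3], name="Values")

# Positional access walks the rows in the order print(s) shows them.
by_position = [s.iloc[i] for i in range(len(s))]

# Label access with the labels taken from s.index visits the same rows in the same order.
by_label = [s.loc[label] for label in s.index]

# Asking for a label that is not in the index fails, just like data[7] in the question.
try:
    s.loc[1]
    missing_label_raises = False
except KeyError:
    missing_label_raises = True
```

Here `by_position` and `by_label` hold the same values, while the lookup of the missing label raises KeyError.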
You might want to try this:
Replace:
def testing(data):
    print(data)
    for i in range(len(data)):
        print(data[i])
With:
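def testing(data):
    # data.index holds the row labels, so look them up with .loc.
    # (.iloc expects positions 0..len(data)-1 and would fail or return the wrong rows here.)
    for i in data.index:
        print(data.loc[i])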
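For the actual goal of turning "yes", "maybe", and "no" into 1, 0, and -1, you may not need a loop at all. The sketch below uses pandas' Series.map with a dictionary; the sample values and the names `values`, `mapping`, and `encoded` are made up for illustration.

```python
import pandas as pd

# Hypothetical data standing in for train_x['Values'], with non-contiguous labels
# like the ones left behind by train_test_split.
values = pd.Series(["no", "yes", "maybe", "no"], index=[137, 84, 27, 127], name="Values")

# Mapping taken from the question: yes -> 1, maybe -> 0, no -> -1.
mapping = {"yes": 1, "maybe": 0, "no": -1}

# map() looks up each value in the dictionary and keeps the original index labels.
# Any value that is not a key in the dictionary becomes NaN.
encoded = values.map(mapping)
```

Because map works on values instead of index labels, the shuffled index does not matter. In your case the call would look something like train_x['Values'].map(mapping), assigned back to the column.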