I have tried several different methods to add a row to an existing Pandas Dataframe. For example I tried the solution here. However I was not able to correct the issue. I have reverted back to my original code in hopes someone can help me here.
Here is my code:
print('XDF Created, Starting Bucket Separation...')
XDFDFdrop = pd.DataFrame.duplicated(XDFDF,subset='LastSurveyMachineID')
index_of_unique = XDFDF.drop_duplicates(subset='LastSurveyMachineID')
for index,row in zip(XDFDFdrop,XDFDF.itertuples()):
if index:
goodBucket.append(row)
else:
badBucket.append(row)
goodBucketDF = pd.DataFrame(goodBucket)
badBucketDF = pd.DataFrame(badBucket)
print('Bucket Separation Complete, EmailPrefix to F+L Test Starting...')
for emp , fname , lname , row1 in zip(goodBucketDF['EmailPrefix'] , goodBucketDF['Fname'] , goodBucketDF['Lname'] , goodBucketDF.itertuples()):
for emp2 , row2 in zip(goodBucketDF['EmailPrefix'] , goodBucketDF.itertuples()):
if columns != rows:
temp = fuzz.token_sort_ratio((fname+lname),emp)
temp2 = fuzz.token_sort_ratio((fname+lname),emp2)
if abs(temp - temp2) < 10:
badBucketDF.append(list(row2))
goodBucketDF = goodBucketDF.drop(row2)
removed = True
rows += 1
if removed:
badBucketDF.append(list(row2))
goodBucketDF = goodBucketDF.drop(row2)
removed = False
columns += 1
Please note: XDFDF is a relatively large data set that is built using pandas and was pulled from a database (it should not affect the code you see just figured I would disclose that information).
This is my Error:
Traceback (most recent call last):
File "/Users/john/PycharmProjects/Greatness/venv/Recipes.py", line 122, in <module>
goodBucketDF = goodBucketDF.drop([rows])
File "/Users/john/PycharmProjects/Greatness/venv/lib/python3.6/site-packages/pandas/core/frame.py", line 3694, in drop
errors=errors)
File "/Users/john/PycharmProjects/Greatness/venv/lib/python3.6/site-packages/pandas/core/generic.py", line 3108, in drop
obj = obj._drop_axis(labels, axis, level=level, errors=errors)
File "/Users/john/PycharmProjects/Greatness/venv/lib/python3.6/site-packages/pandas/core/generic.py", line 3140, in _drop_axis
new_axis = axis.drop(labels, errors=errors)
File "/Users/john/PycharmProjects/Greatness/venv/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 4387, in drop
'labels %s not contained in axis' % labels[mask])
KeyError: 'labels [(15, '1397659289', 'joshi.penguin@gmail.com', 'jim', 'smith', '1994-05-04', 'joshi.penguin', 'CF032611-8A86-4688-9715-E1278E75D046')] not contained in axis'
Process finished with exit code 1
I would like to know if anyone has a solution to this error so that: I can add a take a row from one Dataframe, place it in the the other DataFrame (does not need to be in order, and I don't care if index duplicates or not). Once it is in its new Dataframe I want to remove it from the old one.
My current issue is removing the row from the old Dataframe. Any help would be appreciated.
If you have any questions on the code please let me know and I will respond as soon as I can. Thank you for your help.
Edit 1
Below I have included a printout of row1. Hopefully this will help as well.
Pandas(Index=1, _1=2, entity_id='1180722688', email='assassin_penguin@live.com', Fname='jim', Lname='smith', Birthdate='1990-09-14', EmailPrefix='assassin_penguin', LastSurveyMachineID=None)
Given that XDFDF is a pandas.DataFrame
, shouldn't the following work?
XDFDFdrop = pd.DataFrame.duplicated(XDFDF,subset='LastSurveyMachineID')
goodBucket = XDFDF.loc[~XDFDFdrop] #the ~ negates a boolean array
badBucket = XDFDF.loc[XDFDFdrop]
Edit:
The updated error comes from you passing an entire row rather than an index to the function pandas.DataFrame.drop
.