I have a wrapper class that holds a specific dataframe, plus some modifier functions/callables that operate on it.
import re
from typing import Callable

import pandas as pd

class PhoneNumberCleaner:
    def __init__(self, data: pd.DataFrame, pattern: str):
        self.data = data  # shallow copy?
        self.pattern = pattern

    def __call__(self, *args, **kwargs) -> pd.DataFrame:
        # True for rows whose phone number does not fully match the pattern
        drop_mask = self.data['phoneNumber'].apply(
            lambda pn: not re.fullmatch(self.pattern, pn)
        )
        drop_mask_index = drop_mask[drop_mask].index
        return self.data.drop(drop_mask_index)

class Wrapper:
    def __init__(self, data: pd.DataFrame):
        self.data = data

    def modify(self, modifier: Callable, *args, **kwargs):
        self.data = modifier(*args, **kwargs)
Now, let's say I have the following data:
df_data = {
    'name': ['Mickey', 'Anna', 'Todd', 'Lee', 'Amanda', 'Jake'],
    'phoneNumber': [
        '0321111444---',
        '0335555666',
        '0330001234',
        '0330123456789',
        '0328888999',
        '0999999999999',
    ]
}
df = pd.DataFrame(df_data)
and I want to drop the rows where the person's phone number does not match the expected pattern:
wrapper = Wrapper(df)
number_cleaner = PhoneNumberCleaner(wrapper.data, r'\d{10}')
wrapper.modify(number_cleaner)
Printing wrapper data works fine:
print(wrapper.data)
     name phoneNumber
1    Anna  0335555666
2    Todd  0330001234
4  Amanda  0328888999
However, when I access the same data through the PhoneNumberCleaner object (which is supposed to refer to the same dataframe), I get the old data:
print(number_cleaner.data)
     name    phoneNumber
0  Mickey  0321111444---
1    Anna     0335555666
2    Todd     0330001234
3     Lee  0330123456789
4  Amanda     0328888999
5    Jake  0999999999999
I tried adding .copy(deep=False) when assigning data in the Wrapper and PhoneNumberCleaner classes, but it doesn't help. What am I missing here?
The problem is the return line here:
class PhoneNumberCleaner:
    def __call__(self, *args, **kwargs) -> pd.DataFrame:
        ...
        return self.data.drop(drop_mask_index)
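DataFrame.drop returns a new dataframe by default; the original dataframe (self.data) is not modified. Wrapper.modify then rebinds wrapper.data to that new dataframe, while number_cleaner.data still refers to the original, unfiltered one. Copying (shallow or deep) on assignment cannot help, because the two attributes end up pointing at different objects.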
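A minimal sketch of that behavior, using a small two-row frame and the hypothetical names wrapper_ref and cleaner_ref to stand in for wrapper.data and number_cleaner.data:

```python
import pandas as pd

original = pd.DataFrame(
    {"name": ["Mickey", "Anna"], "phoneNumber": ["0321111444---", "0335555666"]}
)

# Both names start out pointing at the same DataFrame object
wrapper_ref = original
cleaner_ref = original

# drop() without inplace=True builds a new frame; assigning it rebinds
# only wrapper_ref and leaves the original object untouched
wrapper_ref = original.drop([0])

print(wrapper_ref is cleaner_ref)           # False
print(len(wrapper_ref), len(cleaner_ref))   # 1 2
```

The two names diverge as soon as one of them is reassigned, which is what happens inside Wrapper.modify.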
Change it to:
class PhoneNumberCleaner:
    def __call__(self, *args, **kwargs) -> pd.DataFrame:
        ...
        self.data.drop(drop_mask_index, inplace=True)
        return self.data
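For completeness, here is a self-contained sketch of the whole example with that change applied, reusing the classes and data from the question. The explicit `is None` check is a stylistic assumption of mine and is equivalent to the original `not re.fullmatch(...)`.

```python
import re
from typing import Callable

import pandas as pd


class PhoneNumberCleaner:
    def __init__(self, data: pd.DataFrame, pattern: str):
        self.data = data  # same object as the caller's frame, not a copy
        self.pattern = pattern

    def __call__(self, *args, **kwargs) -> pd.DataFrame:
        # True for rows whose number does not fully match the pattern
        drop_mask = self.data["phoneNumber"].apply(
            lambda pn: re.fullmatch(self.pattern, pn) is None
        )
        # Mutate the shared frame so every reference sees the change
        self.data.drop(drop_mask[drop_mask].index, inplace=True)
        return self.data


class Wrapper:
    def __init__(self, data: pd.DataFrame):
        self.data = data

    def modify(self, modifier: Callable, *args, **kwargs):
        self.data = modifier(*args, **kwargs)


df = pd.DataFrame(
    {
        "name": ["Mickey", "Anna", "Todd", "Lee", "Amanda", "Jake"],
        "phoneNumber": [
            "0321111444---",
            "0335555666",
            "0330001234",
            "0330123456789",
            "0328888999",
            "0999999999999",
        ],
    }
)

wrapper = Wrapper(df)
number_cleaner = PhoneNumberCleaner(wrapper.data, r"\d{10}")
wrapper.modify(number_cleaner)

print(number_cleaner.data is wrapper.data)  # True
print(list(number_cleaner.data["name"]))    # ['Anna', 'Todd', 'Amanda']
```

Note that because no copy is ever made, the in-place drop also changes the df that was passed to Wrapper. If the caller's dataframe should stay intact, pass df.copy() to Wrapper instead.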