I am hoping someone can help me with this.
I have a DataFrame and want to call replace on the 'Length' column, but only on the rows that match a criterion (this part works). The problem is that the rows that do not match end up as NaN, whereas I want to keep the existing data in those rows, i.e. not run replace on them.
Here is the code that makes the selection; the criterion is that the value does not contain 'hour'.
mask = ~data['Length'].str.contains("hour")
This is my code for replacing which is working fine. I am writing over the original data.
data['Length'] = data.loc[mask, 'Length'].replace(r'([1])[h]\s', r'\1 hour ', regex=True)
data['Length'] = data.loc[mask, 'Length'].replace(r'([2-9]*)[h]\s', r'\1 hours ', regex=True)
data['Length'] = data.loc[mask, 'Length'].replace(r'([0-9]*)[m]', r'\1 minutes', regex=True)
Input data. Row 0 has the format I require in the original data. Other rows need to be run through the replace functions.
0 1 hour 30 minutes
1 2h 45m
2 4h 30m
3 1h 45m
4 1h 45m
5 1h 45m
6 1h 45m
7 1h 45m
Current output. I want to keep the original data in row 0.
0 NaN
1 2 hours 45 minutes
2 4 hours 30 minutes
3 1 hour 45 minutes
4 1 hour 45 minutes
5 1 hour 45 minutes
6 1 hour 45 minutes
7 1 hour 45 minutes
Now I am not married to the methods I have used so if there is a better way then I am open to suggestions, but hopefully there is something simple I am missing. It seems like it should be so simple as it's just an if-else. thanks
You must pass in the mask on the left side of your assignment as well:
# Add ".loc[mask, " on the left side:
data.loc[mask, 'Length'] = data.loc[mask, 'Length'].replace(r'([1])[h]\s', r'\1 hour ', regex=True)
data.loc[mask, 'Length'] = data.loc[mask, 'Length'].replace(r'([2-9]*)[h]\s', r'\1 hours ', regex=True)
data.loc[mask, 'Length'] = data.loc[mask, 'Length'].replace(r'([0-9]*)[m]', r'\1 minutes', regex=True)
Otherwise you are overwriting the whole data['Length'] column with a filtered (shorter) Series. pandas aligns the assignment on the index, so the rows missing from that Series are filled with np.nan.
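To see both behaviours side by side, here is a minimal sketch that reproduces the problem and the fix on a three-row sample. The sample values are taken from the question; the names `broken` and `fixed` are only illustrative.

```python
import pandas as pd

# Small sample taken from the question's input data
data = pd.DataFrame({"Length": ["1 hour 30 minutes", "2h 45m", "1h 45m"]})

# True for rows that are not yet in the long format
mask = ~data["Length"].str.contains("hour")

# Unmasked left side: the filtered Series has no entry for row 0,
# so index alignment leaves row 0 as NaN.
broken = data.copy()
broken["Length"] = broken.loc[mask, "Length"].replace(
    r"([1])[h]\s", r"\1 hour ", regex=True
)

# Masked left side: only the selected rows are written, row 0 is untouched.
fixed = data.copy()
fixed.loc[mask, "Length"] = fixed.loc[mask, "Length"].replace(
    r"([1])[h]\s", r"\1 hour ", regex=True
)
fixed.loc[mask, "Length"] = fixed.loc[mask, "Length"].replace(
    r"([2-9]*)[h]\s", r"\1 hours ", regex=True
)
fixed.loc[mask, "Length"] = fixed.loc[mask, "Length"].replace(
    r"([0-9]*)[m]", r"\1 minutes", regex=True
)

print(broken)
print(fixed)
```

In `broken`, row 0 is NaN after the first assignment alone, while `fixed` keeps "1 hour 30 minutes" in row 0 and converts the other rows.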