For a column in a pandas dataframe, I want to remove any number either immediately followed by "gb" or "mb" or with a space in-between, in it's entirety. I.e. remove strings such as "500 gb" and "500mb".
Column_To_Fix
0 coolblue 100gb
1 connector 500 mb for thing
2 5gb for user
3 load 800 mb
4 1000 add-on
5 20 gb
The function below only works for row 0 and row 2, not sure how to add in the space requirement for the pattern:
pat = '(^|\s)\d+(gb|mb)($|\s)'
df['Column_To_Fix'].str.lower().replace(pat, ' ', regex=True)
Desired Output:
Column_To_Fix
0 coolblue
1 connector for thing
2 for user
3 load
4 1000 add-on
5
Try this pattern
pat = '\d+ *(gb|mb)'
df['Column_To_Fix'].str.lower().str.replace(pat, ' ')
Out[462]:
0 coolblue
1 connector for thing
2 for user
3 load
4 1000 add-on
5
Name: Column_To_Fix, dtype: object
If you prefer series.replace
df['Column_To_Fix'].str.lower().replace(pat, ' ', regex=True)