Imagine I have a few values like
test_val1 = 'E 18TH ST AND A AVE'
test_val2 = 'E 31ST ST AND A AVE'
I want to find the ordinals (18TH, 31ST, etc.) and replace them with the bare numbers (18, 31), i.e. remove the suffix but keep the rest of the string unchanged.
Expected value
test_val1 = 'E 18 ST AND A AVE'
test_val2 = 'E 31 ST AND A AVE'
Please note that I do not want to remove the "St" which corresponds to 'street', so a blind replacement is not possible.
My approach was the following (only for 'TH' at the moment), but it doesn't work because the replacement has no way to remember the matched digits and return them.
import regex as re
test_val1.replace('\d{1,}TH', '\d{1,}', regex=True)
I have a column full of these values, so a solution that I can run/apply on a pandas column would be really helpful.
For the following sample dataframe
import pandas as pd
df = pd.DataFrame({"Test": ['E 18TH ST AND A AVE', 'E 31ST ST AND A AVE']})
Test
0 E 18TH ST AND A AVE
1 E 31ST ST AND A AVE
this
df.Test = df.Test.str.replace(r'(\d+)(TH|ST)', lambda m: m.group(1), regex=True)
produces
Test
0 E 18 ST AND A AVE
1 E 31 ST AND A AVE
Is that what you are looking for? Check out the docs for more details.
The lambda function is used as the repl ("replacement") function, whose return value replaces each pattern match in the strings. By definition it receives the respective match object as its argument and must return a string, usually derived from the match object, although it could be totally unrelated. The function here returns the content of the first capture group, the (\d+) part, via the match object's group method.
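To make the mechanics a bit more explicit, here is a minimal sketch of the same idea with a named function instead of the lambda. The names PATTERN and strip_suffix, the extra ND/RD suffixes, the trailing \b word boundary, and the third sample row are my own additions for illustration and are not part of the question; adjust them to your data. It also shows what should be an equivalent backreference form (\1) and the same pattern applied to a single plain string with re.sub.

```python
import re

import pandas as pd

# Assumed extension of the original (TH|ST): also covers ND and RD.
# \b makes sure the suffix ends the token, so a standalone "ST" (street)
# is left alone because it is not directly preceded by digits.
PATTERN = r"(\d+)(?:ST|ND|RD|TH)\b"


def strip_suffix(match):
    """Return only the digits (capture group 1) of an ordinal such as '18TH'."""
    return match.group(1)


df = pd.DataFrame(
    {"Test": ["E 18TH ST AND A AVE", "E 31ST ST AND A AVE", "W 2ND ST AND 3RD AVE"]}
)

# Callable replacement: each match object is passed to strip_suffix.
df["Clean"] = df["Test"].str.replace(PATTERN, strip_suffix, regex=True)

# Backreference replacement: \1 stands for the content of capture group 1.
df["Clean2"] = df["Test"].str.replace(PATTERN, r"\1", regex=True)

# The same pattern on a single Python string, outside of pandas.
single = re.sub(PATTERN, strip_suffix, "E 18TH ST AND A AVE")
```

The callable form is mostly useful when the replacement needs some logic (for example, zero-padding the number); for simply dropping the suffix, the \1 backreference is probably the shorter option.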