How do I remove the non-alphanumeric characters that appear before and after a certain substring in a given string, while leaving the substring itself untouched? See the example below:
input_string = m#12$my#tr!#$g%
output_string = m12my#tr!g
The substring, in this case, is my#tr!
How can I get the output_string given the input_string?
My attempt below removes all the leading characters (including the alphanumeric ones); see the code snippet below. I tried using \W+ instead of .+, which did not work.
import re
input_string = "m#12$my#tr!#$g%"
output_string = re.sub(r'.+?(?=my#tr!)', '', input_string)
I would appreciate any thoughts on how I could use a regex pattern for this purpose.
One way to do this is to split the string around the desired substring, replace the non-alphanumeric characters in the first and last parts and then reassemble the string:
import re
input_string = "m#12$my#tr!#$g%"
mid = 'my#tr!'
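# Split around the target (this unpacking assumes mid occurs exactly once)
first, last = input_string.split(mid)
# Strip everything except ASCII letters and digits from each side
first = re.sub('[^a-zA-Z0-9]', '', first)
last = re.sub('[^a-zA-Z0-9]', '', last)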
output_string = first + mid + last
print(output_string)
Output:
m12my#tr!g
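Note that unpacking the result of split into two names raises a ValueError if the substring is missing or occurs more than once. A minimal sketch of a more forgiving variant, using str.partition, is below. The function name clean_around is hypothetical, and returning the input unchanged when the substring is absent is an assumption about the desired behaviour:

```python
import re

# Matches any character that is not an ASCII letter or digit
NON_ALNUM = re.compile(r'[^A-Za-z0-9]')

def clean_around(text, mid):
    """Strip non-alphanumerics before and after the first occurrence of mid."""
    # partition splits at the first occurrence only and always returns 3 parts
    first, found, last = text.partition(mid)
    if not found:
        # Assumption: leave the text unchanged when mid is absent
        return text
    return NON_ALNUM.sub('', first) + found + NON_ALNUM.sub('', last)

print(clean_around("m#12$my#tr!#$g%", "my#tr!"))  # m12my#tr!g
```

With this version, only the first occurrence of the substring is preserved; any later occurrence is treated as part of the trailing text and cleaned like the rest.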
If you use the regex module from PyPI, you can take advantage of variable-length lookbehinds and remove any non-alphanumeric character that comes before or after the target string:
import regex
input_string = "m#12$my#tr!#$g%"
mid = 'my#tr!'
output_string = regex.sub(rf'[^a-zA-Z0-9](?=.*{mid})|(?<={mid}.*)[^a-zA-Z0-9]', '', input_string)
# 'm12my#tr!g'
Note that if mid contains characters that are special to regex (e.g. . [ { $ ^ etc.), you should escape it before use, i.e.
mid = 'my#tr!'
mid = regex.escape(mid)
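If installing a third-party module is not an option, a similar effect seems achievable with the standard library re module by capturing the text before and after the target and cleaning both in a replacement function. This is only a sketch: the name strip_around is hypothetical, it handles just the first occurrence of mid, and it uses re.escape (the standard-library counterpart of the escaping step above) so that special characters in mid are matched literally:

```python
import re

def strip_around(text, mid):
    """Remove non-alphanumerics around the first occurrence of mid (stdlib only)."""
    # Group 1: text before mid (lazy), group 2: mid itself, group 3: the rest.
    # re.escape makes regex metacharacters in mid match literally.
    pattern = re.compile(rf'(.*?)({re.escape(mid)})(.*)', re.DOTALL)

    def clean(match):
        before, target, after = match.groups()
        before = re.sub(r'[^A-Za-z0-9]', '', before)
        after = re.sub(r'[^A-Za-z0-9]', '', after)
        return before + target + after

    # If mid does not occur, nothing matches and text is returned unchanged
    return pattern.sub(clean, text, count=1)

print(strip_around("m#12$my#tr!#$g%", "my#tr!"))  # m12my#tr!g
print(strip_around("#a+b.c[#x!", "b.c["))         # ab.c[x
```

The second call uses a target containing ., and [, which would change or break the pattern if it were interpolated without escaping.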
If you don't want to use regex at all, you could manually strip the non-alphanumeric characters out of the first and last parts. For example:
import string
input_string = "m#12$my#tr!#$g%"
mid = 'my#tr!'
first, last = input_string.split(mid)
first = ''.join(c for c in first if c in string.ascii_letters + string.digits)
last = ''.join(c for c in last if c in string.ascii_letters + string.digits)
output_string = first + mid + last