I am attempting to parse strings using Regex. The strings look like:
Stack;O&verflow;i%s;the;best!
I want to parse it to:
Stack&verflow%sbest!
So when we see a ;, remove everything up until we see one of the following characters: [;,)%&@] (that is, replace it with the empty string "").
I am using the re package in Python:
string = re.sub('^[^-].*[)/]$', '', string)
This is what I have right now:
^[^;].*[;,)%&@]
Which, as I understand it, says: starting at a ;, read everything in between that ; and one of the [;,)%&@] characters.
But the result is wrong and looks like:
Stack;O&verflow;i%s;the;
What am I missing?
EDIT: @InSync pointed out that there is a discrepancy if ; is one of the end characters as well. As worded above, it should result in Stack&verflow%s**;**best! but instead I want to see Stack&verflow%sbest!. Perhaps two regex lines are appropriate here, I am not sure; if you can get to Stack&verflow%s**;**best! then the rest is just a simple replacement of all the remaining ;.
EDIT2: The code I found that works was
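import re

def remove_semicolons(name):
    # Drop each ';' and the text after it, stopping just before the next terminator
    name = re.sub(';.*?(?=[;,)%&@])', '', name)
    # Remove any ';' left over because no terminator followed it
    name = re.sub(';', '', name)
    return name

remove_semicolons('Stack;O&verflow;i%s;the;best!')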
Or if you feel like causing a headache to the next programmer who looks at your code:
import re
semicolon_string = 'Stack;O&verflow;i%s;the;best!'
cleaned_string = re.sub(';','',re.sub(';.*?(?=[;,)%&@])', '', semicolon_string))
Alright, in my answer I assume you have a typo in your expected output. The rule is to remove everything starting with ; up to one of [;,)%&@], and so
Stack ;O &verflow ;i %s ;the ;best!
would become
Stack&verflow%s;best!
For the regex, you want to start with ;, then match anything after it 0 or more times with .* (if you require at least one character, change it to .+), followed by your ending characters [;,)%&@]. To exclude them from the match, make the quantifier lazy with ? and wrap the ending characters in a positive lookahead: ?(?=[;,)%&@]). As the name suggests, the lookahead checks whether the next character is one of your ending characters without consuming it.
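To see what the lookahead buys you, here is a small sketch that runs the same lazy pattern with and without it on the sample string from the question. The variable names are only for illustration.

```python
import re

sample = 'Stack;O&verflow;i%s;the;best!'

# Without a lookahead the ending character is part of the match,
# so re.sub removes it together with the ';' segment.
consuming = re.sub(r';.*?[;,)%&@]', '', sample)

# With a lookahead the ending character is only checked, not consumed,
# so it stays in the output.
lookahead = re.sub(r';.*?(?=[;,)%&@])', '', sample)

print(consuming)  # Stackverflowsbest!
print(lookahead)  # Stack&verflow%s;best!
```

The last ;best! segment is left alone in the lookahead version because no ending character follows it.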
For a final regex:
;.*?(?=[;,)%&@])
or if you require characters in between:
;.+?(?=[;,)%&@])
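As a rough check of how the two variants differ, the sketch below wraps them in a hypothetical helper, strip_segments (my own name, not something from the question), and also shows what happens if the lazy ? is left out.

```python
import re

# Lookahead for the ending characters listed in the question
TERMINATORS = r'(?=[;,)%&@])'

def strip_segments(text, require_char=False):
    """Remove each ';' segment that is followed by an ending character.

    require_char=False uses ;.*? and require_char=True uses ;.+?
    """
    body = '.+?' if require_char else '.*?'
    return re.sub(';' + body + TERMINATORS, '', text)

sample = 'Stack;O&verflow;i%s;the;best!'

# Both variants agree on the sample string
print(strip_segments(sample))                     # Stack&verflow%s;best!
print(strip_segments(sample, require_char=True))  # Stack&verflow%s;best!

# They differ when a ';' is immediately followed by an ending character
print(strip_segments('a;;b'))                     # a;b
print(strip_segments('a;;b', require_char=True))  # a;;b

# A greedy .* runs to the last ending character instead of the nearest one
greedy = re.sub(r';.*(?=[;,)%&@])', '', sample)
print(greedy)  # Stack;best!
```

If the trailing ; should go as well, a second re.sub(';', '', ...) over the result handles it, as in the question's EDIT2.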