I have a file containing segments that together form a word, in the following format: <+segment1 segment2 segment3 segment4+>
What I want is an output with all the segments next to each other to form one word (so basically I want to remove the spaces between the segments and the <+ +> signs surrounding them). For example:
Input:
<+play ing+> <+game s .+>
Output:
playing games.
I first tried detecting the pattern using \<\+(.*?)\+\>
but I cannot figure out how to remove the spaces.
Use this Python code:
import re
line = '<+play ing+> <+game s .+>'
line = re.sub(r'<\+\s*(.*?)\s*\+>', lambda z: z.group(1).replace(" ", ""), line)
print(line)
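Result: playing games.
re.sub calls the lambda once per match; the lambda returns the captured group with its spaces removed, so the <+ +> delimiters and the spaces inside them are dropped in a single pass.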
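If the segments may be separated by tabs or several spaces, the same idea can be wrapped in a small helper. This is only a sketch: the names GROUP and join_segments are made up for illustration, and it assumes each <+ ... +> group sits on a single line.

```python
import re

# One <+ ... +> group; the lazy (.*?) stops at the first closing +>.
GROUP = re.compile(r'<\+\s*(.*?)\s*\+>')

def join_segments(line):
    """Drop the <+ +> markers and all whitespace between the segments inside them."""
    # str.split() with no arguments splits on any run of whitespace,
    # so tabs and repeated spaces are handled as well as single spaces.
    return GROUP.sub(lambda m: "".join(m.group(1).split()), line)

print(join_segments('<+play ing+> <+game s .+>'))  # playing games.
```

To process a whole file, one option is to loop over the open file object and call join_segments on each line.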
REGEX EXPLANATION
--------------------------------------------------------------------------------
  <                        '<'
--------------------------------------------------------------------------------
  \+                       '+'
--------------------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    .*?                      any character except \n (0 or more times
                             (matching the least amount possible))
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  \+                       '+'
--------------------------------------------------------------------------------
  >                        '>'
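The same breakdown can be kept next to the pattern itself with re.VERBOSE, which ignores whitespace in the pattern and allows # comments. A minimal sketch (the variable name pattern is arbitrary) that shows what group 1 captures before the spaces are removed:

```python
import re

pattern = re.compile(r"""
    <        # literal '<'
    \+       # literal '+'
    \s*      # leading whitespace, zero or more, greedy
    (.*?)    # group 1: any characters except newline, as few as possible
    \s*      # trailing whitespace, zero or more, greedy
    \+       # literal '+'
    >        # literal '>'
""", re.VERBOSE)

# With a single capturing group, findall returns the group contents.
captured = pattern.findall('<+play ing+> <+game s .+>')
print(captured)  # ['play ing', 'game s .']
```

The captured strings still contain the inner spaces, which is why the replacement function has to strip them.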