Given a text file, how can I replace every token that starts with % with the same token enclosed in []? For instance, in the following text file:
Hi how are you?
I %am %fine.
Thanks %and %you
How can I enclose all the tokens prefixed with % in [], like this:
Hi how are you?
I [am] [fine].
Thanks [and] [you]
I tried to first filter the tokens and then replace them but maybe there is a more pythonic way:
import re

with open('../file') as f:
    s = f.read()
# collect the %-prefixed tokens (dots are stripped first)
a_list = re.findall(r'(?<=\W)%\S*', s.replace('.', ''))
a_list = set(a_list)
print(list(a_list))
You may use
re.sub(r'\B%(\w+)', r'[\1]', s)
Details
\B - a non-word boundary: there must be start of string or a non-word char immediately to the left of the current location
% - a % char
(\w+) - Group 1: any 1 or more word chars (letters, digits or _). Replace with (\S+) to match 1 or more non-whitespace chars if necessary, but note \S also matches punctuation.

import re
s = "Hi how are you? \nI %am %fine.\nThanks %and %you"
result = re.sub(r"\B%(\w+)", r"[\1]", s)
print(result)
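To make the difference between (\w+) and (\S+) concrete, here is a minimal sketch comparing the two patterns. The function name bracket_tokens and its greedy flag are hypothetical, introduced only for this illustration; the regexes themselves are the ones discussed above.

```python
import re

def bracket_tokens(text, greedy=False):
    """Wrap %-prefixed tokens in [] (hypothetical helper for illustration).

    greedy=False uses (\\w+): only word chars are captured.
    greedy=True  uses (\\S+): any non-whitespace chars, punctuation included.
    """
    pattern = r"\B%(\S+)" if greedy else r"\B%(\w+)"
    return re.sub(pattern, r"[\1]", text)

sample = "I %am %fine."
print(bracket_tokens(sample))               # I [am] [fine].
print(bracket_tokens(sample, greedy=True))  # I [am] [fine.]

# \B keeps a % that follows a word char untouched
print(bracket_tokens("50%off"))             # 50%off
```

When working from a file, reading it with f.read() gives a single string that can be passed straight to such a function, whereas str(f.readlines()) produces the textual representation of a list, with brackets, quotes and escaped newlines mixed in.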