Tags: python, python-regex

A more elegant way to replace a regex-matched substring of text


I am new to Python, so I would like a few ideas on this. I am writing a function that finds matching word patterns in a sentence and replaces the spaces only inside the matched words.

Input:

(c)variable < var_CONST1(例 -125(N)) 【AAA BBB有】AND【技術企画】AND【AAA BBB CCC】

Expected Output:

(c)variable < var_CONST1(例 -125(N)) 【AAA-BBB有】AND【技術企画】AND【AAA-BBB-CCC】

In the sample, spaces inside "【AAA BBB有】" and "【AAA BBB CCC】" should be replaced with "-".

I wrote the code below, which solves the problem. However, I would like to know whether there is a better or more elegant way to write it.

import re

text = "(c)variable < var_CONST1(例 -125(N)) 【AAA BBB有】AND【技術企画】AND 【AAA BBB CCC】"

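# Collect every 【...】 group and replace its spaces with hyphens
match_list = re.findall(r"【[\w\s]+】", text)
match_list = [w.replace(" ", "-") for w in match_list]
# Swap each group for a " tkn " placeholder, then split on whitespace
tmp_txt = re.sub(r"【[\w\s]+】", " tkn ", text).split()

# Rebuild the text, putting the hyphenated groups back in order
new_txt = ""
for txt in tmp_txt:
    if txt == "tkn":
        new_txt = new_txt + " " + match_list[0]
        match_list.pop(0)
    else:
        new_txt = new_txt + " " + txt

print(new_txt)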
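Tracing the intermediate values of the placeholder approach on the sample string makes it easier to follow. The names `pattern`, `groups`, `tokens`, `hyphenated`, and `rebuilt` are only for illustration. Two side effects show up. First, `\w` in a Python 3 `str` pattern matches Unicode word characters, so 【技術企画】 is also swapped out and back in. Second, because the text is rebuilt from `split()` tokens, the result gains a leading space and a space on each side of every 【...】 group, so it does not match the expected output character for character.

```python
import re

text = "(c)variable < var_CONST1(例 -125(N)) 【AAA BBB有】AND【技術企画】AND 【AAA BBB CCC】"
pattern = r"【[\w\s]+】"

# \w in a str pattern matches Unicode letters, so CJK text counts as \w
groups = re.findall(pattern, text)
print(groups)  # ['【AAA BBB有】', '【技術企画】', '【AAA BBB CCC】']

# Each group becomes " tkn "; split() then drops the original spacing
tokens = re.sub(pattern, " tkn ", text).split()
print(tokens)  # ['(c)variable', '<', 'var_CONST1(例', '-125(N))', 'tkn', 'AND', 'tkn', 'AND', 'tkn']

# Prefixing every token with " " puts single spaces between all tokens
hyphenated = [g.replace(" ", "-") for g in groups]
rebuilt = ""
for tok in tokens:
    rebuilt += " " + (hyphenated.pop(0) if tok == "tkn" else tok)
print(repr(rebuilt))
# ' (c)variable < var_CONST1(例 -125(N)) 【AAA-BBB有】 AND 【技術企画】 AND 【AAA-BBB-CCC】'
```

The placeholder trick also assumes that the word `tkn` never appears on its own in the input.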

Thank you very much.


Solution

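  • We can use re.sub here with a function as the replacement argument. re.sub calls it once for each match, passing the match object, and substitutes whatever string it returns. Because the non-greedy .*? stops at the first closing 】, each bracketed group is matched separately, so only spaces occurring inside 【...】 are touched: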

    import re

    inp = "(c)variable < var_CONST1(例 -125(N)) 【AAA BBB有】AND【技術企画】AND【AAA BBB CCC】"
    # Replace spaces with hyphens only within each 【...】 match
    output = re.sub(r'【.*?】', lambda m: m.group().replace(' ', '-'), inp)
    print(output)
    

    This prints:

    (c)variable < var_CONST1(例 -125(N)) 【AAA-BBB有】AND【技術企画】AND【AAA-BBB-CCC】
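The callback approach above can be wrapped in a small reusable function with a precompiled pattern. This is a sketch, not part of the original answer. The names `BRACKETED`, `hyphenate_inside_brackets`, and `hyphenate_with_lookahead` are hypothetical, and both functions assume that 【...】 groups are never nested. For comparison, the second function swaps in a different technique. It uses no callback and instead uses a lookahead to replace a space only when a closing 】 appears before any opening 【.

```python
import re

# Any run of characters from 【 up to the nearest following 】 (non-greedy)
BRACKETED = re.compile(r"【.*?】")


def hyphenate_inside_brackets(text: str, sep: str = "-") -> str:
    """Replace spaces with `sep`, but only inside 【...】 groups."""
    def fix_group(match: re.Match) -> str:
        # match.group(0) is the whole 【...】 span; only it gets rewritten
        return match.group(0).replace(" ", sep)

    return BRACKETED.sub(fix_group, text)


def hyphenate_with_lookahead(text: str) -> str:
    """Alternative: replace a space only if a 】 comes before any 【."""
    # [^【】]* cannot cross a bracket, so the lookahead only succeeds
    # for spaces that sit inside an unclosed 【...】 group
    return re.sub(r" (?=[^【】]*】)", "-", text)


inp = "(c)variable < var_CONST1(例 -125(N)) 【AAA BBB有】AND【技術企画】AND【AAA BBB CCC】"
print(hyphenate_inside_brackets(inp))
print(hyphenate_with_lookahead(inp))
```

Both functions print the same result for the sample input. The callback version is easier to extend, for example to change more than just spaces inside each group. The lookahead version is shorter but harder to read.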