
How can I remove the closing square bracket using regex in Python?


I have a messy list of strings (list_strings). Using a regex, I can remove the unwanted leading characters, but I am struggling to also remove the closing bracket ]. How can I remove those as well? I think I am very close...

import re

# the list to clean
list_strings = ['[ABC1: text1]', '[[DC: this is a text]]', '[ABC-O: potatoes]', '[[C-DF: hello]]']

# remove everything from [ up to and including the colon and any spaces after it
for string in list_strings:
    cleaned = re.sub(r'[\[A-Z\d\-]+:\s*', '', string)
    print(cleaned)

Current output:

text1]
this is a text]]
potatoes]
hello]]

Desired output:

text1
this is a text
potatoes
hello

Solution

  • You can use

    cleaned = re.sub(r'^\[+[A-Z\d-]+:\s*|]+$', '', string)
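As a quick check, here is a minimal sketch that applies the pattern above to the list from the question. The names `pattern` and `cleaned_list` are chosen only for this illustration.

```python
import re

list_strings = ['[ABC1: text1]', '[[DC: this is a text]]', '[ABC-O: potatoes]', '[[C-DF: hello]]']

# Two alternatives: the leading "[...:" prefix, or the run of "]" at the end.
# re.sub replaces every match, so both ends are stripped in one call.
pattern = re.compile(r'^\[+[A-Z\d-]+:\s*|]+$')

cleaned_list = [pattern.sub('', s) for s in list_strings]
print(cleaned_list)
# ['text1', 'this is a text', 'potatoes', 'hello']
```

Because the two alternatives are independent, a string that has only the prefix or only the trailing brackets is still partly cleaned.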
    


    Alternatively, to replace only when the string starts with one or more [ followed by a word: prefix and also ends with one or more ], you may use

    cleaned = re.sub(r'^\[+[A-Z\d-]+:\s*(.*?)\s*]+$', r'\1', string)
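The difference between the two substitutions shows up on malformed input. The sketch below compares them on a few made-up sample strings (`samples`, `loose` and `strict` are names assumed for this illustration): the capturing-group version leaves a string untouched unless both the prefix and the closing brackets are present.

```python
import re

# Alternation: strips the prefix and/or the trailing brackets independently.
loose = re.compile(r'^\[+[A-Z\d-]+:\s*|]+$')
# Capturing group: matches only if the whole string has the expected shape.
strict = re.compile(r'^\[+[A-Z\d-]+:\s*(.*?)\s*]+$')

# Hypothetical inputs: one well-formed, one missing "]", one missing the prefix.
samples = ['[[DC: this is a text]]', '[ABC1: text1', 'no brackets here]']

loose_result = [loose.sub('', s) for s in samples]
strict_result = [strict.sub(r'\1', s) for s in samples]

print(loose_result)   # ['this is a text', 'text1', 'no brackets here']
print(strict_result)  # ['this is a text', '[ABC1: text1', 'no brackets here]']
```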
    


    And if you simply want to extract the text inside the brackets, you can use

    # First match only
    m = re.search(r'\[+[A-Z\d-]+:\s*(.*?)\s*]', string)
    if m:
        print(m.group(1))
    
    # All matches
    matches = re.findall(r'\[+[A-Z\d-]+:\s*(.*?)\s*]', string)
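To illustrate the extraction approach, here is a small sketch on a single string that contains two bracketed items. The sample `text` is invented for this example. With exactly one capturing group in the pattern, `re.findall` returns a list of the group-1 values rather than the full matches.

```python
import re

pattern = re.compile(r'\[+[A-Z\d-]+:\s*(.*?)\s*]')

# Hypothetical input holding more than one bracketed item.
text = '[ABC1: text1] and [[DC: this is a text]]'

# First match only: search() returns a match object or None.
m = pattern.search(text)
first = m.group(1) if m else None
print(first)    # text1

# All matches: one captured string per bracketed item.
matches = pattern.findall(text)
print(matches)  # ['text1', 'this is a text']
```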
    


    Details

    • ^ - start of string
    • \[+ - one or more [ chars
    • [A-Z\d-]+ - one or more uppercase ASCII letters, digits or - chars
    • : - a colon
    • \s* - zero or more whitespace characters
    • | - or
    • ]+$ - one or more ] chars at the end of string.

    Also, (.*?) is a capturing group (group 1) that matches zero or more characters other than line break characters, as few as possible. \1 in the replacement refers to the text captured by that group.
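The breakdown above can also be kept next to the pattern itself. This is one possible sketch using the `re.VERBOSE` flag, which ignores whitespace in the pattern and allows `#` comments; the variable name `verbose_pattern` is an assumption for this example.

```python
import re

# Same pattern as the first solution, written out with inline comments.
verbose_pattern = re.compile(r'''
      ^ \[+        # start of string, then one or more "[" chars
      [A-Z\d-]+    # one or more uppercase ASCII letters, digits or "-" chars
      :            # a colon
      \s*          # zero or more whitespace characters
    |              # or
      ]+ $         # one or more "]" chars at the end of the string
''', re.VERBOSE)

result = verbose_pattern.sub('', '[[C-DF: hello]]')
print(result)  # hello
```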