Tags: python, regex, python-re

How to remove footnote anchors ([1]) from a scraped wiki page?


I have a bunch of data that looks like this:

Bigtable,[4] MariaDB[5]

How do I use Python's re module to remove those footnote markers, such as [4]?


Solution

  • You can use re.sub to remove those footnote markers:

    >>> import re
    >>> s = "Bigtable,[4] MariaDB[5]"
    >>> re.sub(r'\[.*?\]', '', s)
    'Bigtable, MariaDB'
    

    The regex \[.*?\] matches substrings that start with [ and end with ], with as few characters inside the brackets as possible (the ? makes .* non-greedy). Note that it removes any bracketed text, not just numbers.

    If you only want to remove square brackets that contain nothing but digits, use this regex instead: \[\d+\]
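The script below compares the patterns side by side: why the non-greedy `?` matters, and how \[\d+\] differs from \[.*?\] when a line contains non-numeric brackets. It is a sketch: the second sample line, including the "[citation needed]" marker, and the `rows` list are made-up illustrative data, not taken from the question.

```python
import re

# Non-greedy: each match stops at the first "]" after a "[".
NON_GREEDY = re.compile(r'\[.*?\]')
# Greedy, for comparison: a match runs from the first "[" to the LAST "]".
GREEDY = re.compile(r'\[.*\]')
# Digits only: removes numeric markers like [4] but keeps other bracketed text.
DIGITS_ONLY = re.compile(r'\[\d+\]')

s = "Bigtable,[4] MariaDB[5]"
print(NON_GREEDY.sub('', s))   # Bigtable, MariaDB
print(GREEDY.sub('', s))       # Bigtable,  (" MariaDB" was swallowed too)

# Hypothetical line mixing numeric and non-numeric brackets.
t = "Paris[1] is the capital[citation needed] of France[2]"
print(DIGITS_ONLY.sub('', t))  # Paris is the capital[citation needed] of France
print(NON_GREEDY.sub('', t))   # Paris is the capital of France

# Applying the compiled pattern to many scraped rows (illustrative data).
rows = ["Bigtable,[4] MariaDB[5]", "PostgreSQL[12]"]
cleaned = [DIGITS_ONLY.sub('', row) for row in rows]
print(cleaned)  # ['Bigtable, MariaDB', 'PostgreSQL']
```

Compiling the pattern once with re.compile is optional; re.sub caches recently used patterns, but a compiled object keeps the intent clear when cleaning many rows.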