How to remove text (including multiple lines) enclosed in double or triple quotes from a .txt file in Python?


I have a .txt file containing several strings, some of which are enclosed in double (or triple) quotes. I would like to remove whatever is inside the quotation marks and keep only the quotation marks themselves. Example:

""" aaaa """

bbbbb
ccccc

"""
dddddd
"""

The result should look like this:

""" """

bbbbb
ccccc

"""

"""

I have to do this in Python. Does anyone know of a module that can do this?


Solution

  • You can use the following regex with Python's built-in re module:

    s = '''
    """ aaaa """
    
    bbbbb
    ccccc
    
    """
    dddddd
    """
    '''
    
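    import re

    # Group 1 keeps the opening quotes, group 2 the closing quotes (each with its
    # adjacent whitespace); the text matched by the lazy .*? in between is dropped.
    # \s already covers \n, and re.MULTILINE is not needed without ^ or $.
    print(re.sub(r'("{2,3}\s*).*?(\s*"{2,3})', r'\1\2', s))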
    

    This outputs:

    """  """
    
    bbbbb
    ccccc
    
    """
    
    """
    

    EDIT: To also match content that spans several lines inside the quotes, the regex has to be updated. Here is an example:

    s = '''
    """ aaaa """
    
    bbbbb
    ccccc
    
    """
    dddddd
    bb
    """
    '''
    
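    import re

    # (?:.*?\s*)* walks through the quoted text line by line: .*? matches within a
    # line and \s* consumes the newline that . does not match by default.
    print(re.sub(r'("{2,3}\s*)(?:.*?\s*)*(\s*"{2,3})', r'\1\2', s))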
    

    This gives the output:

    """ """
    
    bbbbb
    ccccc
    
    """
    """