I have a text file containing several strings, some of which are enclosed in double (or triple) quotes. I would like to remove whatever is inside the quotation marks and keep only the quotation marks themselves. Example:
""" aaaa """
bbbbb
ccccc
"""
dddddd
"""
The result should look like this:
""" """
bbbbb
ccccc
"""
"""
I have to do this in Python. Does anyone know of a module that does this?
You can do this with the standard re module. Try the following regex:
s = '''
""" aaaa """
bbbbb
ccccc
"""
dddddd
"""
'''
import re
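# Group 1 keeps the opening quotes plus the whitespace after them; .*?\s* drops
# the content and its trailing whitespace; group 2 keeps the closing quotes.
print(re.sub(r'("{2,3}\s*).*?\s*("{2,3})', r'\1\2', s))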
This outputs:
""" """
bbbbb
ccccc
"""
"""
EDIT: To also handle content that spans several lines inside the quotes, the regex has to be updated so that the match can cross line breaks. Here is an example:
s = '''
""" aaaa """
bbbbb
ccccc
"""
dddddd
bb
"""
'''
import re
# re.DOTALL lets "." match newlines, so the quoted content may span several lines.
print(re.sub(r'("{2,3}\s*).*?\s*("{2,3})', r'\1\2', s, flags=re.DOTALL))
This gives the output:
""" """
bbbbb
ccccc
"""
"""