I am very new to Python and I am trying to build a list from a string.
Input = """<html><body><ul style="padding-left: 5pt"><i>(See attached file: File1.pdf)</i><i>(See attached file: File2.ppt)</i><i>(See attached file: File3.docx)</i></ul></body></html>"""
Desired Output = ['File1.pdf', 'File2.ppt', 'File3.docx']
What is the most efficient and Pythonic way to achieve this? Any help would be very much appreciated. Thanks.
You can use Beautiful Soup (the bs4 package), which provides HTML parsing utilities.
>>> from bs4 import BeautifulSoup
>>> html = """<html><body><ul style="padding-left: 5pt"><i>(See attached file: File1.pdf)</i><i>(See attached file: File2.ppt)</i><i>(See attached file: File3.docx)</i></ul></body></html>"""
>>> soup = BeautifulSoup(html, 'html.parser')  # the parser name is the second positional argument
>>> files_list = [i.text.split('file: ')[1].replace(')', '') for i in soup.find_all('i')]
>>> print(files_list)
['File1.pdf', 'File2.ppt', 'File3.docx']
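
If installing a third-party package is not an option, the same list can probably be obtained with the standard library's re module. This is only a sketch and it swaps HTML parsing for plain pattern matching: it assumes every entry has exactly the form "(See attached file: NAME)" and that file names never contain a closing parenthesis. The function name extract_attachments is made up for illustration.

```python
import re

html = (
    '<html><body><ul style="padding-left: 5pt">'
    '<i>(See attached file: File1.pdf)</i>'
    '<i>(See attached file: File2.ppt)</i>'
    '<i>(See attached file: File3.docx)</i>'
    '</ul></body></html>'
)

# Assumption: each attachment appears as "(See attached file: NAME)".
# The group captures everything up to the first closing parenthesis.
ATTACHMENT_RE = re.compile(r'\(See attached file: ([^)]+)\)')


def extract_attachments(markup):
    """Return the file names found in markup (hypothetical helper)."""
    return ATTACHMENT_RE.findall(markup)


files_list = extract_attachments(html)
print(files_list)
```

For input as regular as the sample this is enough, but Beautiful Soup is the safer choice once the markup varies (extra attributes, nested tags, other text inside the <i> elements).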