Tags: python, html, parsing, text-parsing

How to parse HTML code saved as text?


I have multiple .txt files containing HTML code (the HTML of web pages was copied and saved as .txt).

I want to parse these files as HTML. Are there any libraries with functionality similar to the requests + bs4 combination that can treat input from text files the same way as the result of ordinary web scraping?

Thank you for your help.
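Some context on why no special library is needed: with requests + bs4 the page usually reaches BeautifulSoup as `response.text`, which is just a Python `str`. Text read from a saved .txt file is the same kind of input. The sketch below (the sample HTML string is made up for illustration, and it uses the built-in `html.parser` so no extra package is required) saves a page to a temporary .txt file, reads it back, and checks that it produces the same parse tree as the in-memory string.

```python
import tempfile
from pathlib import Path

from bs4 import BeautifulSoup

# Stand-in for what requests.get(url).text would return: a plain str of HTML
html = "<html><head><title>Example</title></head><body><a href='/a'>A</a></body></html>"

# Save it as .txt, the way the pages in the question were saved, then read it back
with tempfile.TemporaryDirectory() as tmp:
    saved = Path(tmp) / "page.txt"
    saved.write_text(html, encoding="utf-8")
    from_file = BeautifulSoup(saved.read_text(encoding="utf-8"), "html.parser")

# Parse the same HTML directly from memory, as you would with a requests response
from_string = BeautifulSoup(html, "html.parser")

print(from_file.title.string)                 # Example
print(str(from_file) == str(from_string))     # True
print([a["href"] for a in from_file.find_all("a")])
```

Once parsed, the soup supports the usual `find`, `find_all`, `select`, and so on, regardless of where the HTML text came from.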


Solution

  • As many of the comments stated, it is possible to feed the contents of a .txt file to BeautifulSoup():

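    from bs4 import BeautifulSoup

    path = 'path/to/file.txt'
    # Read the saved page as text; change the encoding if the files are not UTF-8
    with open(path, encoding='utf-8') as f:
        text = f.read()
    # 'lxml' requires the lxml package; the built-in 'html.parser' needs no install
    soup = BeautifulSoup(text, 'lxml')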
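Since the question mentions multiple .txt files, here is a minimal sketch that parses every .txt file in one folder. It assumes all files sit in a single directory and are UTF-8 (undecodable bytes are replaced rather than raising); `parse_saved_pages` and the folder name `saved_pages` are hypothetical names chosen for illustration. It also relies on the fact that BeautifulSoup accepts an open file object as well as a string.

```python
from pathlib import Path

from bs4 import BeautifulSoup


def parse_saved_pages(folder):
    """Parse every .txt file in `folder` as HTML; return {filename: soup}."""
    soups = {}
    for txt_file in sorted(Path(folder).glob("*.txt")):
        # Assumes UTF-8; errors="replace" keeps going if a file has stray bytes
        with txt_file.open(encoding="utf-8", errors="replace") as f:
            # BeautifulSoup accepts an open file object, not only a string
            soups[txt_file.name] = BeautifulSoup(f, "html.parser")
    return soups


if __name__ == "__main__":
    # 'saved_pages' is a hypothetical folder name; point it at your own files
    for name, soup in parse_saved_pages("saved_pages").items():
        title = soup.title.string if soup.title else None
        links = [a.get("href") for a in soup.find_all("a")]
        print(name, title, len(links))
```

Swapping `"html.parser"` for `"lxml"` works the same way if lxml is installed; it is typically faster on large pages.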