python, html, beautifulsoup, python-2to3

BeautifulSoup HTMLParseError in Python 3 but not in Python 2


I run a bs4 program on Python 2.7 and it works flawlessly, but I get an error once I switch to Python 3. I am using an up-to-date version of bs4 on both. The file I am parsing is HTML, and I noticed the error occurs on a particular tag. Is there a supporting module I need to update, like lxml?

Code:

import os
from bs4 import BeautifulSoup

with open(os.path.join(directory, file)) as data:
    soup = BeautifulSoup(data, 'html.parser')

Here is the error:

...
File "C:\Anaconda3\lib\html\parser.py", line 174, in error
      raise HTMLParseError(message, self.getpos())
html.parser.HTMLParseError: unknown status keyword 'NKXE' in marked section, at line 318, column 49

Always appreciate the help!


Solution

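  • Try installing html5lib, a more lenient parser that handles broken markup the way a web browser does:

    pip install html5lib
    

    Then pass 'html5lib' as the parser instead of 'html.parser', which should fix the issue: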

    import os
    from bs4 import BeautifulSoup
    
    with open(os.path.join(directory, file)) as data:
        soup = BeautifulSoup(data, 'html5lib')
    

    This has worked for me.
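If you would rather keep 'html.parser', it can help to locate the markup it chokes on. The error refers to a marked section, that is, markup that starts with "<![KEYWORD[". Below is a minimal, standard-library-only sketch that lists marked sections whose keyword html.parser would not recognise. The accepted keyword set (temp, cdata, ignore, include, rcdata, if, else, endif) is taken from CPython's internal _markupbase module and is an assumption that may vary between Python versions; the function name find_unknown_marked_sections is hypothetical.

```python
import re

# Keywords that CPython's _markupbase.ParserBase.parse_marked_section accepts.
# ASSUMPTION: taken from the CPython source; other keywords trigger the
# "unknown status keyword ... in marked section" error.
KNOWN_KEYWORDS = {"temp", "cdata", "ignore", "include", "rcdata",
                  "if", "else", "endif"}

# A marked section starts with "<![" immediately followed by a name token.
MARKED_SECTION = re.compile(r"<!\[([a-zA-Z][-_.a-zA-Z0-9]*)")


def find_unknown_marked_sections(html_text):
    """Hypothetical helper: return (line, column, keyword) for every marked
    section whose keyword html.parser would reject. Line and column are
    1-based."""
    results = []
    for match in MARKED_SECTION.finditer(html_text):
        keyword = match.group(1)
        # The stdlib parser lowercases the keyword before comparing.
        if keyword.lower() not in KNOWN_KEYWORDS:
            start = match.start()
            line = html_text.count("\n", 0, start) + 1
            line_start = html_text.rfind("\n", 0, start) + 1
            column = start - line_start + 1
            results.append((line, column, keyword))
    return results
```

Calling it on the file's contents, for example find_unknown_marked_sections(data.read()) inside the with block, should report a position close to the one in the error message, which you can then inspect or clean up before handing the text to BeautifulSoup.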