Tags: python, html, xml, xml-parsing, html-parsing

How to retrieve HTML from an XML file?


I'm trying to get the HTML code from inside an XML file, but all I get are the individual elements.

XML-Example:

  <?xml version="1.0" encoding="ISO-8859-1"?>
  <websites>
    <website name="1">
      <html xmlns="http://www.w3.org/1999/xhtml">
        <head>
          <title/>
        </head><body>Sample Content.....</body>
      </html>
    </website>
  </websites>

I need a string containing only the HTML, like this:

<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title/>
  </head><body>Sample Content.....</body>
</html>

Solution

  • You can use Beautiful Soup (the bs4 package):

    from bs4 import BeautifulSoup
    
    example = """
    <?xml version="1.0" encoding="ISO-8859-1"?>
    <websites>
      <website name="1">
        <html xmlns="http://www.w3.org/1999/xhtml">
          <head>
            <title/>
          </head><body>Sample Content.....</body>
        </html>
      </website>
    </websites>
    """
    
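    # Name the parser explicitly; without it, bs4 picks whichever parser is
    # installed and emits a warning. "html.parser" ships with Python.
    soup = BeautifulSoup(example, "html.parser")
    # find() returns the first <html> tag; str(html) gives its markup as a string.
    html = soup.find('html')
    print(html)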
    

    Output:

    <html xmlns="http://www.w3.org/1999/xhtml">
    <head>
    <title></title>
    </head><body>Sample Content.....</body>
    </html>
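
If you would rather keep the markup as XML instead of running it through an HTML parser, the standard library's xml.etree.ElementTree can do much the same thing. This is only a minimal sketch, not the approach from the answer above. It assumes the document is available as bytes that begin directly with the XML declaration (a leading newline, as in the triple-quoted string above, would make an XML parser reject it). The names XHTML_NS, xml_data, html_element and html_string are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace declared on the <html> element in the sample document.
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Assumption: the XML is held as bytes and starts with the declaration itself.
xml_data = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<websites>
  <website name="1">
    <html xmlns="http://www.w3.org/1999/xhtml">
      <head>
        <title/>
      </head><body>Sample Content.....</body>
    </html>
  </website>
</websites>
"""

# Serialize XHTML as the default namespace so tags are written without a prefix.
ET.register_namespace("", XHTML_NS)

root = ET.fromstring(xml_data)

# Namespaced elements are looked up as "{namespace-uri}localname".
html_element = root.find(f".//{{{XHTML_NS}}}html")

# tostring() also emits the whitespace that follows </html>, so strip it.
html_string = ET.tostring(html_element, encoding="unicode").strip()
print(html_string)
```

Unlike the HTML parser route, this keeps empty elements in their short XML form rather than expanding them to a start and end tag, so the string may differ slightly from the output shown above.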