parsing XML within HTML using python

I have an HTML file which contains XML at the bottom of it and enclosed by comments, it looks like this:

<!DOCTYPE html>
<html>
<head>
    ***
</head>
<body>
    <div class="panel panel-primary call__report-modal-panel">
        <div class="panel-heading text-center custom-panel-heading">
            <h2>Report</h2>
        </div>
        <div class="panel-body">
            <div class="panel panel-default">
                <div class="panel-heading">
                    <div class="panel-title">Info</div>
                </div>
                <div class="panel-body">
                    <table class="table table-bordered table-page-break-auto table-layout-fixed">
                        <tr>
                            <td class="col-sm-4">ID</td>
                            <td class="col-sm-8">1</td>
                        </tr>

            </table>
        </div>
    </div>
</body>
</html>
<!--<?xml version = "1.0" encoding="Windows-1252" standalone="yes"?>
<ROOTTAG>
  <mytag>
    <headername>BASE</headername>
    <fieldname>NAME</fieldname>
    <val><![CDATA[Testcase]]></val>
  </mytag>
  <mytag>
    <headername>BASE</headername>
    <fieldname>AGE</fieldname>
    <val><![CDATA[5]]></val>
  </mytag>

</ROOTTAG>
-->

Requirement is to parse the XML which is in comments in above HTML. So far I have tried to read the HTML file and pass it to a string and did following:

with open('my_html.html', 'rb') as file:
    d = str(file.read())
    d2 = d[d.index('<!--') + 4:d.index('-->')]
    d3 = "'''"+d2+"'''"

this is returning XML piece of data in string d3 with 3 single qoutes.

Then trying to read it via Etree:

ET.fromstring(d3)

but it is failing with following error:

xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 1, column 2

need some help to basically:

Read HTML
take out snippet with XML piece which is commented at bottom of HTML
take that string and pass to ET.fromString() function, but since this function takes string with triple qoutes, it is not formatting it properly and hence throwing the error

Solution

You already have been on the right path. I put your HTML in the file and it works fine like following.

import xml.etree.ElementTree as ET

with open('extract_xml.html') as handle:
    content = handle.read()
    xml = content[content.index('<!--')+4: content.index('-->')]
    document = ET.fromstring(xml)

    for element in document.findall("./mytag"):
        for child in element:
            print(child, child.text)