Get the content of an <a> tag using BeautifulSoup

I'd like to get the content of an <a> tag using BeautifulSoup (version 4.12.3) in Python. I have this code and HTML example:

import bs4

h = """
<a id="0">
    <table> 
  <thead>
    <tr>
      <th scope="col">Person</th>
      <th scope="col">Most interest in</th>
      <th scope="col">Age</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th scope="row">Chris</th>
      <td>HTML tables</td>
      <td>22</td>
    </tr>
    </table>
</a>
"""

test = bs4.BeautifulSoup(h)
test.find('a')  # find_all, select => same results

But it only returns:

<a id="0">
</a>

I would expect the content inside <table> to appear between the <a> tags. (I don't know whether it is common to wrap a table inside an <a> tag, but the HTML I'm trying to read is structured this way.)

I need to parse the table content from the <a> tag since I need to link the id="0" to the content of the table.

How can I achieve that? How can I get the <a> tag content, including the <table> tag?

Solution

Explicitly specify the parser you want to use (here, html.parser). By default, BeautifulSoup uses the "best" parser available; I presume that is lxml here, which doesn't parse this document well:

import bs4

h = """
<a id="0">
    <table> 
  <thead>
    <tr>
      <th scope="col">Person</th>
      <th scope="col">Most interest in</th>
      <th scope="col">Age</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th scope="row">Chris</th>
      <td>HTML tables</td>
      <td>22</td>
    </tr>
    </table>
</a>
"""

test = bs4.BeautifulSoup(h, "html.parser")  # <-- define parser here
out = test.find("a")

print(out)

Prints:

<a id="0">
<table>
<thead>
<tr>
<th scope="col">Person</th>
<th scope="col">Most interest in</th>
<th scope="col">Age</th>
</tr>
</thead>
<tbody>
<tr>
<th scope="row">Chris</th>
<td>HTML tables</td>
<td>22</td>
</tr>
</tbody></table>
</a>
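
Once the <a> tag keeps its content, linking the id to the table is a short step. Here is a minimal sketch, assuming each <a> of interest carries an id attribute and wraps a single table. The function name tables_by_anchor_id is my own invention for illustration, and the HTML is a condensed, tidied version of the one in the question:

```python
import bs4

html = """
<a id="0">
  <table>
    <thead>
      <tr><th scope="col">Person</th><th scope="col">Most interest in</th><th scope="col">Age</th></tr>
    </thead>
    <tbody>
      <tr><th scope="row">Chris</th><td>HTML tables</td><td>22</td></tr>
    </tbody>
  </table>
</a>
"""


def tables_by_anchor_id(markup):
    """Map each <a> id to the rows of the table nested inside that <a>.

    Hypothetical helper: assumes one table per <a> and skips <a> tags
    that have no id or no table.
    """
    # html.parser keeps the <table> nested inside the <a> tag
    soup = bs4.BeautifulSoup(markup, "html.parser")
    result = {}
    for a in soup.find_all("a", id=True):
        table = a.find("table")
        if table is None:
            continue
        # One list of cell texts per <tr>, header row included
        result[a["id"]] = [
            [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
            for tr in table.find_all("tr")
        ]
    return result


print(tables_by_anchor_id(html))
# {'0': [['Person', 'Most interest in', 'Age'], ['Chris', 'HTML tables', '22']]}
```

If an <a> can contain several tables, a.find_all("table") could be used instead to collect a list of tables per id.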