Filter just the id number from a URL in BeautifulSoup

I've gotten to a point where

print(soup.td.a)

results in

<a href="/?p=section&amp;a=details&amp;id=37627">Some Text Here</a>

I'm trying to figure out how I can filter further so that the only result is

37627

I've tried a number of things including urlparse and re.compile but I'm just not getting the syntax correct. Plus I feel like there is probably an easier way that I'm just not finding. I appreciate any help given.

Solution

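You can use urlparse() to isolate the query string of the href, then the parse_qs() function to turn it into a dictionary that maps each parameter name to a list of values: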


from bs4 import BeautifulSoup
from urllib.parse import urlparse, parse_qs

html_content = '''
<td>
    <a href="/?p=section&amp;a=details&amp;id=37627">Some Text Here</a>
</td>
'''

# Parse the HTML content
soup = BeautifulSoup(html_content, 'html.parser')

# Find the <a> tag
a_tag = soup.find('a')

# Extract the href attribute
href = a_tag.get('href')

# Parse the URL to get the query parameters
parsed_url = urlparse(href)
# On Python 2: `import urlparse`, then call urlparse.urlparse(href) and urlparse.parse_qs(...)
query_params = parse_qs(parsed_url.query)

# Get the 'id' parameter
id_value = query_params.get('id', [None])[0]

print(id_value)  # Output: 37627
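
Since the question mentions trying re.compile, here is a minimal regex-based sketch as an alternative. It assumes the href has already been pulled out of the tag (the `href` string below is a stand-in for that value, with `&amp;` already decoded to `&`, which BeautifulSoup does for you), and that the id is always numeric. The helper name `extract_id` is made up for illustration.

```python
import re

# Stand-in for the value returned by a_tag.get('href');
# BeautifulSoup has already decoded &amp; to & at this point.
href = "/?p=section&a=details&id=37627"

# Require "?" or "&" before "id=" so that a parameter such as
# "uid=" is not matched by accident. Assumes the id is all digits.
ID_PATTERN = re.compile(r"[?&]id=(\d+)")


def extract_id(url):
    """Return the numeric id parameter as a string, or None if absent."""
    match = ID_PATTERN.search(url)
    return match.group(1) if match else None


print(extract_id(href))  # Output: 37627
```

The parse_qs() approach is usually the safer choice because it handles URL-encoded values and repeated parameters; the regex is just shorter when the URL format is fixed and known.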