I'm learning web scraping in Python, starting with BeautifulSoup. I've run into an issue I'm not sure how to solve. Here is a snippet of my code:
from bs4 import BeautifulSoup
import requests
start_url = "https://www1.interactivebrokers.com/en/index.php?f=2222&exch=nasdaq&showcategories=STK#productbuffer"
# Download the HTML from start_url:
downloaded_html = requests.get(start_url)
# Parse the HTML with BeautifulSoup and create a soup object
soup = BeautifulSoup(downloaded_html.text, "html.parser")
# Select table where the data is:
rawTable = soup.select('table.table.table-striped.table-bordered tbody')[2]
url = rawTable.find_all('a',{'class':'linkexternal'})
print(url[0])
print(url[0].get('href'))
The first print outputs the first row after the table header, which contains company information (you'll see it at the link). The second print outputs just the href attribute, which is meant to open a pop-up page with further information. I'll paste it here:
javascript:NewWindow('https://contract.ibkr.info/index.php?action=Details&site=GEN&conid=48811132','Details','600','600','custom','front');
The actual URL looks like this when I click the link manually:
https://contract.ibkr.info/v3.10/index.php?action=Details&site=GEN&conid=48811132
Is there a command in BeautifulSoup that can help me get this? Or another Python module I can combine with BeautifulSoup in order to capture the URL of the pop-up? I don't want to use regular expressions to get this.
Thanks in advance for any help.
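You can split the href on the single quote character and take the second element, which is the first quoted argument of NewWindow:
print(url[0].get('href').split("'")[1])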
For example:
href = "javascript:NewWindow('https://contract.ibkr.info/index.php?action=Details&site=GEN&conid=48811132','Details','600','600','custom','front');"
print(href.split("'")[1])
Output:
https://contract.ibkr.info/index.php?action=Details&site=GEN&conid=48811132
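If some links in the table are not javascript:NewWindow(...) calls, indexing the split result directly would raise an IndexError. A minimal sketch of a guarded version is below. The function name extract_popup_url is hypothetical (it is not part of BeautifulSoup), and it assumes the URL is always the first single-quoted argument in the href:

```python
def extract_popup_url(href):
    """Return the first single-quoted string in href, or None if there is none."""
    # Splitting on the single quote puts the first quoted string at index 1.
    # A complete pair of quotes yields at least three parts.
    parts = href.split("'")
    if len(parts) < 3:
        return None
    return parts[1]


# Sample href copied from the question.
href = ("javascript:NewWindow('https://contract.ibkr.info/index.php"
        "?action=Details&site=GEN&conid=48811132',"
        "'Details','600','600','custom','front');")

print(extract_popup_url(href))
# A plain link without quotes returns None instead of raising IndexError.
print(extract_popup_url("https://example.com/plain-link"))
```

With the soup from the question, this could be applied to every link, e.g. [extract_popup_url(a.get('href')) for a in url], keeping only the results that are not None.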