the HTML:
<div id="divTradeHaltResults">
<div class="genTable">
<table>
<tbody>
<tr>
<td> 03/10/2020 </td>
<td> 15:11:45 </td>
the Code:
import requests
from bs4 import BeautifulSoup

url = r'https://www.nasdaqtrader.com/trader.aspx?id=TradeHalts'
r = requests.get(url)
soup = BeautifulSoup(r.content, "html.parser")
table = soup.find('div', {'id': 'divTradeHaltResults'})
divclass = table.find('div', {'class': "genTable"})
divt = divclass.find('table')
result:
divclass = None  (NoneType)
I have tried the 'lxml' parser to no avail.
I can get the table with Selenium, but it uses too many resources.
From reading other questions about nested divs, there seems to be an inherent problem with bs4.
Has anyone solved this? I have tried multiple ideas from other people.
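As a sanity check, the same find chain works on a static copy of the snippet above (a hypothetical standalone string modeled on the page's markup), which suggests the problem is with what requests actually receives rather than with bs4 itself:

```python
from bs4 import BeautifulSoup

# Hypothetical static snippet mirroring the page's structure, to show
# that bs4 resolves the nested-div chain when the table is present.
html = """
<div id="divTradeHaltResults">
  <div class="genTable">
    <table>
      <tbody>
        <tr><td> 03/10/2020 </td><td> 15:11:45 </td></tr>
      </tbody>
    </table>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find('div', {'id': 'divTradeHaltResults'})
divclass = table.find('div', {'class': "genTable"})
divt = divclass.find('table')
print(divt is not None)  # True when the table exists in the fetched HTML
```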
You are getting None because the page is loaded dynamically via JavaScript, which renders the table after the page itself loads. I was able to track down the origin of the table: the page's JS sends an XHR request to fetch it, which you can observe in your browser's Developer Tools under the Network tab.
Otherwise you can use selenium for that. I've included both solutions for you.
import requests
import pandas as pd

# JSON-RPC payload, as sent by the page's own JS (visible in the Network tab)
payload = {
    "id": 2,
    "method": "BL_TradeHalt.GetTradeHalts",
    "params": "[]",
    "version": "1.1"
}

# Referer header matching the page the site's JS sends it from
headers = {
    'Referer': 'https://www.nasdaqtrader.com/trader.aspx?id=TradeHalts'
}

r = requests.post(
    "https://www.nasdaqtrader.com/RPCHandler.axd", json=payload, headers=headers).json()

# The response carries the table as an HTML fragment in "result"
df = pd.read_html(r["result"])[0]
df.to_csv("table1.csv", index=False)
from selenium import webdriver
from selenium.webdriver.firefox.options import Options
import pandas as pd

# Run Firefox headless so no browser window is opened
options = Options()
options.add_argument('--headless')

driver = webdriver.Firefox(options=options)
driver.get("https://www.nasdaqtrader.com/trader.aspx?id=TradeHalts")

# The trade-halts table is the third <table> in the rendered page
df = pd.read_html(driver.page_source)[2]
df.to_csv("table.csv", index=False)
driver.quit()