I am trying to scrape the 'ASX code' for announcements made by companies on the Australian Stock Exchange from the following website: http://www.asx.com.au/asx/statistics/todayAnns.do
So far I have tried using BeautifulSoup with the following code:
import requests
from bs4 import BeautifulSoup
response = requests.get('http://www.asx.com.au/asx/statistics/todayAnns.do')
parser = BeautifulSoup(response.content, 'html.parser')
print(parser)
However, when I print this, the output does not match what I see when I open the page in a browser and view the page source. After some googling and searching Stack Overflow, I believe this is because JavaScript running on the page loads the content, so it is missing from the HTML that requests receives.
However, I am unsure how to get around this. Any help would be greatly appreciated.
Thanks in advance.
Try this. As you may have noticed, the content is loaded dynamically, so all the scraper needs to do is wait a few moments until the page has finished loading. When you run it, you will get the left-hand column of the table on that page.
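import time
from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Chrome()
try:
    driver.get('http://www.asx.com.au/asx/statistics/todayAnns.do')
    # Give the page's JavaScript time to populate the table.
    time.sleep(8)
    soup = BeautifulSoup(driver.page_source, "lxml")
    # Print the text of every element with the class "row".
    for item in soup.select('.row'):
        print(item.text)
finally:
    # Close the browser even if something above fails.
    driver.quit()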
Partial results:
RLC
RNE
PFM
PDF
HXG
NCZ
NCZ
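The output above repeats NCZ, presumably because that company made more than one announcement (an assumption; the page itself was not checked for this). If only the distinct codes are wanted, a minimal sketch like the following could tidy the scraped text. The helper name `unique_codes` is hypothetical, and it assumes each scraped string holds a single code with optional surrounding whitespace.

```python
def unique_codes(texts):
    """Return stripped, non-empty strings from texts, without repeats, in first-seen order."""
    seen = set()
    codes = []
    for text in texts:
        code = text.strip()
        if code and code not in seen:
            seen.add(code)
            codes.append(code)
    return codes


# Values copied from the partial results; the extra whitespace and the
# empty string are added only to illustrate the clean-up.
sample = ["RLC\n", " RNE", "PFM", "PDF", "HXG", "NCZ", "NCZ", ""]
print(unique_codes(sample))
```

With the Selenium script, this could be called as `unique_codes(item.text for item in soup.select('.row'))`.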
By the way, I wrote and ran this script with Python 3.5, so the Selenium bindings work without issues on a recent version of Python.
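A fixed `time.sleep(8)` wastes time when the page loads quickly and can still be too short when it loads slowly. A common alternative is to poll until the content appears. The sketch below shows that idea with a small hypothetical helper, `wait_until`, written with only the standard library; Selenium ships a comparable facility, `WebDriverWait`, which may be preferable in practice. The function `fetch_row_texts` and its default timeout are assumptions for illustration, and the `.row` selector is taken from the script above.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll)


def fetch_row_texts(url, timeout=15.0):
    """Open url in Chrome and return the text of every '.row' element."""
    # Imported here so wait_until can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # An empty list is falsy, so polling continues until rows exist.
        rows = wait_until(
            lambda: driver.find_elements(By.CSS_SELECTOR, ".row"),
            timeout=timeout,
        )
        return [row.text for row in rows]
    finally:
        # Close the browser even if the wait times out.
        driver.quit()
```

Note that this returns as soon as at least one matching element exists, so if the table fills in gradually a stricter condition (for example, a minimum row count) may be needed.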