
Parsing a web page that is rendered by JavaScript


How can I use BeautifulSoup to parse a page whose content is loaded by JavaScript?

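import requests
from bs4 import BeautifulSoup
buf = requests.get(url)  # url is one of the two pages below
soup = BeautifulSoup(buf.text, "html.parser")  # parse the response body, not the Response object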

  1. When parsing "theglobeandmail.com/investing/markets/stocks/XDV-T", it works: all the data is available in "soup".

  2. When parsing "money.tmx.com/quote/BNS", only some of the information is available in "soup". When I print "buf" line by line, I notice that the page consists mostly of embedded JS files.


Solution

  • BeautifulSoup can only parse the HTML returned by the initial request; it does not run JavaScript. In the second example, the tmx.com page makes a separate request after it loads (in this case to https://app-money.tmx.com/graphql), and the response to that request contains the price information. That is why the prices do not appear in the HTML you pass to BeautifulSoup. You can see this by opening your browser's developer tools (press F12) and going to the Network tab:

    [Screenshot: network requests shown in the developer tools]

    To get the price information, you'll need to send a request to https://app-money.tmx.com/graphql instead of https://money.tmx.com/quote/BNS. A GraphQL request is typically a POST with a JSON body, so include the appropriate headers and a payload indicating which stock you're requesting.
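
As a rough illustration of the approach above, here is a minimal sketch of such a request using requests. Only the endpoint URL comes from the answer. The operation name, the query text, the field names, and the User-Agent header are assumptions made for illustration; copy the real payload and headers from the request shown in the Network tab. The helper names build_payload and fetch_quote are hypothetical.

```python
import json

GRAPHQL_URL = "https://app-money.tmx.com/graphql"

# ASSUMPTION: this query text and its field names are placeholders.
# Replace them with the query the site actually sends (see the Network tab).
QUOTE_QUERY = """
query getQuoteBySymbol($symbol: String) {
  getQuoteBySymbol(symbol: $symbol) {
    symbol
    name
    price
  }
}
"""


def build_payload(symbol):
    """Build the JSON body of a GraphQL request for one stock symbol."""
    return {
        "operationName": "getQuoteBySymbol",  # assumed operation name
        "variables": {"symbol": symbol},
        "query": QUOTE_QUERY,
    }


def fetch_quote(symbol, timeout=10):
    """POST the GraphQL query and return the decoded JSON response."""
    # Imported here so build_payload can be used without requests installed.
    import requests

    response = requests.post(
        GRAPHQL_URL,
        json=build_payload(symbol),  # sent as a JSON request body
        headers={"User-Agent": "Mozilla/5.0"},  # assumed; mirror the browser's headers if needed
        timeout=timeout,
    )
    response.raise_for_status()  # fail loudly on HTTP errors
    return response.json()


if __name__ == "__main__":
    # The response is already JSON, so BeautifulSoup is not needed for this step.
    print(json.dumps(fetch_quote("BNS"), indent=2))
```

If the server rejects the request, compare the headers and the payload with the ones the browser sends and adjust the sketch to match.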