Tags: javascript, python, selenium-webdriver, iframe, web-scraping

Python webscraping: BeautifulSoup not showing all html source content


I am quite new to web scraping and Python. I was trying to make a script that gets the Last Trade Price from http://finra-markets.morningstar.com/BondCenter/BondDetail.jsp?symbol=NFLX4333665&ticker=C647273, but some content seems to be missing when I request the page with Python. I have made scripts that successfully got data from other websites before, but I can't get my code to work on this one.
This is my code so far:

from bs4 import BeautifulSoup
import requests

r = requests.get("http://finra-markets.morningstar.com/BondCenter/BondDetail.jsp?symbol=NFLX4333665&ticker=C647273")
c = r.content
soup = BeautifulSoup(c, "html.parser")

all = soup.find_all("div", {"class": "gr_row_a5"})
print(soup)


When I run this, most of the important data is missing from the output.

Any help would be much appreciated.


Solution

  • Be careful with iframe

    If you inspect the page, you'll see that the div with class "gr_row_a5" sits inside an iframe. An iframe loads a separate document, so to crawl data inside it you need to switch into that iframe first and then take the page source.

    from bs4 import BeautifulSoup
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    
    
    browser = webdriver.Chrome()
    browser.delete_all_cookies()
    browser.get('http://finra-markets.morningstar.com/BondCenter/BondDetail.jsp?symbol=NFLX4333665&ticker=C647273')
    
    # Wait until the iframe is available, then switch into it.
    WebDriverWait(browser, 10).until(
        EC.frame_to_be_available_and_switch_to_it((By.ID, 'ms-bond-detail-iframe'))
    )
    
    # page_source now returns the document loaded inside the iframe.
    soup = BeautifulSoup(browser.page_source, "html.parser")
    
    rows = soup.find_all("div", {"class": "gr_row_a5"})
    print(rows)
    
    browser.quit()
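For reference, this is also why the original requests-based attempt printed a page without the rows: the top-level HTML contains only the <iframe> tag itself, not the document loaded inside it. A minimal self-contained illustration (the HTML snippet and its src value are invented for demonstration, not copied from the live page):

```python
from bs4 import BeautifulSoup

# A stand-in for what requests.get() returns for the outer page: the
# iframe tag is present, but its contents live in a separate document.
outer_html = """
<html><body>
  <h1>Bond Detail</h1>
  <iframe id="ms-bond-detail-iframe" src="/BondCenter/detail-frame.jsp"></iframe>
</body></html>
"""

soup = BeautifulSoup(outer_html, "html.parser")
print(soup.find_all("div", {"class": "gr_row_a5"}))  # [] -- not in the outer page
print(soup.find("iframe", id="ms-bond-detail-iframe")["src"])
```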
    

    Hope this solves your problem; if not, kindly let me know. Thanks
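
As an alternative that avoids Selenium entirely, you can fetch the iframe's own document with requests: parse the outer page for the iframe's src, resolve it to an absolute URL, and request that URL directly. A sketch (the helper name `iframe_source` is mine, and whether the frame's document contains the rows without JavaScript rendering is an assumption you'd need to verify against the live page):

```python
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup


def iframe_source(page_url, html, frame_id="ms-bond-detail-iframe"):
    """Return the absolute URL of the document an iframe points at, or None."""
    soup = BeautifulSoup(html, "html.parser")
    frame = soup.find("iframe", id=frame_id)
    if frame is None or not frame.get("src"):
        return None
    # src may be relative; resolve it against the page's own URL.
    return urljoin(page_url, frame["src"])


# Usage against the live page (requires network access):
# r = requests.get("http://finra-markets.morningstar.com/BondCenter/"
#                  "BondDetail.jsp?symbol=NFLX4333665&ticker=C647273")
# frame_url = iframe_source(r.url, r.text)
# inner = requests.get(frame_url)
# rows = BeautifulSoup(inner.text, "html.parser").find_all("div", {"class": "gr_row_a5"})
```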