Tags: python, web-scraping, beautifulsoup, python-requests, urlopen

Scrape pages with "load more" button


I'm trying to scrape stock codes from my country but I'm stuck on a "load more" button on the website in question.

Website: https://br.tradingview.com/markets/stocks-brazilia/market-movers-all-stocks/

My code:

    from bs4 import BeautifulSoup
    from urllib.request import Request, urlopen

    req = Request('https://br.tradingview.com/markets/stocks-brazilia/market-movers-all-stocks/', headers={'User-Agent': 'Mozilla/5.0'})

    webpage = urlopen(req).read()
    bs = BeautifulSoup(webpage, 'lxml')

    table = bs.find('table')
    table_rows = table.find_all('tr')

    tickers = [x.div.a.text for x in table_rows[1:]]

    print(tickers)
    # ['AALR3', 'ABEV3', 'AERI3',...]

    print(len(tickers))
    # 150

I would like to scrape all of the data, but the "load more" button makes that impossible.

Is it possible to do this with BeautifulSoup, or would I have to resort to Selenium?

When I try inspect element > Network > click "load more",

I can't find any requests that I could replicate in my code. Can someone shed some light?



Solution

  • You should instead make POST requests to the backend API. In your browser, open Developer Tools > Network > Fetch/XHR, click "load more", and watch the "scan" request. You can replicate it in Python and get all the data you want by editing the POST request like this:

    import requests
    import pandas as pd
    
    rows_to_scrape = 1000
    
    # This payload mirrors the "scan" request the page sends when you
    # click "load more"; "range" controls which rows come back.
    payload = {"filter":[{"left":"name","operation":"nempty"},
        {"left":"type","operation":"equal","right":"stock"},
        {"left":"subtype","operation":"equal","right":"common"},
        {"left":"typespecs","operation":"has_none_of","right":"odd"}],
        "options":{"lang":"pt"},"markets":["brazil"],
        "symbols":{"query":{"types":[]},"tickers":[]},"columns":
        ["logoid","name","close","change","change_abs","Recommend.All","volume","Value.Traded","market_cap_basic","price_earnings_ttm","earnings_per_share_basic_ttm","number_of_employees","sector","description","type","subtype","update_mode","pricescale","minmov","fractional","minmove2","currency","fundamental_currency_code"],
        "sort":{"sortBy":"name","sortOrder":"asc"},
        "range": [0, rows_to_scrape]}  # change this to get more/less data
    
    headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36'}
    url = 'https://scanner.tradingview.com/brazil/scan'
    
    # requests serializes the payload to JSON for us via the json= kwarg
    resp = requests.post(url, headers=headers, json=payload).json()
    output = [x['d'] for x in resp['data']]  # each 'd' is one row of values
    print(len(output))
    
    df = pd.DataFrame(output)
    df.to_csv('tradingview_br.csv', index=False)
    print('Saved to tradingview_br.csv')
    

    Unfortunately there aren't any headings in that data, but it should be pretty easy to figure out what each data point is: the values in each row follow the order of the "columns" list in the payload.
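
    Since each row's "d" array follows the order of the "columns" list in the payload, you can reuse that list as the DataFrame header. A minimal sketch, using placeholder rows just to show the shape (the real `output` list from the request above would go in their place):

    ```python
    import pandas as pd

    # Same "columns" list as in the POST payload; the scanner returns each
    # row's values ("d") in this exact order.
    columns = ["logoid", "name", "close", "change", "change_abs",
               "Recommend.All", "volume", "Value.Traded", "market_cap_basic",
               "price_earnings_ttm", "earnings_per_share_basic_ttm",
               "number_of_employees", "sector", "description", "type",
               "subtype", "update_mode", "pricescale", "minmov", "fractional",
               "minmove2", "currency", "fundamental_currency_code"]

    # Placeholder rows standing in for the parsed response.
    output = [[None] * len(columns) for _ in range(2)]

    df = pd.DataFrame(output, columns=columns)
    print(df.shape)       # (2, 23)
    print(df.columns[1])  # name
    ```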