I'm trying to scrape stock codes from my country but I'm stuck on a "load more" button on the website in question.
Website: https://br.tradingview.com/markets/stocks-brazilia/market-movers-all-stocks/
My code:
from bs4 import BeautifulSoup
from urllib.request import Request, urlopen
req = Request('https://br.tradingview.com/markets/stocks-brazilia/market-movers-all-stocks/', headers = {'User-Agent': 'Mozilla/5.0'})
webpage = urlopen(req).read()
bs = BeautifulSoup(webpage, 'lxml')
table = bs.find('table')
table_rows = table.find_all('tr')
tickers = [x.div.a.text for x in table_rows[1:]]
print(tickers)
# ['AALR3', 'ABEV3', 'AERI3', ...]
print(len(tickers))
# 150
I would like to scrape all the rows, but the "load more" button makes that impossible with this approach, since only the first 150 rows are in the initial HTML.
Is it possible to do this with BeautifulSoup, or would I have to resort to Selenium?
When I try inspect element > Network > click "load more",
I can't find any request I could replicate in my code. Can someone shed some light?
You don't need Selenium; you can POST directly to the backend API. In your browser, open Developer Tools > Network > Fetch/XHR, click "load more", and watch the "scan" request that fires. You can replicate it in Python and get all the data you want by editing the POST payload like this:
import requests
import pandas as pd
import json
rows_to_scrape = 1000
payload = {"filter":[{"left":"name","operation":"nempty"},
{"left":"type","operation":"equal","right":"stock"},
{"left":"subtype","operation":"equal","right":"common"},
{"left":"typespecs","operation":"has_none_of","right":"odd"}],
"options":{"lang":"pt"},"markets":["brazil"],
"symbols":{"query":{"types":[]},"tickers":[]},"columns":
["logoid","name","close","change","change_abs","Recommend.All","volume","Value.Traded","market_cap_basic","price_earnings_ttm","earnings_per_share_basic_ttm","number_of_employees","sector","description","type","subtype","update_mode","pricescale","minmov","fractional","minmove2","currency","fundamental_currency_code"],
"sort":{"sortBy":"name","sortOrder":"asc"},
"range": [0,rows_to_scrape]} #change this to get more/less data
headers = {'user-agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36'}
url = 'https://scanner.tradingview.com/brazil/scan'
resp = requests.post(url,headers=headers,data=json.dumps(payload)).json()
output = [x['d'] for x in resp['data']]
print(len(output))
df = pd.DataFrame(output)
df.to_csv('tradingview_br.csv',index=False)
print('Saved to tradingview_br.csv')
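As a side note, if the endpoint ever caps how many rows one request returns, you can page through the results by re-sending the same POST with successive "range" values. A minimal sketch of that, assuming a hypothetical page_ranges helper and a 500-row cap (both are my assumptions, not part of the API):

```python
def page_ranges(total_rows, page_size=500):
    """Yield [start, end) pairs suitable for the payload's "range" field."""
    for start in range(0, total_rows, page_size):
        yield [start, min(start + page_size, total_rows)]

# Each pair would replace payload["range"] before re-sending the POST,
# and the "d" rows from every response get concatenated.
print(list(page_ranges(1200)))  # [[0, 500], [500, 1000], [1000, 1200]]
```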
The response itself has no column headings, but each row's "d" array comes back in the same order as the "columns" list in the payload, so it's straightforward to work out what each data point is.
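Since the rows appear to follow the order of the payload's "columns" list, you can reuse that list as the DataFrame's column names. A sketch on a trimmed, illustrative sample (the column subset and all values below are made up for demonstration; real responses will differ):

```python
import pandas as pd

# Same order as the "columns" list sent in the payload (trimmed here).
columns = ["logoid", "name", "close", "change", "change_abs"]

# Illustrative stand-in for resp["data"]; values are invented.
data = [{"s": "BMFBOVESPA:ABEV3", "d": ["ambev", "ABEV3", 14.50, 1.2, 0.17]}]

# Build the frame with named columns instead of 0..n integer headings.
df = pd.DataFrame([row["d"] for row in data], columns=columns)
print(list(df.columns))
```

In the real script this would just be pd.DataFrame(output, columns=payload["columns"][:len(output[0])]), truncated in case the response returns fewer fields than requested.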