I am trying to scrape intraday prices for a company from this website: Enel Intraday.
When the website pulls the data, it splits it across a few hundred pages, which makes it very time-consuming to collect. Using insomnia.rest (for the first time), I have been trying to play with the URL GET parameters and to find the actual JavaScript function that returns these table values, but without success.
Having inspected the search button, I found that the JS function is called "searchIntraday" and uses a form called "intraday_form" as input.
I am basically trying to get the following data in one call rather than having to go through all the tab pages, so a full day would look like this:
Time Last Trade Price Var % Last Volume Type
5:40:49 PM 7.855 -2.88 570 AT
5:38:17 PM 7.855 -2.88 300 AT
5:37:10 PM 7.855 -2.88 290 AT
5:36:06 PM 7.855 -2.88 850 AT
5:35:56 PM 7.855 -2.88 14,508,309 UT
5:29:59 PM 7.872 -2.67 260 AT
5:29:59 PM 7.871 -2.68 4,300 AT
5:29:59 PM 7.872 -2.67 439 AT
5:29:59 PM 7.872 -2.67 3,575 AT
5:29:59 PM 7.87 -2.7 1,000 AT
5:29:59 PM 7.87 -2.7 1,000 AT
5:29:59 PM 7.87 -2.7 1,000 AT
5:29:59 PM 7.87 -2.7 4,000 AT
5:29:59 PM 7.87 -2.7 300 AT
5:29:59 PM 7.87 -2.7 2,000 AT
5:29:59 PM 7.87 -2.7 200 AT
5:29:59 PM 7.87 -2.7 400 AT
5:29:59 PM 7.87 -2.7 500 AT
5:29:59 PM 7.872 -2.67 1,812 AT
5:29:59 PM 7.872 -2.67 5,000 AT
..................................................
Time Last Trade Price Var % Last Volume Type
9:00:07 AM 8.1 0.15 933,945 UT
which for that day means iterating from page 1 to page 1017!
The data doesn't appear to be generated by JavaScript; it is sent as part of the page itself. The image below shows the response I get when I load the link below: the location of the request matches the location on the page, and the HTML for the table is included in the page response.
The HTML in the response indicates that the pages are rendered on the server side rather than the client side. Unfortunately, unless you find a way to browse all the results you want in one shot, you're going to have to iterate through each page. If you do manage to find a magic URL, you can just process that one instead.
https://www.borsaitaliana.it/borsa/azioni/contratti.html?isin=IT0003128367&lang=en&page=10
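The point that the table arrives pre-rendered in the HTML can be checked directly: pandas' read_html will find the table in the raw response body without executing any JavaScript. Here is a minimal sketch using an inline snippet shaped like the site's table (the markup and column names are assumptions based on the page above, not the site's actual HTML):

```python
import pandas as pd
from io import StringIO

# A stand-in for the server response: the table is plain HTML,
# so no JavaScript has to run for it to be parseable.
html = StringIO("""
<html><body>
<table>
  <tr><th>Time</th><th>Last Trade Price</th><th>Var %</th>
      <th>Last Volume</th><th>Type</th></tr>
  <tr><td>5:40:49 PM</td><td>7.855</td><td>-2.88</td><td>570</td><td>AT</td></tr>
  <tr><td>5:38:17 PM</td><td>7.855</td><td>-2.88</td><td>300</td><td>AT</td></tr>
</table>
</body></html>
""")

# read_html returns a list of DataFrames, one per <table> in the document.
tables = pd.read_html(html)
df = tables[0]
print(df)
```

The same call works on the bytes returned by requests.get for the real page, which is exactly what the script below relies on.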
I decided to give it a whirl to see what kind of performance I could get. Below is a complete script that iterates through the first 100 pages.
import pandas as pd
import requests

url = "https://www.borsaitaliana.it/borsa/azioni/contratti.html?isin=IT0003128367&lang=en&page="

# Fetch each page, parse the first HTML table in the response, and stack them.
df = pd.concat([
    pd.read_html(requests.get(url + str(page)).content)[0]
    for page in range(1, 101)  # the site numbers pages from 1
])

df.to_csv('enel.csv', index=False)
Running it on my machine, it took 1.25 minutes for 100 pages.
$ time python scrape.py
real 1m16.914s
user 0m4.039s
sys 0m0.729s
This works out to about 15 minutes per stock, so roughly 7.5 hours for 30 stocks, assuming they're all about the same length. You could run that overnight and it would be ready for you in the morning.
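Since almost all of that wall-clock time is spent waiting on the network (note the user time above is only about 4 seconds), the fetches can be overlapped with a thread pool. Below is a sketch of the pattern with a stand-in fetch function in place of the real requests.get/read_html pair, so it runs offline; the names and worker count are illustrative, not tuned:

```python
import pandas as pd
from concurrent.futures import ThreadPoolExecutor

def fetch(page):
    # Stand-in for: pd.read_html(requests.get(url + str(page)).content)[0].
    # Here we fabricate a one-row frame so the pattern is runnable offline.
    return pd.DataFrame({'page': [page]})

# executor.map preserves input order, so the concatenated frame comes back
# in page order even though the downloads overlap.
with ThreadPoolExecutor(max_workers=10) as executor:
    frames = list(executor.map(fetch, range(1, 101)))

df = pd.concat(frames, ignore_index=True)
print(len(df))  # 100
```

Swapping the stand-in for the real request should cut the total time roughly in proportion to the number of workers, within whatever rate limits the site enforces.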