Search code examples
google-sheetsxpathimportgoogle-sheets-formulagoogle-query-language

Google Sheets - Scrape table involved with pagination


I'm trying to find a work around with google sheets. I'm pulling data from finviz.com to build out custom stock screeners, but the only issue is they make use of pagination, therefore only allowing 20 rows for the first few results. I've checked that if I click on the 2nd page results in the pagination section of the table, only the URL changes, indicating the first row of the new table. Meaning if my first result page would be 20 rows, the second result page URL would have a parameter like this "r=21" indicating the first row of the second page results. Now how would I go about this to ensure that I'm pulling all the data once pagination of the table is in place? Also, checking the source of the page, these new parameters are stored into href's, meaning if our pagination had 3 pages as results, then within the <table/> elements we can see the new urls in href's, for example:

<table>
  <a href="screener.ashx?v=111&f=targetprice_a5&r=21"/>
  <a href="screener.ashx?v=111&f=targetprice_a5&r=41"/>
  <a href="screener.ashx?v=111&f=targetprice_a5&r=61"/>
</table>

Take note only one new parameter is added to the url "r=21", the rest are consistant throughout different result pages.

Is this even possible with google sheets?

Here's what I have. The goal to this idea is to build out stock market screeners that are updated every 3 mins, which allows an integration/view from notion.

=QUERY(IMPORTHTML("https://finviz.com/screener.ashx?v=111&f=cap_smallover,earningsdate_thismonth,fa_epsqoq_o15,fa_grossmargin_o20,sh_avgvol_o750,sh_curvol_o1000,ta_perf_52w10o,ta_rsi_nob50&ft=4&o=perfytd&ar=180","Table","19"),"SELECT Col1,Col2,Col7,Col8,Col9,Col10,Col11")

Solution

  • try:

    =QUERY({
     IMPORTHTML("https://finviz.com/screener.ashx?v=111&f=cap_smallover,earningsdate_thismonth,fa_epsqoq_o15,fa_grossmargin_o20,sh_avgvol_o750,sh_curvol_o1000,ta_perf_52w10o,ta_rsi_nob50&ft=4&o=perfytd&ar=180","Table","19");
     IMPORTHTML("https://finviz.com/screener.ashx?v=111&f=cap_smallover,earningsdate_thismonth,fa_epsqoq_o15,fa_grossmargin_o20,sh_avgvol_o750,sh_curvol_o1000,ta_perf_52w10o,ta_rsi_nob50&ft=4&o=perfytd&r=21&ar=180","Table","19");
     IMPORTHTML("https://finviz.com/screener.ashx?v=111&f=cap_smallover,earningsdate_thismonth,fa_epsqoq_o15,fa_grossmargin_o20,sh_avgvol_o750,sh_curvol_o1000,ta_perf_52w10o,ta_rsi_nob50&ft=4&o=perfytd&r=41&ar=180","Table","19");
     IMPORTHTML("https://finviz.com/screener.ashx?v=111&f=cap_smallover,earningsdate_thismonth,fa_epsqoq_o15,fa_grossmargin_o20,sh_avgvol_o750,sh_curvol_o1000,ta_perf_52w10o,ta_rsi_nob50&ft=4&o=perfytd&r=61&ar=180","Table","19")},
     "select Col1,Col2,Col7,Col8,Col9,Col10,Col11 where Col1 matches '\d+'", 1)
    

    enter image description here