Tags: python, pandas, web-scraping, beautifulsoup, python-requests

How to scrape a website that has hidden data inside a table?


I am trying to scrape the Screener.in website to extract information about stocks. However, in the Quarterly Results section some fields are hidden: clicking the + button next to a row header shows additional rows related to that parent header. I need this information as well.

I am using the Python code below, which gives me a DataFrame but without the additional information:

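from io import StringIO
from urllib.request import Request, urlopen

import pandas as pd
from bs4 import BeautifulSoup

url = 'https://www.screener.in/company/TATAPOWER/consolidated/'
print(url)
# Send a browser-like User-Agent header with the request
req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
page = urlopen(req).read()
soup = BeautifulSoup(page, 'html.parser')
# First table with this class (the Quarterly Results table)
table = soup.find_all("table", {"class": "data-table responsive-text-nowrap"})[0]
df = pd.read_html(StringIO(str(table)))[0]
print(df)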

The code above works, but it does not pull the additional information hidden behind the + buttons.

Can somebody help me with this?


Solution

  • As already mentioned in the comments, the hidden rows are loaded on demand: clicking + sends a request to the site's API. Those same requests can be replicated to obtain the content.

    To do this, iterate over the rows of the table and, for each row that has a + button, make the corresponding request.

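    import requests
    import pandas as pd
    from bs4 import BeautifulSoup

    HEADERS = {'User-Agent': 'Mozilla/5.0'}
    url = 'https://www.screener.in/company/TATAPOWER/consolidated/'
    soup = BeautifulSoup(requests.get(url, headers=HEADERS, timeout=30).text, 'html.parser')

    # Column names: 'Item' for the row label, then the quarter headers
    keys = ['Item'] + list(soup.select_one('#quarters thead tr').stripped_strings)

    data = []

    for row in soup.select('#quarters tbody tr')[:-1]:
        if row.td.button:
            # Expandable row: keep the visible parent values first ...
            data.append(dict(zip(keys, [c.text.strip() for c in row.select('td')])))
            # ... then request the hidden sub-rows from the endpoint the + button calls.
            # 3371 is hard-coded: it is the company id the API uses for TATAPOWER.
            parent = row.td.button.text.replace('+', '').strip()
            d = requests.get(
                'https://www.screener.in/api/company/3371/schedules/',
                params={'parent': parent, 'section': 'quarters', 'consolidated': ''},
                headers=HEADERS,
                timeout=30,
            ).json()
            # Only the first sub-row is kept; iterate over d.items() to keep all of them
            first_key = next(iter(d))
            data.append({"Item": first_key, **d[first_key]})
        else:
            data.append(dict(zip(keys, row.stripped_strings)))

    df = pd.DataFrame(data)
    print(df)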
    

    Result:

    Item                Dec 2021 Mar 2022 Jun 2022 Sep 2022 Dec 2022 Mar 2023 Jun 2023 Sep 2023 Dec 2023 Mar 2024 Jun 2024 Sep 2024 Dec 2024
    Sales +               10,913   11,960   14,495   14,031   14,129   12,454   15,213   15,738   14,651   15,847   17,294   15,698   15,391
    YOY Sales Growth %    43.63%   15.41%   43.06%   43.02%   29.47%    4.13%    4.95%   12.17%    3.69%   27.24%   13.67%   -0.26%    5.05%
    Expenses +             9,279   10,091   12,812   12,270   11,810   10,526   12,500   12,967   12,234   13,540   14,232   12,427   12,312
    Material Cost %        8.67%   13.38%    6.74%    4.04%    6.55%   12.13%    6.00%    6.09%    9.29%   13.86%    5.50%    3.59%    6.75%
    Operating Profit       1,634    1,869    1,683    1,760    2,319    1,928    2,713    2,771    2,417    2,307    3,062    3,271    3,079
    OPM %                    15%      16%      12%      13%      16%      15%      18%      18%      16%      15%      18%      21%      20%
    Other Income +           865       62    1,227    1,502    1,497    1,352      877      567    1,092    1,407      578      632      589
    Exceptional items          0     -618        0        0        0        0      235        0        0       39        0     -140        0
    Interest                 953    1,015    1,026    1,052    1,098    1,196    1,221    1,182    1,094    1,136    1,176    1,143    1,170
    Depreciation             758      846      822      838      853      926      893      926      926    1,041      973      987    1,041
    Profit before tax        788       71    1,062    1,373    1,864    1,158    1,476    1,231    1,489    1,537    1,490    1,773    1,457
    Tax %                    30%    -794%      17%      32%      44%      19%      23%      17%      28%      32%      20%      38%      18%
    Net Profit +             552      632      884      935    1,052      939    1,141    1,017    1,076    1,046    1,189    1,093    1,188
    Profit after tax         552      632      884      935    1,052      939    1,141    1,017    1,076    1,046    1,189    1,093    1,188
    EPS in Rs               1.33     1.57     2.49     2.56     2.96     2.43     3.04     2.74     2.98     2.80     3.04     2.90     3.23
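The answer's loop keeps only the first entry of each schedule response (`next(iter(d))`), so if a parent such as Expenses has several sub-rows, only one of them reaches the DataFrame. Below is a minimal sketch that keeps every sub-row, assuming (as the answer's code implies) that the API returns a mapping of sub-row name to per-quarter values. The fetch step is passed in as a function so the parsing can be tried offline; `flatten_quarters`, `fake_fetch`, `requested` and `SAMPLE_HTML` are hypothetical names, and the sample markup only mimics the structure the answer's selectors expect, not the real page.

```python
from typing import Callable, Dict, List

import pandas as pd
from bs4 import BeautifulSoup

# Assumed shape of one schedule response: {sub_row_name: {quarter: value}}
Schedule = Dict[str, Dict[str, str]]


def flatten_quarters(html: str, fetch_schedule: Callable[[str], Schedule]) -> pd.DataFrame:
    """Build the #quarters table with every hidden sub-row placed under its parent."""
    soup = BeautifulSoup(html, "html.parser")
    keys = ["Item"] + list(soup.select_one("#quarters thead tr").stripped_strings)
    data: List[Dict[str, str]] = []
    for row in soup.select("#quarters tbody tr")[:-1]:  # same [:-1] slice as the answer
        button = row.td.button
        if button:
            # Visible parent row, e.g. "Sales +"
            data.append(dict(zip(keys, [c.text.strip() for c in row.select("td")])))
            # Label without "+" and surrounding whitespace (str.strip also removes \xa0)
            parent = button.text.replace("+", "").strip()
            # Unlike next(iter(d)), keep every sub-row of the schedule
            for name, values in fetch_schedule(parent).items():
                data.append({"Item": name, **values})
        else:
            data.append(dict(zip(keys, row.stripped_strings)))
    return pd.DataFrame(data)


# Hypothetical sample markup mimicking the structure the selectors expect
SAMPLE_HTML = """
<section id="quarters"><table>
<thead><tr><th></th><th>Mar 2024</th><th>Jun 2024</th></tr></thead>
<tbody>
<tr><td><button>Sales <span>+</span></button></td><td>100</td><td>110</td></tr>
<tr><td>OPM %</td><td>15%</td><td>18%</td></tr>
<tr><td><button>Expenses&nbsp;<span>+</span></button></td><td>80</td><td>90</td></tr>
<tr><td>final row, dropped by [:-1]</td></tr>
</tbody></table></section>
"""

# Records which parent labels were requested, for inspection
requested: List[str] = []


def fake_fetch(parent: str) -> Schedule:
    """Offline stand-in for the schedules API call, with made-up values."""
    requested.append(parent)
    schedules = {
        "Sales": {"YOY Sales Growth %": {"Mar 2024": "5%", "Jun 2024": "6%"}},
        "Expenses": {
            "Material Cost %": {"Mar 2024": "7%", "Jun 2024": "8%"},
            "Employee Cost %": {"Mar 2024": "9%", "Jun 2024": "10%"},
        },
    }
    return schedules[parent]


df = flatten_quarters(SAMPLE_HTML, fake_fetch)
print(df)
```

Against the live site, `fetch_schedule` would wrap the `requests.get(...).json()` call from the answer, including the hard-coded company id 3371 for TATAPOWER. Injecting the fetcher keeps the HTML parsing separate from the network access.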