python web-scraping beautifulsoup html-table

How to extract hidden input values from a JavaScript-generated table when scraping a website with Python and BeautifulSoup


I'm trying to get the information from the table on this public website. When I run the code I get the table, but I'm not able to extract some hidden inputs, which is where the data and values are stored.

This is my code:

import requests
from bs4 import BeautifulSoup as bs

url = 'https://sigaf.transmuni.gob.ni/cgi-bin/tm_CPTechosMun.cgi?ejercicio=2022'
r = requests.get(url, verify=False)
soup = bs(r.text, 'html.parser')
table2022trans = soup.find('table', attrs={"id": "DESEMBOLSO"})

for rows in table2022trans.find_all('tbody'):
    munis = rows.find_all('tr')
    for muni in munis:
        data = muni.find_all('td')
        for dat in data:
            # find() must be called on each cell; `data` is a list-like ResultSet
            row = dat.find('input')
            print(row)

Up to this point I get the table, but I'm not able to extract the text of these inputs. Do you have any advice on this situation?

Thank you!!!

I'm trying to extract the information from the website's table, but it has some hidden input text that I can't reach. According to the browser inspector, the table is generated by JavaScript; I don't think it comes from an API.


Solution

  • When I try to access the site, I run into an SSL certificate problem. To work around it, I pass verify=False to requests.get() and call requests.packages.urllib3.disable_warnings() to silence the resulting warning. If I understand the question correctly, the following code will get you the hidden text:

    import requests
    from bs4 import BeautifulSoup as bs
    
    
    url = 'https://sigaf.transmuni.gob.ni/cgi-bin/tm_CPTechosMun.cgi?ejercicio=2022'
    
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.82 Safari/537.36",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
    }
    
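    # verify=False skips certificate verification; this call only silences the warning
    requests.packages.urllib3.disable_warnings()
    r = requests.get(url, headers=headers, verify=False)
    soup = bs(r.text, 'html.parser')
    table2022trans = soup.find('table', attrs={"id": "DESEMBOLSO"})
    # The string filter is case-sensitive, so "HIDDEN" must match the page source exactly
    rows = table2022trans.find_all('input', type="HIDDEN")
    for row in rows:
        # Each hidden input stores its data in the "value" attribute
        print(row.get('value'))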
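
If you need the values grouped by table row rather than as one flat list, the same idea can be sketched as below. This is only an illustration under stated assumptions: the sample markup, the input names (monto_1 and so on), and the helper hidden_values_by_row are hypothetical and are not taken from the real page. It runs on an inline HTML string so it can be tried without a network call, and it matches the type attribute case-insensitively with a regular expression, so it does not depend on the page writing "HIDDEN" in uppercase.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical sample markup; the real page's field names and values may differ.
SAMPLE_HTML = """
<table id="DESEMBOLSO">
  <tbody>
    <tr>
      <td>Managua<input type="HIDDEN" name="monto_1" value="1500.00"></td>
      <td><input type="hidden" name="monto_2" value="2750.50"></td>
    </tr>
    <tr>
      <td>Leon<input type="Hidden" name="monto_3" value="980.25"></td>
      <td><input type="text" name="visible" value="ignored"></td>
    </tr>
  </tbody>
</table>
"""


def hidden_values_by_row(html, table_id="DESEMBOLSO"):
    """Return one {input name: value} dict per table row that has hidden inputs."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id=table_id)
    if table is None:
        # The table is not in this HTML at all
        return []

    # Match type="hidden" regardless of letter case
    hidden = re.compile(r"^hidden$", re.IGNORECASE)

    rows = []
    for tr in table.find_all("tr"):
        inputs = tr.find_all("input", type=hidden)
        if inputs:
            rows.append({inp.get("name"): inp.get("value") for inp in inputs})
    return rows


rows = hidden_values_by_row(SAMPLE_HTML)
for row in rows:
    print(row)
```

With the real site you would pass r.text from the request above instead of SAMPLE_HTML. If the function returns an empty list there, the table or its hidden inputs are probably not present in the raw response, and a browser-driven tool may be needed instead.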