Tags: python, web-scraping, python-requests, html-parsing, hidden

How to get data-* attributes when web scraping with Python requests


How can I get the value of data-d1-value when I am using Python's requests library?

The response from requests.get(url) does not include the data-* attributes on the div, even though they are present on the original web page.

The relevant markup on the web page is:

<div id="test1" class="class1" data-d1-value="150">
180
</div>

The code I am using is:

import requests
from bs4 import BeautifulSoup

req = requests.get(url)  # url: the page being scraped
soup = BeautifulSoup(req.text, 'lxml')
d1_value = soup.find('div', {'class': "class1"})
print(d1_value)

The result I get is:

<div id="test1" class="class1">
180
</div>

When I debug this, I find that requests.get(url) returns the div with only its id and class attributes; the data-* attributes are missing.

How should I modify my code to get the full value?

For a concrete example, in my case the URL is: https://www.moneycontrol.com/india/stockpricequote/oil-drillingexploration/oilnaturalgascorporation/ONG

The div I need has class="inprice1 nsecp", and the value of its data-numberanimate-value attribute is what I am trying to fetch.

Thanks in advance :)


Solution

  • EDIT

    The website's response differs depending on how the page is requested. When you use requests, the value you are looking for is served like this:

    <div class="inprice1 nsecp" id="nsecp" rel="92.75">92.75</div>
    

    So you can get it from the rel attribute or from the element's text:

    soup.find('div', {'class':"inprice1"})['rel']
    soup.find('div', {'class':"inprice1"}).get_text()
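
To try both lookups without hitting the live site, here is a minimal sketch that parses the snippet above as a plain string. It assumes the served markup still looks like that snippet (the live page may have changed), and it uses Python's built-in html.parser so lxml isn't required; the float() conversion is an addition for when you need a number rather than a string.

```python
from bs4 import BeautifulSoup

# The snippet as served to requests, copied from above (an assumption: the live page may differ)
served_html = '<div class="inprice1 nsecp" id="nsecp" rel="92.75">92.75</div>'

soup = BeautifulSoup(served_html, 'html.parser')
# {'class': 'inprice1'} matches any div whose class list contains inprice1
div = soup.find('div', {'class': 'inprice1'})

rel_value = div['rel']                 # attribute lookup; raises KeyError if absent
text_value = div.get_text(strip=True)  # element text with surrounding whitespace removed
price = float(rel_value)               # both values are strings; convert if you need a number

print(rel_value, text_value, price)
```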
    

    Example

    import requests
    from bs4 import BeautifulSoup
    
    req = requests.get('https://www.moneycontrol.com/india/stockpricequote/oil-drillingexploration/oilnaturalgascorporation/ONG')
    
    soup = BeautifulSoup(req.text, 'lxml')
    
    print('rel: ' + soup.find('div', {'class': "inprice1"})['rel'])
    print('text: ' + soup.find('div', {'class': "inprice1"}).get_text())
    

    Output

    rel: 92.75
    text: 92.75
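
Before switching tools, it can help to confirm that the attribute is missing from the HTML that requests received, rather than being dropped by BeautifulSoup. The sketch below is a hedged illustration: find_attr is a hypothetical helper, and the "rendered" markup is an assumed stand-in for what the browser inspector shows, not a copy of the real page. It shows that the parser keeps data-* attributes when they are present in its input.

```python
from bs4 import BeautifulSoup


def find_attr(html, css_class, attr):
    """Hypothetical helper: value of attr on the first div with css_class, or None."""
    div = BeautifulSoup(html, 'html.parser').find('div', {'class': css_class})
    return div.get(attr) if div is not None else None


# Assumed markup: roughly what the inspector shows vs. what requests receives
rendered = '<div class="inprice1 nsecp" data-numberanimate-value="92.75">92.75</div>'
served = '<div class="inprice1 nsecp" id="nsecp" rel="92.75">92.75</div>'

# The parser keeps data-* attributes when they are in its input...
print(find_attr(rendered, 'inprice1', 'data-numberanimate-value'))
# ...so a None here means the attribute was never in the served HTML
print(find_attr(served, 'inprice1', 'data-numberanimate-value'))
```

Against the live page, a quick equivalent check is 'data-numberanimate-value' in req.text; if that is False, the attribute was not part of the response that requests received.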
    

    To get a response that matches the source as you see it in the browser's inspector (that is, the DOM after the page's JavaScript has run), you can drive a real browser with Selenium:

    Example

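    from time import sleep
    
    from bs4 import BeautifulSoup
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    
    # Selenium 4 takes the driver path via a Service object (executable_path was deprecated, then removed);
    # the raw string keeps the Windows backslashes literal
    driver = webdriver.Chrome(service=Service(r'C:\Program Files\ChromeDriver\chromedriver.exe'))
    url = "https://www.moneycontrol.com/india/stockpricequote/oil-drillingexploration/oilnaturalgascorporation/ONG"
    
    driver.get(url)
    sleep(2)  # give the page's JavaScript time to add the attribute
    
    soup = BeautifulSoup(driver.page_source, "lxml")
    print(soup.find('div', class_='inprice1 nsecp')['data-numberanimate-value'])
    driver.quit()  # quit() ends the whole browser session, not just the current window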
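
Because the HTML from requests and the DOM rendered by Selenium carry the value in different attributes, one option is a small fallback that works on either HTML string. This is a sketch under stated assumptions: extract_price is a hypothetical name, and the markup it expects (a div with class inprice1 carrying data-numberanimate-value, rel, or just text) is based only on the snippets in this answer.

```python
from bs4 import BeautifulSoup


def extract_price(html):
    """Hypothetical helper: read the price from either version of the div.

    Tries data-numberanimate-value (rendered DOM, e.g. driver.page_source),
    then rel (HTML served to requests), then the element's own text.
    """
    div = BeautifulSoup(html, 'html.parser').find('div', {'class': 'inprice1'})
    if div is None:
        return None
    for attr in ('data-numberanimate-value', 'rel'):
        value = div.get(attr)  # None when the attribute is absent
        if value:
            return value
    return div.get_text(strip=True) or None


print(extract_price('<div class="inprice1 nsecp" id="nsecp" rel="92.75">92.75</div>'))
```

The same function can then be fed either req.text or driver.page_source, so the rest of the scraper does not need to know which source produced the HTML.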
    

    For your original example, to get the attribute value just index the result of find() with ['data-d1-value']:

    Example

    from bs4 import BeautifulSoup
    
    html='''
    <div id="test1" class="class1" data-d1-value="150">
    180
    </div>
    '''
    
    soup = BeautifulSoup(html, 'lxml')
    d1_value = soup.find('div', {'class':"class1"})['data-d1-value']
    print(d1_value)
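
Indexing with [...] raises KeyError when the attribute is missing, which is exactly what happens with the HTML that requests receives. A few alternatives for the same markup are sketched below: .get() returns None instead of raising, attrs= lets you filter on a data-* name (these names can't be passed as Python keyword arguments), and select_one() takes a CSS attribute selector (it relies on soupsieve, which is installed alongside current bs4 releases). The data-d2-value name is a made-up example of a missing attribute.

```python
from bs4 import BeautifulSoup

html = '''
<div id="test1" class="class1" data-d1-value="150">
180
</div>
'''

soup = BeautifulSoup(html, 'html.parser')
div = soup.find('div', {'class': 'class1'})

# .get() returns None instead of raising KeyError when the attribute is missing
d1_value = div.get('data-d1-value')
missing = div.get('data-d2-value')  # hypothetical attribute that is not in the markup

# data-* names aren't valid keyword arguments, so filter on them via attrs=
by_attrs = soup.find('div', attrs={'data-d1-value': True})

# A CSS attribute selector finds the same element
by_css = soup.select_one('div[data-d1-value]')

print(d1_value, missing, by_attrs['id'], by_css['id'])
print(div.get_text(strip=True))  # the element text, distinct from the attribute value
```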