Search code examples
pythonbeautifulsoupbitcoin

Using Beautiful soup to get the stock prices


I try to make a simple price tracker for bitcoin and other cryptocurrencies or stocks. I intend to use web scraping to get prices from google finance relying on BeautifulSoup and requests libraries.

The code is this:

 from bs4 import BeautifulSoup
import requests
import time

def getprice():
    url = 'https://www.google.com/search?q=bitcoin+price'
    HTML = requests.get(url)
    soup = BeautifulSoup(HTML.text, 'html.parser')
    text = soup.find('div', attrs={'class':'BNeawe iBp4i AP7Wnd'}).find("div", attrs={'class':'BNeawe iBp4i AP7Wnd'}).text
    return text

if __name__ == "__main__":
    bitcoin = getprice()
    print(bitcoin)

I get this error

     File "c:\Users\gabri\Visual Studio\crypto\bitcoinprice.py", line 19, in <module>
    bitcoin = getprice()
  File "c:\Users\gabri\Visual Studio\crypto\bitcoinprice.py", line 15, in getprice
    text = soup.find('div', attrs={'class':'BNeawe iBp4i AP7Wnd'}).find("div", attrs={'class':'BNeawe iBp4i AP7Wnd'}).text
AttributeError: 'NoneType' object has no attribute 'find'

How can I solve it?


Solution

  • The reason this doesn't work is because you will run into the following problems:

    1. You will be hitting Google's bot detection, which means when you do requests.get you won't get back the Google results, instead you'll get a response from the bot detection asking you to tick a box to confirm you are human.
    2. The class you are searching for doesn't exist.
    3. You are using the default html.parser which is going to be useless as Google does not put the price data in the raw HTML code. Instead you want to use something more advanced like the lxml parser.

    Based on what you are trying to do, you could try to trick Google's bot detection by making your request seem more legitimate, for example add in the user agent that a Chrome browser would normally send. Additionally, to get the price it seems like you want the pclqee class in a span element.

    Try this instead:

    First install the lxml parser: pip3 install lxml

    Then use the below snippet instead:

    from bs4 import BeautifulSoup
    import requests
    import time
    
    
    def getprice():
        url = "https://www.google.com/search?q=bitcoin+price"
        HTML = requests.get(
            url,
            headers={
                "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.74 Safari/537.36"
            },
        )
        soup = BeautifulSoup(HTML.text, "lxml")
        text = soup.find("span", attrs={"class": "pclqee"}).text
        return text
    
    
    if __name__ == "__main__":
        bitcoin = getprice()
        print(bitcoin)
    

    Although the above modified snippet will work, I wouldn't advise using it. Google will still be able to detect your request as a bot occassionally and so this code would be unreliable.

    If you want stock data I suggest you try to web scrape some API's directly or use API's that do that for you already, e.g. have a look at https://www.alphavantage.co/