python-3.x, beautifulsoup, yahoo-finance

Why does Yahoo Finance data only update when I send a header while scraping?


I've recently learnt BeautifulSoup and decided to scrape stock data from Yahoo Finance as an exercise.

This code only ever returns the same static price for the stock; it never updates:

import requests
from bs4 import BeautifulSoup

def priceTracker():
    ticker = 'TSLA'
    url = f'https://finance.yahoo.com/quote/{ticker}?p={ticker}&.tsrc=fin-srch'
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'lxml')
    price = soup.find_all('div', {'class':'My(6px) Pos(r) smartphone_Mt(6px)'})[0].find('span').text
    return price

while True:
    print(priceTracker())

I found a solution online where people passed a headers argument to requests.get() (line 8 of the code below), and it worked:

import requests
from bs4 import BeautifulSoup

def priceTracker():
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:75.0) Gecko/20100101 Firefox/75.0'}
    ticker = 'TSLA'
    url = f'https://finance.yahoo.com/quote/{ticker}?p={ticker}&.tsrc=fin-srch'
    response = requests.get(url, headers=headers)
    soup = BeautifulSoup(response.text, 'lxml')
    price = soup.find_all('div', {'class':'My(6px) Pos(r) smartphone_Mt(6px)'})[0].find('span').text
    return price

while True:
    print(priceTracker())

My question is: why do the scraped prices from Yahoo Finance only update when the headers argument is included? I don't understand why it behaves like that.


Solution

  • HTTP headers let the client and the server pass additional information with an HTTP request or response.

    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:75.0) Gecko/20100101 Firefox/75.0'}
    ticker = 'TSLA'
    url = f'https://finance.yahoo.com/quote/{ticker}?p={ticker}&.tsrc=fin-srch'
    response = requests.get(url, headers=headers)

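    Some sites check the User-Agent header before deciding what to serve. Without an explicit value, requests identifies itself as python-requests/<version>, which a site can recognise as a script and answer differently, for example with a cached or placeholder page, so the scraped price appears frozen. Sending a browser-like User-Agent makes the request look like ordinary browser traffic, which is most likely why Yahoo Finance then returns the live page.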
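
    To see the difference for yourself, you can build both requests without sending them and look at the User-Agent each would carry. This is only a sketch: outgoing_user_agent is a hypothetical helper name, not something from the question, and it shows what requests sends, not how Yahoo Finance decides what to return.

```python
import requests

def outgoing_user_agent(url, headers=None):
    """Hypothetical helper: return the User-Agent a request would carry.

    The request is prepared but never sent, so no network access is needed.
    """
    session = requests.Session()
    request = requests.Request('GET', url, headers=headers)
    # prepare_request merges the session's default headers with our own.
    prepared = session.prepare_request(request)
    return prepared.headers['User-Agent']

ticker = 'TSLA'
url = f'https://finance.yahoo.com/quote/{ticker}?p={ticker}&.tsrc=fin-srch'
browser_headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:75.0) Gecko/20100101 Firefox/75.0'}

default_ua = outgoing_user_agent(url)                  # no headers argument
custom_ua = outgoing_user_agent(url, browser_headers)  # browser-like header

print(default_ua)  # begins with "python-requests/"
print(custom_ua)   # the Firefox string passed in above
```

    The first value is what the server sees in the original code, and the second is what it sees once headers=headers is passed, which is the only difference between the two versions in the question.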