
Requests is unable to get the page


I am trying to retrieve this page, https://www.nasdaq.com/market-activity/stocks/msft/news-headlines, so that I can parse it with Beautiful Soup.

This is the code that I tried:

import requests
from bs4 import BeautifulSoup

page = requests.get("https://www.nasdaq.com/market-activity/stocks/msft/news-headlines")  # gets stuck here and never returns

Every time I run this code it gets stuck and never retrieves the page. However, on one run I received a ReadTimeout exception instead: requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='www.nasdaq.com', port=443): Read timed out. (read timeout=None).
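
A note on that traceback: read timeout=None reflects that requests uses no timeout by default, so a stalled request blocks indefinitely. A minimal sketch of bounding the wait (the timeout values here are arbitrary assumptions, not part of the original code):

import requests

try:
    # timeout=(connect, read) in seconds; a stalled request now raises
    # a Timeout exception instead of hanging forever.
    page = requests.get(
        "https://www.nasdaq.com/market-activity/stocks/msft/news-headlines",
        timeout=(5, 15),
    )
except requests.exceptions.Timeout as exc:
    print("Request timed out:", exc)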

Any help or fix for this problem will be truly appreciated.


Solution

  • I included headers in my request and it seemed to work. I used the same headers that my browser sends, which you can find in the browser's developer tools (in the Network tab).

    import requests
    
    headers = {
        # The first four entries mirror the ':authority', ':method', ':path' and
        # ':scheme' HTTP/2 pseudo-headers shown in the developer tools; sending
        # them as ordinary headers is harmless but not required.
        "authority": "www.nasdaq.com",
        "method": "GET",
        "path": "/market-activity/stocks/msft/news-headlines",
        "scheme": "https",
        "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
        # 'br' requires the brotli package for requests to decode the body;
        # drop it if brotli is not installed.
        "accept-encoding": "gzip, deflate, br",
        "accept-language": "en-CA,en;q=0.9,ro-RO;q=0.8,ro;q=0.7,en-GB;q=0.6,en-US;q=0.5",
        "cache-control": "max-age=0",
        "dnt": "1",
        # Conditional-request headers copied from the browser; they can make the
        # server answer 304 Not Modified with an empty body, so consider removing them.
        "if-modified-since": "Tue, 30 Jun 2020 19:43:05 GMT",
        "if-none-match": "1593546185",
        "sec-fetch-dest": "document",
        "sec-fetch-mode": "navigate",
        "sec-fetch-site": "none",
        "sec-fetch-user": "?1",
        "upgrade-insecure-requests": "1",
        "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36"
    }
    
    page = requests.get("https://www.nasdaq.com/market-activity/stocks/msft/news-headlines", headers=headers)
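
    Since the question already imports Beautiful Soup, here is a minimal follow-up sketch for checking and parsing the response; the status check and the html.parser backend are assumptions on top of the answer above:

    from bs4 import BeautifulSoup

    # Fail loudly on an error status (and note that a 304 caused by the
    # conditional headers above would come back with an empty body).
    page.raise_for_status()

    # Parse the returned HTML with the standard-library parser.
    soup = BeautifulSoup(page.text, "html.parser")
    print(soup.title.get_text(strip=True) if soup.title else "no <title> found")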