Tags: python, web-scraping, python-requests, http-status-code-404, urllib

Page loads in a browser but returns a 404 error with the Python requests library


I've seen similar questions, but none of the solutions work in my case. I found a link that lets me download a CSV file with the data from a Tableau dashboard. When I open this link in a browser, the file downloads automatically. But when I send a request from Python, the response code is always 404. I guess the website is blocking the request. How can I get around this?

Update: a solution that does not use Selenium would be preferable.

import pandas as pd
import requests

url = 'https://public.tableau.com/app/profile/damian3851/viz/IATAdemandgrowth/IATAMonthlyCargoStatistics.csv'

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36'
}

page = requests.get(url, headers=headers)

Solution

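  • The /app/profile/... address is the route the browser uses for the Tableau Public web page, and it is what returns 404 to requests. The same CSV can be requested from a different URL, under /views/, as follows: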

    import requests
    
    url = "https://public.tableau.com/views/IATAdemandgrowth/IATAMonthlyCargoStatistics.csv"
    
    params = {
        ":showVizHome": "n",
    }
    
    with requests.get(url, params=params) as response:
        response.raise_for_status()
        print(*response.text.splitlines(), sep="\n")
    

    Output (partial):

    Latin America,March 2024,9.2
    Middle East,March 2024,19.9
    North America,March 2024,0.9
    Total Market,March 2024,10.3
    Africa,April 2024,10.6
    Asia Pacific,April 2024,14
    Europe,April 2024,12.7
    Latin America,April 2024,11.7
    Middle East,April 2024,9.4
    North America,April 2024,7
    Total Market,April 2024,11.1
    Africa,May 2024,18.4
    Asia Pacific,May 2024,17.8
    Europe,May 2024,17.2
    Latin America,May 2024,12.7
    Middle East,May 2024,15.3
    North America,May 2024,8.7
    Total Market,May 2024,14.7
    

    If you want a local CSV file, stream the response body to disk in chunks as follows:

    import requests
    
    url = "https://public.tableau.com/views/IATAdemandgrowth/IATAMonthlyCargoStatistics.csv"
    
    params = {
        ":showVizHome": "n",
    }
    
    with requests.get(url, params=params, stream=True) as response:
        response.raise_for_status()
        with open("IATAMonthlyCargoStatistics.csv", "wb") as output:
            for chunk in response.iter_content(4096):
                output.write(chunk)
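
Since the question imports pandas, the URL change and the download shown above could be combined into a small sketch that returns a DataFrame. The helper names `to_csv_export_url`, `csv_text_to_frame` and `fetch_frame` are made up for illustration, and the 30-second timeout is an arbitrary choice. The URL rewrite assumes that other Tableau Public dashboards follow the same /app/profile/<user>/viz/<workbook>/<sheet> pattern; the answer above only confirms it for this one dashboard.

```python
import io
from urllib.parse import urlsplit

import pandas as pd
import requests


def to_csv_export_url(profile_url: str) -> str:
    """Rewrite an /app/profile/<user>/viz/<workbook>/<sheet> URL to /views/<workbook>/<sheet>.csv.

    Assumption: the path layout matches the URL from the question.
    """
    parts = urlsplit(profile_url)
    segments = [s for s in parts.path.split("/") if s]
    # Expected shape: app / profile / <user> / viz / <workbook> / <sheet>[.csv]
    if len(segments) != 6 or segments[:2] != ["app", "profile"] or segments[3] != "viz":
        raise ValueError(f"unexpected Tableau Public URL: {profile_url}")
    workbook, sheet = segments[4], segments[5]
    if not sheet.endswith(".csv"):
        sheet += ".csv"
    return f"{parts.scheme}://{parts.netloc}/views/{workbook}/{sheet}"


def csv_text_to_frame(text: str) -> pd.DataFrame:
    """Parse CSV text that is already in memory into a DataFrame."""
    return pd.read_csv(io.StringIO(text))


def fetch_frame(profile_url: str) -> pd.DataFrame:
    """Download the CSV export for a dashboard URL and parse it (needs network access)."""
    response = requests.get(
        to_csv_export_url(profile_url),
        params={":showVizHome": "n"},  # same parameter as in the answer above
        timeout=30,  # arbitrary value, chosen for this sketch
    )
    response.raise_for_status()
    return csv_text_to_frame(response.text)
```

Calling `fetch_frame(url)` with the URL from the question should give a DataFrame ready for the usual pandas operations. `csv_text_to_frame` is kept separate so that the parsing step can be tried on a plain string without network access.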