Tags: python, arrays, json, http-status-code-403, gspread

HTTP Error 403 while pulling JSON feed when I definitely have access


I'm confused about why I'm seeing a 403 Forbidden error when I have access. I tested the API, and this script prints the response to the terminal without any problem:

import requests as rq
from bs4 import BeautifulSoup
# import urllib3
# import certifi
# # urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

# http = urllib3.PoolManager(
#     cert_reqs='CERT_REQUIRED',
#     ca_certs=certifi.where())


url = 'https://api.chucknorris.io/jokes/random'

req = rq.get(url, verify=False)

soup = BeautifulSoup(req.text, 'html.parser')


print(soup)

But when I try to write the data to my Google Sheet with the script below, I run into an issue:

# import urllib library
import json
from urllib.request import urlopen

import gspread


gc = gspread.service_account(filename='creds.json')
sh = gc.open_by_key('1-1aiGMn2yUWRlh_jnIebcMNs-6phzUNxkktAFH7uY9o')
worksheet = sh.sheet1


url = 'https://api.chucknorris.io/jokes/random'


# store the response of URL
response = urlopen(url)

# storing the JSON response
# from url in data
data_json = json.loads(response.read())

# print the json response
# print(data_json)
result = []
for key in data_json:
    result.append([key, data_json[key] if not isinstance(
        data_json[key], list) else ",".join(map(str, data_json[key]))])
worksheet.update('a1', result)

I tested the Google Sheet separately and could push data to it without any problem. The issue comes from fetching the JSON feed, not from the gspread connection.

Any thoughts on how I could get this data directly into my Google Sheet?

Full error:

Traceback (most recent call last):

  File "c:\Users\AMadle\NBA-JSON-Fetch\PrintToSheetTest.py", line 17, in <module>
    response = urlopen(url)
  File "C:\Python\python3.10.5\lib\urllib\request.py", line 216, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Python\python3.10.5\lib\urllib\request.py", line 525, in open
    response = meth(req, response)
  File "C:\Python\python3.10.5\lib\urllib\request.py", line 634, in http_response
    response = self.parent.error(
  File "C:\Python\python3.10.5\lib\urllib\request.py", line 563, in error
    return self._call_chain(*args)
  File "C:\Python\python3.10.5\lib\urllib\request.py", line 496, in _call_chain
    result = func(*args)
  File "C:\Python\python3.10.5\lib\urllib\request.py", line 643, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

Solution

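  • urllib sends a default "User-Agent" header of the form "Python-urllib/3.x", and some servers reject it with 403, which may be why requests works here while urlopen does not. Setting "User-Agent" explicitly may resolve the issue. So, in your script, how about the following modification?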

    From:

    response = urlopen(url)
    

    To:

    response = urlopen(Request(url, headers={"User-Agent": ""}))
    

    Or set a sample "User-Agent" value, as follows:

    response = urlopen(Request(url, headers={"User-Agent": "###sample user agent###"}))
    
    • In either case, change the import line from "from urllib.request import urlopen" to "from urllib.request import urlopen, Request".
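
Putting the modification together, here is a minimal sketch of the fetch side using only the standard library. The helper names (build_request, fetch_json, json_to_rows) and the USER_AGENT value are hypothetical and chosen for illustration; whether this particular header value satisfies the server is an assumption, not something confirmed above.

```python
import json
from urllib.request import Request, urlopen

# Hypothetical User-Agent string, chosen only for illustration.
USER_AGENT = "my-sheet-updater/0.1"


def build_request(url, user_agent=USER_AGENT):
    """Wrap the URL in a Request that carries an explicit User-Agent header."""
    return Request(url, headers={"User-Agent": user_agent})


def fetch_json(url):
    """Fetch the URL and decode the JSON body (performs a network call)."""
    with urlopen(build_request(url)) as response:
        return json.loads(response.read())


def json_to_rows(data):
    """Turn a flat JSON object into [key, value] rows.

    List values are joined with commas, as in the question's loop.
    """
    rows = []
    for key, value in data.items():
        if isinstance(value, list):
            value = ",".join(map(str, value))
        rows.append([key, value])
    return rows
```

With these helpers, the rows could be produced with json_to_rows(fetch_json(url)) and then passed to worksheet.update('a1', rows) exactly as in the question's script.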