python edgar sec

JSONDecodeError: Expecting value: line 1 column 1 (char 0) when scraping SEC EDGAR


My code is as follows:

import requests
import urllib
from bs4 import BeautifulSoup

year_url = r"https://www.sec.gov/Archives/edgar/daily-index/2020/index.json"
year_content = requests.get(year_url)
decoded_year_url = year_content.json()

I could run exactly the same code last year, but when I ran it yesterday, this error appeared: "JSONDecodeError: Expecting value: line 1 column 1 (char 0)". Why does this happen, and how can I fix it? Thanks a lot!


Solution

  • Apparently the SEC has added rate-limiting to their website, according to this GitHub issue from May 2021. You're receiving the error because the response body is HTML rather than JSON, so there is nothing for the JSON parser to read at line 1, column 1, and requests raises JSONDecodeError when you call .json().

    To resolve this, add a User-Agent header to your request. I can access the JSON with the following:

    import requests
    import urllib
    from bs4 import BeautifulSoup
    
    year_url = r"https://www.sec.gov/Archives/edgar/daily-index/2020/index.json"
    # Send a User-Agent header so the server returns JSON instead of an HTML page.
    year_content = requests.get(year_url, headers={'User-agent': '[specify user agent here]'})
    decoded_year_url = year_content.json()
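
If the request is rejected again in the future, the bare JSONDecodeError gives little to go on. As a minimal sketch (not part of the original answer), the helper below wraps the .json() call and, when parsing fails, reports the status code, Content-Type, and the first characters of the body. The name decode_json_response is hypothetical, and it assumes it is given a requests.Response or any object exposing status_code, headers, text, and json().

```python
def decode_json_response(response):
    """Return the parsed JSON body, or raise an error that shows what came back.

    `response` is assumed to behave like a requests.Response: it must provide
    `status_code`, `headers`, `text`, and a `json()` method.
    """
    try:
        return response.json()
    except ValueError as exc:
        # JSON decode errors are ValueError subclasses, so this catches the
        # "Expecting value: line 1 column 1 (char 0)" failure.
        content_type = response.headers.get("Content-Type", "unknown")
        snippet = response.text[:200]
        raise RuntimeError(
            f"Expected JSON but got HTTP {response.status_code} "
            f"(Content-Type: {content_type}); body starts with: {snippet!r}"
        ) from exc
```

With the code above, you would replace the last line with decoded_year_url = decode_json_response(year_content). An HTML block page then shows up in the error message directly, which makes it easier to tell a missing header or rate limit apart from a genuinely malformed JSON file.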