python, python-requests, urllib2

requests and urllib2 get an error from an XBRL page: 'The browser mode you are running is not compatible with this application'


I'm not sure why I can't get the page from this link. All I want to do is fetch it and feed it into BeautifulSoup.

import requests,urllib2

link='https://www.sec.gov/ix?doc=/Archives/edgar/data/1373715/000137371518000157/now-2018630x10q.htm'

r = requests.get(link)

r2=urllib2.urlopen(link)
html=r2.read()

I also tried faking a browser with:

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}

r = requests.get(link, headers=headers)

The response text is the same, and it is not the page I want.

The response contains a script that looks like this:

var note = 'The browser mode you are running is not compatible with this application.';
browserName ='Microsoft Internet Explorer';
note +='You are currently running '+browserName+' '+((ie7>0)?7:8)+'.0.';
var userAgent = window.navigator.userAgent.toLowerCase();
if(userAgent.indexOf('ipad') != -1 || userAgent.indexOf('iphone') != -1 || userAgent.indexOf('apple') != -1){
    note += ' Please use a more current version of '+browserName+' in order to use the application.';
}else if(userAgent.indexOf('android') != -1){
    note += ' Please use a more current version of Google Chrome or Mozilla Firefox in order to use the application.';
}else{
    note += ' Please use a more current version of Microsoft Internet Explorer, Google Chrome or Mozilla Firefox in order to use the application.';
}

I can get this page fine: https://www.sec.gov/Archives/edgar/data/1373715/000137371518000153/erq2fy18-document.htm

which is not an XBRL document. I think it has something to do with the XBRL: does the server want my browser to interact with the data?


Solution

  • This part of the page appears to be rendered by JavaScript, so requests and urllib2 only receive the viewer page with its browser-compatibility script, not the filing itself. Selenium is usually the most reliable option for dynamic content, but in this case you can avoid it and use requests.

    The doc parameter in the URL shows that the viewer page loads the contents of the document /Archives/edgar/data/1373715/000137371518000157/now-2018630x10q.htm. You can bypass the viewer page and request that document directly.

    import requests
    
    url = "https://www.sec.gov/Archives/edgar/data/1373715/000137371518000157/now-2018630x10q.htm"
    r = requests.get(url)
    html = r.text
    
    print(html)
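
If you have many viewer links, the same rewrite can be done programmatically. The following is only a sketch, assuming Python 3 and that the viewer URL always carries the document path in a doc query parameter, as the link in the question does. The function name direct_document_url is made up for illustration.

```python
from urllib.parse import parse_qs, urljoin, urlsplit


def direct_document_url(viewer_url):
    """Turn an /ix?doc=... viewer link into the URL of the underlying document.

    Hypothetical helper: assumes the document path is in the "doc" query
    parameter. URLs without that parameter are returned unchanged.
    """
    parts = urlsplit(viewer_url)
    doc_values = parse_qs(parts.query).get("doc")
    if not doc_values:
        # Not a viewer link (or no doc parameter), so leave it alone.
        return viewer_url
    # Join the document path onto the same scheme and host as the viewer link.
    base = f"{parts.scheme}://{parts.netloc}"
    return urljoin(base, doc_values[0])
```

The returned URL can then be passed to requests.get exactly as in the snippet above, and r.text handed to BeautifulSoup.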