There is a large dataset, full of neatly-stored tabular data, found here that I would like to parse through and save locally.
The problem is, no matter how deep I "drill down" to inspect the source code, there isn't any actual data, nor any discernible source page.
My question is: is it even possible to access the data via the typical requests.get() and .content approach? Or would something like selenium do the trick? If neither of these two options, then what?
Thanks in advance.
See my comment. For what it's worth, here's the request that should work but doesn't, for reasons I'm not sure of, unless there's some security on their end involving cookies.
Inspecting the page, it's making a POST request to c0cre127.caspio.com/dp/311a1000697d9171cc1c4128ae42. What you get back is in a structured format; you can see it in the preview for that request in the browser's network tools. Interestingly, the 'responseText', which holds the data, is all HTML, so theoretically you could just parse that part of the response to grab what you need. The problem is that when I recreate this HTTP request, the server says the AppKey, which according to the request is part of the required cookie, is wrong.
So selenium would work; I'm not sure I can do much about the AppKey.
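Once the HTML from 'responseText' is in hand (from selenium or from a working request), pulling the table out of it could look roughly like the sketch below. It uses only the standard library. The sample markup is made up for illustration, and the assumption that the data sits in ordinary `<tr>`/`<td>` elements has not been checked against the real response.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def parse_rows(html_text):
    """Return the table rows found in html_text as lists of cell strings."""
    parser = TableRowParser()
    parser.feed(html_text)
    parser.close()
    return parser.rows


# Made-up stand-in for the HTML found in 'responseText' (assumed layout).
sample = (
    "<table>"
    "<tr><th>Date</th><th>Property</th><th>Zone</th></tr>"
    "<tr><td>04/05/2020</td><td>Atterbury</td><td>Central</td></tr>"
    "</table>"
)
rows = parse_rows(sample)
print(rows)
```

From there the rows can be written out with the csv module or loaded into whatever structure is convenient for saving locally.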
import requests

cookies = {
    'cbParamList': '',
    'cbCookieAccepted': '1',
    'AppKey': '311a1000697d9171cc1c4128ae42',
    'AWSALB': '76fnReAlqLZyJz4gNmSMnGc3oluXMlbsrGwaF+kcm4Rg8fklrjjrxvmez+XxXXg/yDle490fw/MKBNPWCyoGAiihFYgcWQ1RSp0vxSGJHDnfXncHSQuprTjv8Fjk',
    'AWSALBCORS': '76fnReAlqLZyJz4gNmSMnGc3oluXMlbsrGwaF+kcm4Rg8fklrjjrxvmez+XxXXg/yDle490fw/MKBNPWCyoGAiihFYgcWQ1RSp0vxSGJHDnfXncHSQuprTjv8Fjk',
}

headers = {
    'authority': 'c0cre127.caspio.com',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36',
    'content-type': 'multipart/form-data; boundary=----WebKitFormBoundarykaIBnhjgBEZ0L714',
    'accept': '*/*',
    'origin': 'https://c0cre127.caspio.com',
    'sec-fetch-site': 'same-origin',
    'sec-fetch-mode': 'cors',
    'sec-fetch-dest': 'empty',
    'referer': 'https://c0cre127.caspio.com/dp/311a1000697d9171cc1c4128ae42',
    'accept-language': 'en-US,en;q=0.9',
}

params = (
    ('rnd', '1596940878792'),
)

data = '$------WebKitFormBoundarykaIBnhjgBEZ0L714\\r\\nContent-Disposition: form-data; name="cbUniqueFormId"\\r\\n\\r\\n_69831fa53c178f\\r\\n------WebKitFormBoundarykaIBnhjgBEZ0L714\\r\\nContent-Disposition: form-data; name="ComparisonType1_1"\\r\\n\\r\\n\\r\\n------WebKitFormBoundarykaIBnhjgBEZ0L714\\r\\nContent-Disposition: form-data; name="MatchNull1_1"\\r\\n\\r\\nN\\r\\n------WebKitFormBoundarykaIBnhjgBEZ0L714\\r\\nContent-Disposition: form-data; name="FieldName2"\\r\\n\\r\\nDate\\r\\n------WebKitFormBoundarykaIBnhjgBEZ0L714\\r\\nContent-Disposition: form-data; name="Operator2"\\r\\n\\r\\nOR\\r\\n------WebKitFormBoundarykaIBnhjgBEZ0L714\\r\\nContent-Disposition: form-data; name="NumCriteriaDetails2"\\r\\n\\r\\n1\\r\\n------WebKitFormBoundarykaIBnhjgBEZ0L714\\r\\nContent-Disposition: form-data; name="ComparisonType2_1"\\r\\n\\r\\n=\\r\\n------WebKitFormBoundarykaIBnhjgBEZ0L714\\r\\nContent-Disposition: form-data; name="MatchNull2_1"\\r\\n\\r\\nN\\r\\n------WebKitFormBoundarykaIBnhjgBEZ0L714\\r\\nContent-Disposition: form-data; name="FieldName3"\\r\\n\\r\\n\\r\\n------WebKitFormBoundarykaIBnhjgBEZ0L714\\r\\nContent-Disposition: form-data; name="Operator3"\\r\\n\\r\\n\\r\\n------WebKitFormBoundarykaIBnhjgBEZ0L714\\r\\nContent-Disposition: form-data; name="NumCriteriaDetails3"\\r\\n\\r\\n1\\r\\n------WebKitFormBoundarykaIBnhjgBEZ0L714\\r\\nContent-Disposition: form-data; name="ComparisonType3_1"\\r\\n\\r\\n\\r\\n------WebKitFormBoundarykaIBnhjgBEZ0L714\\r\\nContent-Disposition: form-data; name="MatchNull3_1"\\r\\n\\r\\nN\\r\\n------WebKitFormBoundarykaIBnhjgBEZ0L714\\r\\nContent-Disposition: form-data; name="FieldName4"\\r\\n\\r\\nProperty\\r\\n------WebKitFormBoundarykaIBnhjgBEZ0L714\\r\\nContent-Disposition: form-data; name="Operator4"\\r\\n\\r\\nOR\\r\\n------WebKitFormBoundarykaIBnhjgBEZ0L714\\r\\nContent-Disposition: form-data; name="NumCriteriaDetails4"\\r\\n\\r\\n1\\r\\n------WebKitFormBoundarykaIBnhjgBEZ0L714\\r\\nContent-Disposition: form-data; name="ComparisonType4_1"\\r\\n\\r\\n=\\r\\n------WebKitFormBoundarykaIBnhjgBEZ0L714\\r\\nContent-Disposition: form-data; name="MatchNull4_1"\\r\\n\\r\\nN\\r\\n------WebKitFormBoundarykaIBnhjgBEZ0L714\\r\\nContent-Disposition: form-data; name="FieldName5"\\r\\n\\r\\n\\r\\n------WebKitFormBoundarykaIBnhjgBEZ0L714\\r\\nContent-Disposition: form-data; name="Operator5"\\r\\n\\r\\n\\r\\n------WebKitFormBoundarykaIBnhjgBEZ0L714\\r\\nContent-Disposition: form-data; name="NumCriteriaDetails5"\\r\\n\\r\\n1\\r\\n------WebKitFormBoundarykaIBnhjgBEZ0L714\\r\\nContent-Disposition: form-data; name="ComparisonType5_1"\\r\\n\\r\\n\\r\\n------WebKitFormBoundarykaIBnhjgBEZ0L714\\r\\nContent-Disposition: form-data; name="MatchNull5_1"\\r\\n\\r\\nN\\r\\n------WebKitFormBoundarykaIBnhjgBEZ0L714\\r\\nContent-Disposition: form-data; name="FieldName6"\\r\\n\\r\\nZone\\r\\n------WebKitFormBoundarykaIBnhjgBEZ0L714\\r\\nContent-Disposition: form-data; name="Operator6"\\r\\n\\r\\nOR\\r\\n------WebKitFormBoundarykaIBnhjgBEZ0L714\\r\\nContent-Disposition: form-data; name="NumCriteriaDetails6"\\r\\n\\r\\n1\\r\\n------WebKitFormBoundarykaIBnhjgBEZ0L714\\r\\nContent-Disposition: form-data; name="ComparisonType6_1"\\r\\n\\r\\n=\\r\\n------WebKitFormBoundarykaIBnhjgBEZ0L714\\r\\nContent-Disposition: form-data; name="MatchNull6_1"\\r\\n\\r\\nN\\r\\n------WebKitFormBoundarykaIBnhjgBEZ0L714\\r\\nContent-Disposition: form-data; name="AppKey"\\r\\n\\r\\n311a1000697d9171cc1c4128ae42\\r\\n------WebKitFormBoundarykaIBnhjgBEZ0L714\\r\\nContent-Disposition: form-data; name="PrevPageID"\\r\\n\\r\\n1\\r\\n------WebKitFormBoundarykaIBnhjgBEZ0L714\\r\\nContent-Disposition: form-data; name="cbPageType"\\r\\n\\r\\nSearch\\r\\n------WebKitFormBoundarykaIBnhjgBEZ0L714\\r\\nContent-Disposition: form-data; name="PageID"\\r\\n\\r\\n2\\r\\n------WebKitFormBoundarykaIBnhjgBEZ0L714\\r\\nContent-Disposition: form-data; name="GlobalOperator"\\r\\n\\r\\nAND\\r\\n------WebKitFormBoundarykaIBnhjgBEZ0L714\\r\\nContent-Disposition: form-data; name="NumCriteria"\\r\\n\\r\\n6\\r\\n------WebKitFormBoundarykaIBnhjgBEZ0L714\\r\\nContent-Disposition: form-data; name="Search"\\r\\n\\r\\n1\\r\\n------WebKitFormBoundarykaIBnhjgBEZ0L714\\r\\nContent-Disposition: form-data; name="Value2_1"\\r\\n\\r\\n04/05/2020\\r\\n------WebKitFormBoundarykaIBnhjgBEZ0L714\\r\\nContent-Disposition: form-data; name="Value4_1"\\r\\n\\r\\nAtterbury\\r\\n------WebKitFormBoundarykaIBnhjgBEZ0L714\\r\\nContent-Disposition: form-data; name="Value6_1"\\r\\n\\r\\nCentral\\r\\n------WebKitFormBoundarykaIBnhjgBEZ0L714\\r\\nContent-Disposition: form-data; name="ClientQueryString"\\r\\n\\r\\n\\r\\n------WebKitFormBoundarykaIBnhjgBEZ0L714\\r\\nContent-Disposition: form-data; name="AjaxAction"\\r\\n\\r\\nSearchForm\\r\\n------WebKitFormBoundarykaIBnhjgBEZ0L714\\r\\nContent-Disposition: form-data; name="GridMode"\\r\\n\\r\\nFalse\\r\\n------WebKitFormBoundarykaIBnhjgBEZ0L714\\r\\nContent-Disposition: form-data; name="cbUniqueFormId"\\r\\n\\r\\n_69831fa53c178f\\r\\n------WebKitFormBoundarykaIBnhjgBEZ0L714\\r\\nContent-Disposition: form-data; name="AjaxActionHostName"\\r\\n\\r\\nhttps://c0cre127.caspio.com\\r\\n------WebKitFormBoundarykaIBnhjgBEZ0L714\\r\\nContent-Disposition: form-data; name="cbAjaxReferrer"\\r\\n\\r\\nhttps://c0cre127.caspio.com/dp/311a1000697d9171cc1c4128ae42\\r\\n------WebKitFormBoundarykaIBnhjgBEZ0L714--\\r\\n'
response = requests.post('https://c0cre127.caspio.com/dp/311a1000697d9171cc1c4128ae42', headers=headers, params=params, cookies=cookies, data=data)
The server responds with:
'Undefined AppKey. (<a href="http://www.caspio.com/l/default.ashx?s=157">Caspio Bridge</a> error) (60011)'
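One thing that might be worth trying: the hand-written body above starts with a stray `$` and contains `\\r\\n`, which Python reads as literal backslashes rather than CRLF line breaks. That looks like residue from converting a curl command, and it may mean the server can't parse the form fields at all, AppKey included. I haven't confirmed that this is the cause of the error. A minimal sketch of letting requests build the multipart body itself, using only a handful of the fields from the request above (the helper name `build_request` is just for illustration):

```python
import requests

URL = "https://c0cre127.caspio.com/dp/311a1000697d9171cc1c4128ae42"

# A few of the form fields from the request above; the full request
# would list the remaining fields in the same way.
FIELDS = [
    ("AppKey", "311a1000697d9171cc1c4128ae42"),
    ("cbPageType", "Search"),
    ("Value2_1", "04/05/2020"),
    ("Value4_1", "Atterbury"),
    ("Value6_1", "Central"),
    ("AjaxAction", "SearchForm"),
]


def build_request(fields):
    """Prepare a multipart/form-data POST without sending it."""
    # A (None, value) tuple makes requests send a plain form field
    # (no filename) inside a multipart/form-data body.
    files = [(name, (None, value)) for name, value in fields]
    # No Content-Type header is set by hand: requests picks the boundary
    # and writes the matching header itself.
    return requests.Request("POST", URL, files=files).prepare()


prepared = build_request(FIELDS)
print(prepared.headers["Content-Type"])
print(prepared.body.decode())
```

Nothing is sent here, so the generated body can be compared with what the browser shows in the network tools. To actually send it, something like `requests.Session().send(prepared)` should do; whether the server then accepts the AppKey is untested.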