I tried to extract search results from this page: "http://std.stheadline.com/daily/formerly.php". On the web page, selecting 20-Nov to 22-Nov and ticking the "財經" (Finance) news category check box gives 47 results. However, my Python code, using the parameters I obtained from Chrome Inspect, yields 162 results. It seems the server did not recognize my parameters and gave me the results for ALL news categories on the latest date.
I used this code:
import pandas as pd
url= "http://std.stheadline.com/daily/ajax/ajaxFormerly.php?startDate=2019-11-20&endDate=2019-11-22&type%5B%5D=15&keyword="
df = pd.read_json(url)
print(df.info(verbose=True))
print(df)
I also tried:
url= "http://std.stheadline.com/daily/ajax/ajaxFormerly.php?startDate=2019-11-20&endDate=2019-11-22&type=15&keyword="
The page uses a POST request, which sends its parameters in the request body, not in the URL, so putting them in the URL has no effect here. You can use the requests module (or urllib) to send POST requests:
import requests
url = 'http://std.stheadline.com/daily/ajax/ajaxFormerly.php'
params = {
    'startDate': '2019-11-20',
    'endDate': '2019-11-22',
    'type[]': '15',
    'keyword': '',
}
r = requests.post(url, data=params)
data = r.json()
print(data['totalCount']) # 47
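For the urllib alternative mentioned above, a minimal sketch using only the standard library could look like the following. The helper names build_post_request and fetch_json are my own, not part of any library, and decoding the response as UTF-8 is an assumption about the server. Building the request does not touch the network, so you can inspect the method and body before sending anything.

```python
import json
import urllib.parse
import urllib.request

URL = 'http://std.stheadline.com/daily/ajax/ajaxFormerly.php'


def build_post_request(url, params):
    """Build (but do not send) a form-encoded POST request (hypothetical helper)."""
    # urlencode turns the dict into 'key=value&...' and escapes 'type[]'.
    body = urllib.parse.urlencode(params).encode('utf-8')
    # Passing `data` makes urllib send POST instead of GET.
    return urllib.request.Request(url, data=body)


def fetch_json(req):
    """Send the request and parse the reply as JSON (assumes a UTF-8 response)."""
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode('utf-8'))


params = {
    'startDate': '2019-11-20',
    'endDate': '2019-11-22',
    'type[]': '15',
    'keyword': '',
}

req = build_post_request(URL, params)
print(req.get_method())  # the HTTP method urllib will use
print(req.data)          # the parameters, now in the body rather than the URL

# To actually send it (needs network access):
# data = fetch_json(req)
# print(data['totalCount'])
```

The parameters end up in req.data (the request body) while req.full_url stays free of any query string, which is the difference from the pd.read_json(url) attempt in the question.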
To load it into a DataFrame, you may have to use io.StringIO to create an in-memory file:
import requests
import pandas as pd
import io
url = 'http://std.stheadline.com/daily/ajax/ajaxFormerly.php'
params = {
    'startDate': '2019-11-20',
    'endDate': '2019-11-22',
    'type[]': '15',
    'keyword': '',
}
r = requests.post(url, data=params)
f = io.StringIO(r.text)
df = pd.read_json(f)
print(df)