Tags: php, python, ajax, mismatch

Python: result parsed from an AJAX PHP endpoint differs from the on-screen result?


I tried to extract search results from this page: "http://std.stheadline.com/daily/formerly.php". On the webpage, selecting 20-Nov to 22-Nov and ticking the "財經" (Finance) news category checkbox gives 47 results. However, my Python code, which calls the PHP endpoint with the parameters obtained from Chrome Inspect, yields 162 results. It seems the server did not recognize my parameters and gave me the results for ALL news categories on the latest date.

I used this code:

import pandas as pd

url = "http://std.stheadline.com/daily/ajax/ajaxFormerly.php?startDate=2019-11-20&endDate=2019-11-22&type%5B%5D=15&keyword="

df = pd.read_json(url)
print(df.info(verbose=True))
print(df)

I also tried:

url = "http://std.stheadline.com/daily/ajax/ajaxFormerly.php?startDate=2019-11-20&endDate=2019-11-22&type=15&keyword="

Solution

  • The page uses a POST request, which sends the parameters in the request body, not in the URL, so this endpoint does not pick up parameters placed in the URL. You can use the requests module (or urllib) to send a POST request:

    import requests
    
    url = 'http://std.stheadline.com/daily/ajax/ajaxFormerly.php'
    
    params = {
        'startDate': '2019-11-20',
        'endDate': '2019-11-22',
        'type[]': '15',
        'keyword': '',
    }
    
    # data= sends the fields form-encoded in the request body
    r = requests.post(url, data=params)
    
    data = r.json()
    
    print(data['totalCount'])  # 47
    

    To load it into a DataFrame, you may have to wrap the response text in io.StringIO to create a file-like object in memory.

    import requests
    import pandas as pd
    import io
    
    url = 'http://std.stheadline.com/daily/ajax/ajaxFormerly.php'
    
    params = {
        'startDate': '2019-11-20',
        'endDate': '2019-11-22',
        'type[]': '15',
        'keyword': '',
    }
    
    r = requests.post(url, data=params)
    
    # wrap the JSON text in an in-memory file-like object for read_json
    f = io.StringIO(r.text)
    df = pd.read_json(f)
    
    print(df)
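
The answer mentions urllib as an alternative to requests. A minimal sketch of that variant, using only the standard library, might look like the following. The helper names build_post_request and fetch_json are made up for illustration, and decoding the response as UTF-8 is an assumption about the server, not something confirmed above. Building the request separately from sending it lets you inspect the method and body without touching the network.

```python
import json
import urllib.parse
import urllib.request

URL = "http://std.stheadline.com/daily/ajax/ajaxFormerly.php"


def build_post_request(start_date, end_date, category_ids, keyword=""):
    """Build (but do not send) a form-encoded POST request.

    Hypothetical helper: the field names come from the question,
    the function itself is only an illustration.
    """
    fields = {
        "startDate": start_date,
        "endDate": end_date,
        "type[]": category_ids,  # list: one type[] entry per category id
        "keyword": keyword,
    }
    # doseq=True repeats the key for every item in a list value.
    body = urllib.parse.urlencode(fields, doseq=True).encode("ascii")
    # Passing data= makes urllib issue a POST with the fields in the body.
    return urllib.request.Request(URL, data=body)


def fetch_json(request):
    """Send the request and parse the JSON reply (needs network access).

    Assumes the server answers with UTF-8 encoded JSON.
    """
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


req = build_post_request("2019-11-20", "2019-11-22", ["15"])
print(req.get_method())          # POST
print(req.data.decode("ascii"))  # the same fields, now in the body
```

Calling fetch_json(req) would then send the request. The printed body has the same key=value pairs as the query string tried in the question; the difference is only that they travel in the request body instead of the URL.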