I use the following code to scrape the page "http://gs.amac.org.cn:10080/amac-infodisc/res/pof/manager/index.html". To scrape it, I POST the data in JSON format. The request works and returns JSON content. The strange thing is that it always returns the same content, no matter what "page" number or "size" I send. Anyone interested can change the "page" number in "postdata" and see that the same "id" comes back.
import urllib2
import urllib
import json
import random
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.116 UBrowser/5.6.10551.6 Safari/537.36",
           "Content-Type": "application/json"}
# http://gs.amac.org.cn:10080/amac-infodisc/res/pof/manager/index.html
# change the "page" number here; the response still has the same "id"
postdata = {"rand":random.random(),"page":10,"size":20}
url = "http://gs.amac.org.cn:10080/amac-infodisc/api/pof/manager"
postdata = json.dumps(postdata)
req = urllib2.Request(url,data=postdata,headers=headers)
response = json.load(urllib2.urlopen(req,timeout=30))
print response["content"][0]["id"]
The problem is that this endpoint takes its arguments (rand, page and size) as query-string parameters, not as POST data.
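As a minimal sketch of that difference, the snippet below builds the same call in both shapes with the standard library and prints where the arguments end up. Nothing is sent over the network. The helper names as_body and as_query are made up for this illustration; the URL and parameter names are the ones from the question.

```python
import json
import random
from urllib.parse import urlencode
from urllib.request import Request

API_URL = "http://gs.amac.org.cn:10080/amac-infodisc/api/pof/manager"
HEADERS = {"Content-Type": "application/json"}


def as_body(params):
    """Shape used in the question: arguments inside the JSON body."""
    data = json.dumps(params).encode("utf-8")
    return Request(API_URL, data=data, headers=HEADERS)


def as_query(params):
    """Shape used in the answer: arguments in the query string."""
    url = API_URL + "?" + urlencode(params)
    # The empty JSON object mirrors json={} in the answer; having a body
    # at all is what makes urllib treat this as a POST.
    return Request(url, data=b"{}", headers=HEADERS)


params = {"rand": random.random(), "page": 10, "size": 20}

# Build both requests and print them side by side (no network traffic).
for build in (as_body, as_query):
    req = build(params)
    print(build.__name__, req.get_method())
    print("  url: ", req.full_url)
    print("  body:", req.data.decode("utf-8"))
```

Only the second shape carries page and size in the URL, which is the form the fix below relies on.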
Moving the arguments into the query string fixes the problem:
import requests
import random
page = 1
size = 20
rand = random.random()
# rand, page and size go into the query string rather than the request body
url = 'http://gs.amac.org.cn:10080/amac-infodisc/api/pof/manager?rand={}&page={}&size={}'.format(
    rand, page, size
)
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.116 UBrowser/5.6.10551.6 Safari/537.36",
    "Content-Type": "application/json"
}
# json={} posts an empty JSON object as the body
print(requests.post(url, json={}, headers=headers).json()['content'][0]['id'])
This prints 101000000409 (101000000138 for page 0).
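Building the URL with str.format makes it easy to pass the wrong name or to forget escaping. As a sketch of an alternative, requests can encode the query string itself when the arguments are handed over as params. The helper names build_request and fetch_first_id are hypothetical, the response layout (content[0].id) is assumed from the code above, and the snippet cannot guarantee that the server still answers this way.

```python
import random

import requests

API_URL = "http://gs.amac.org.cn:10080/amac-infodisc/api/pof/manager"


def build_request(page, size=20, rand=None, headers=None):
    """Prepare the POST without sending it.

    requests encodes the params dict into the query string; the JSON
    body stays an empty object, as in the answer above.
    """
    if rand is None:
        rand = random.random()
    request = requests.Request(
        "POST",
        API_URL,
        params={"rand": rand, "page": page, "size": size},
        json={},
        headers=headers,
    )
    return request.prepare()


def fetch_first_id(page, size=20, headers=None):
    """Send the request and return the id of the first record on the page.

    Assumes a response shaped like {"content": [{"id": ...}, ...]}.
    """
    with requests.Session() as session:
        response = session.send(build_request(page, size, headers=headers), timeout=30)
    response.raise_for_status()
    return response.json()["content"][0]["id"]


# Offline check: look at what would be sent for page 1.
prepared = build_request(page=1, rand=0.5)
print(prepared.method, prepared.url)
print(prepared.body)
```

If the server behaves as described above, fetch_first_id(0) and fetch_first_id(1) would be expected to return the two different ids quoted there. Pass the headers dict from the answer as headers= if the server turns out to need the User-Agent.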