I need to get all page titles from the Italian Wikipedia. I have already written this code:
import requests
S = requests.Session()
URL = "https://it.wikipedia.org/w/api.php"
PARAMS = {
    "action": "query",
    "format": "json",
    "list": "allpages",
    "aplimit": "max",
}
R = S.get(url=URL, params=PARAMS)
DATA = R.json()
PAGES = DATA["query"]["allpages"]
for page in PAGES:
    print(page['title'])
But this only prints the first 500 titles. How can I get the rest of the titles?
I ran your request and found that the response also contains a "continue" key:
>>> DATA["continue"]
{'apcontinue': "'Ndranghetista", 'continue': '-||'}
As the Allpages API documentation puts it:
apcontinue: When more results are available, use this to continue.
So, to fetch the remaining pages, feed the "continue" values back into the next request:
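full_data = []
full_data.extend(DATA["query"]["allpages"])
# Keep requesting while the response still carries a "continue" block
while "continue" in DATA:
    # Add the continuation values (apcontinue, continue) to the next request
    PARAMS.update(DATA["continue"])
    R = S.get(url=URL, params=PARAMS)
    DATA = R.json()
    full_data.extend(DATA["query"]["allpages"])
The loop stops when a response no longer contains the "continue" key, which is how the MediaWiki API signals that there are no more results. The "batchcomplete" key is not a suitable stopping condition: it only says that the current batch of results is complete, and it can be present in responses that still have a "continue" block.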
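If you would rather not hold every title in memory, the same continuation loop can be wrapped in a generator. This is only a minimal sketch: the name iter_titles is made up for illustration, and it assumes that the API leaves out the "continue" key on the last response and that the session object behaves like requests.Session.

```python
def iter_titles(session, url, base_params):
    """Yield page titles, following "continue" blocks until the API stops sending one.

    iter_titles is a hypothetical helper, not part of requests or the MediaWiki API.
    """
    params = dict(base_params)
    while True:
        response = session.get(url=url, params=params)
        response.raise_for_status()  # fail loudly on HTTP errors
        data = response.json()
        for page in data["query"]["allpages"]:
            yield page["title"]
        # Assumption: no "continue" key means there is nothing left to fetch
        if "continue" not in data:
            break
        # Start from the original parameters and add the continuation values
        params = {**base_params, **data["continue"]}
```

With the names from the question, it could be used as: for title in iter_titles(S, URL, PARAMS): print(title). Because it yields titles as they arrive, you can stop early or write them to a file without building one big list.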