Tags: python, python-3.x, wikipedia, wikipedia-api

Get all titles from Wikipedia with Python


I need to get all page titles from the Italian Wikipedia. I have already written this code:

    import requests

    S = requests.Session()

    URL = "https://it.wikipedia.org/w/api.php"

    PARAMS = {
        "action": "query",
        "format": "json",
        "list": "allpages",
        "aplimit": "max",
    }

    R = S.get(url=URL, params=PARAMS)
    DATA = R.json()
    PAGES = DATA["query"]["allpages"]
    for page in PAGES:
        print(page['title'])

But this only prints the first 500 titles. How can I get the rest of them?


Solution

  • I used your request and found the following:

    >>> DATA["continue"]
    {'apcontinue': "'Ndranghetista", 'continue': '-||'}
    

    And as per the All pages documentation:

    apcontinue: When more results are available, use this to continue.

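    So to keep going, merge the "continue" values into the request parameters and repeat the request until the response no longer contains a "continue" key:

    full_data = []
    full_data.extend(DATA["query"]["allpages"])

    # The API omits "continue" once there are no more results.
    while "continue" in DATA:
        PARAMS.update(DATA["continue"])
        R = S.get(url=URL, params=PARAMS)
        DATA = R.json()
        # Collect the pages from each new batch as well.
        full_data.extend(DATA["query"]["allpages"])

    Checking "batchcomplete" is not a reliable stopping condition: for a plain list query like this one it appears in every response. The absence of the "continue" key is what signals that there are no more results.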
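
The same continuation loop can also be wrapped in a generator, so titles are produced batch by batch instead of being collected in one list. This is only a minimal sketch: iter_all_titles is a hypothetical helper name (not part of requests or the MediaWiki API), and it assumes the session argument behaves like requests.Session and that every response has the "query" / "allpages" shape shown in the question.

```python
def iter_all_titles(session, url, base_params):
    """Yield every page title, following the API's continuation tokens.

    session is assumed to behave like requests.Session: its get() must
    return an object with raise_for_status() and json() methods.
    """
    params = dict(base_params)  # copy so the caller's dict is not modified
    while True:
        response = session.get(url=url, params=params)
        response.raise_for_status()
        data = response.json()
        for page in data["query"]["allpages"]:
            yield page["title"]
        if "continue" not in data:
            break  # no continuation tokens: this was the last batch
        # Rebuild from the base parameters plus the latest tokens.
        params = {**base_params, **data["continue"]}
```

With the S, URL and PARAMS objects from the question, looping over iter_all_titles(S, URL, PARAMS) and printing each item would print every title. Rebuilding params from base_params on each pass keeps tokens from one response from lingering in later requests.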