
How do I get all Gmail message IDs with Python?


I would like to list all message IDs from a Gmail account using the Gmail API. So far I've been able to list the first and second pages of message IDs. I know I have to use the pageToken to get to the next page of results, but I can't figure out how to restructure my code so that I'm not creating numbered variables (messages2, token2, and so on) for each page. The source code is below.

get_email_ids.py:

from __future__ import print_function
import os.path
from googleapiclient.discovery import build
from google_auth_oauthlib.flow import InstalledAppFlow
from google.auth.transport.requests import Request
from google.oauth2.credentials import Credentials

# If modifying these scopes, delete the file token.json.
SCOPES = ['https://www.googleapis.com/auth/gmail.readonly']

def main():
    """Shows basic usage of the Gmail API.
    """
    creds = None
    user_id = "me"
    # The file token.json stores the user's access and refresh tokens, and is
    # created automatically when the authorization flow completes for the first
    # time.
    if os.path.exists('token.json'):
        creds = Credentials.from_authorized_user_file('token.json', SCOPES)
    # If there are no (valid) credentials available, let the user log in.
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(
                'credentials.json', SCOPES)
            creds = flow.run_local_server(port=0)
        # Save the credentials for the next run
        with open('token.json', 'w') as token:
            token.write(creds.to_json())

    service = build('gmail', 'v1', credentials=creds)

    ### Call the Gmail API

    ### Show messages

    # First page: one request returns both the messages and the next token
    results = service.users().messages().list(userId=user_id).execute()
    messages = results.get('messages', [])
    token = results.get('nextPageToken')
    print(messages, token)

    # Second page: pass the token from the first page
    results2 = service.users().messages().list(userId=user_id, pageToken=token).execute()
    messages2 = results2.get('messages', [])
    token2 = results2.get('nextPageToken')
    print(messages2, token2)


if __name__ == '__main__':
    main()

Results of get_email_ids.py (shortened):

[{'id': '179ed5ae720de1f6', 'threadId': '179ed5ae720de1f6'}, ... {'id': '179ba226644a079a', 'threadId': '17972318184138fa'}] 09573475999783117733
[{'id': '179b9f8852d3b09d', 'threadId': '179b9f8852d3b09d'}, ... {'id': '1797fa390caa3454', 'threadId': '1797fa390caa3454'}] 07601624978802434502

Solution

  • I can't test it, but I would reuse the same variables, messages and token, instead of numbered copies (1, 2, 3), add each page's results to one list of all messages, and run it in a loop.

    Something like this:

    all_messages = []
    
    token = None  # no page token for the first request
    
    while True:
        # One request per page; read both fields from the same response
        results = service.users().messages().list(userId=user_id, pageToken=token).execute()
        messages = results.get('messages', [])
        all_messages += messages        # `extend` or `+=`, not `append`
        
        token = results.get('nextPageToken')
        print(messages, token)
        if not token:  # no nextPageToken: this was the last page
            break
        
    

    The only thing I don't know is how the API signals that there are no more messages: maybe it returns an empty list, maybe it gives an empty token, or maybe it raises an error.
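A possibly more robust option, assuming you use the discovery-based googleapiclient as in the question: for list methods that take a pageToken, the client generates a companion `list_next(previous_request, previous_response)` method that returns None when the previous response carries no nextPageToken, so you don't have to guess how the end is signalled. This is a sketch, not tested against a live mailbox; the function name `get_all_message_ids` is made up for illustration, and `maxResults=500` (the documented maximum page size for users.messages.list, default 100) only reduces the number of requests.

```python
def get_all_message_ids(service, user_id="me"):
    """Collect every message ID by following list()/list_next() pages.

    Hypothetical helper. `service` is assumed to be the object returned by
    googleapiclient.discovery.build('gmail', 'v1', credentials=creds).
    """
    messages_api = service.users().messages()
    # maxResults=500 is the largest page size users.messages.list accepts
    request = messages_api.list(userId=user_id, maxResults=500)
    message_ids = []
    while request is not None:
        response = request.execute()
        # A page may have no 'messages' key at all (e.g. an empty mailbox)
        message_ids.extend(m["id"] for m in response.get("messages", []))
        # list_next() returns None once the response has no nextPageToken
        request = messages_api.list_next(request, response)
    return message_ids
```

With the `service` built in `main()`, `ids = get_all_message_ids(service)` would return a flat list of ID strings.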


    EDIT:

    Information for other users: as @emmalynnh mentioned in a comment,

    When there are no more messages, it gives an empty token,
    and the API returns a 400 error if you still try to request another page.
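Given that behaviour, the loop can also be written as a generator that stops as soon as a response has no nextPageToken and yields only the ID strings (each entry looks like `{'id': ..., 'threadId': ...}`). This is a sketch: `iter_message_ids` and the `fetch_page` callable are hypothetical names, and `fetch_page` is passed in so the paging logic can be checked without credentials. The commented usage assumes googleapiclient drops parameters whose value is None, so the first request is sent without a pageToken.

```python
from typing import Callable, Iterator, Optional


def iter_message_ids(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[str]:
    """Yield message IDs page by page until no nextPageToken is returned.

    `fetch_page` (hypothetical) takes a page token, None for the first page,
    and returns one users.messages.list response as a dict.
    """
    token = None
    while True:
        response = fetch_page(token)
        for message in response.get("messages", []):
            yield message["id"]
        token = response.get("nextPageToken")
        if not token:  # last page: stop instead of requesting with an empty token
            return


# Usage with the real `service` from main() (not run here; needs credentials):
# fetch = lambda token: service.users().messages().list(
#     userId="me", pageToken=token, maxResults=500).execute()
# all_ids = list(iter_message_ids(fetch))
```

Because it is a generator, IDs can be processed as each page arrives instead of waiting for the whole mailbox to be listed.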