Tags: python-2.7, google-admin-sdk, google-directory-api

Python code for the Directory API to batch-retrieve all users from a domain


Currently I have a method that retrieves all ~119,000 Gmail accounts and writes them to a CSV file, using the Python code below with the Admin SDK enabled and OAuth 2.0 authorization:

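from googleapiclient import errors  # provides HttpError

def get_accounts(self):
    # CSVWriter and self.dir_api (the Directory API service object) are defined elsewhere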
    students = []
    page_token = None
    params = {'customer': 'my_customer'}

    while True:
        try:
            if page_token:
                params['pageToken'] = page_token
            current_page = self.dir_api.users().list(**params).execute()

            # the response omits the 'users' key when there are none, so default to []
            students.extend(current_page.get('users', []))

            # write each page of data to a file
            csv_file = CSVWriter(students, self.output_file)
            csv_file.write_file()

            # clear the list for the next page of data
            del students[:]

            page_token = current_page.get('nextPageToken')

            if not page_token:
                break

        except errors.HttpError as error:
            # stops silently on any HTTP error, which leaves a partial CSV
            break

I would like to retrieve all ~119,000 accounts in one go, that is, without looping, or as a batch call. Is this possible, and if so, can you provide example Python code? I have run into communication issues and have had to rerun the process several times to get all ~119,000 accounts (a full download takes about 10 minutes), so I would like to minimize communication errors. Please advise whether a better method, or a non-looping method, exists.


Solution

  • There's no way to do this as a batch, because each request needs the pageToken from the previous page, and that token is only returned once the previous page has been retrieved. However, you can improve performance somewhat by requesting larger pages:

    params = {'customer': 'my_customer', 'maxResults': 500}
    

    Since the default page size when maxResults is not set is 100 and the maximum is 500, setting maxResults to 500 reduces the number of API calls by a factor of 5 (for ~119,000 users, 238 calls instead of 1,190). Each call may take slightly longer, but overall performance should improve because you make far fewer API calls and HTTP round trips.

    You should also look at using the fields parameter to request only the user attributes you actually need in the list. That way you're not wasting time and bandwidth retrieving user details that your app never uses. Try something like:

    # keep nextPageToken in the mask, or the response omits it and paging stops
    my_fields = 'nextPageToken,users(primaryEmail,name,suspended)'
    params = {
        'customer': 'my_customer',
        'maxResults': 500,
        'fields': my_fields,
    }
    

    Finally, if your app retrieves the list of users fairly frequently, turning on caching may help.
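Putting the paging suggestions above together, here is a minimal sketch of the loop in Python. It assumes `service` is a Directory API client built elsewhere (for example with googleapiclient.discovery.build('admin', 'directory_v1', credentials=...)); the helper names iter_user_pages and export_users and the checkpoint callback are hypothetical, not part of any library. It requests 500 users per page with a fields mask, passes num_retries to execute() so the client library retries some transient HTTP failures with backoff, and lets errors propagate instead of ending the export silently. Pages are still fetched one at a time, since each request depends on the previous nextPageToken.

```python
import csv

# Only the attributes the export needs. nextPageToken must stay in the mask:
# without it the response omits the token and paging stops after one page.
USER_FIELDS = 'nextPageToken,users(primaryEmail,name,suspended)'


def iter_user_pages(service, page_token=None, num_retries=5):
    """Yield (users, next_page_token) for each page returned by users.list.

    Requests run one after another because each one needs the
    nextPageToken from the previous response, so they cannot be batched.
    """
    params = {
        'customer': 'my_customer',
        'maxResults': 500,  # the API maximum; the default page size is 100
        'fields': USER_FIELDS,
    }
    while True:
        if page_token:
            params['pageToken'] = page_token
        # num_retries asks google-api-python-client to retry some transient
        # HTTP failures with exponential backoff before raising.
        page = service.users().list(**params).execute(num_retries=num_retries)
        page_token = page.get('nextPageToken')
        # The response omits 'users' when there are none, so default to [].
        yield page.get('users', []), page_token
        if not page_token:
            return


def export_users(service, out, page_token=None, checkpoint=None):
    """Write one CSV row per user to the file-like object `out`.

    Returns the number of rows written. HTTP errors propagate instead of
    being swallowed, so a failed run is visible rather than silently partial.
    """
    writer = csv.writer(out)
    count = 0
    for users, next_token in iter_user_pages(service, page_token):
        for user in users:
            writer.writerow([
                user.get('primaryEmail', ''),
                user.get('name', {}).get('fullName', ''),
                user.get('suspended', False),
            ])
            count += 1
        if checkpoint is not None:
            # Hypothetical hook: e.g. save next_token to disk after each page.
            checkpoint(next_token)
    return count
```

If a run fails partway, the last token passed to checkpoint could be given back as page_token to continue from that page instead of starting over, assuming the API still accepts the token at that point. The sketch targets Python 3; under Python 2.7 the output file for csv.writer should be opened in binary mode.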