Search code examples
pythongoogle-sheetsgoogle-apigoogle-drive-api

Google API - Extracted comments from a google spreadsheet using Python number only 20?


I have a google spreadsheet with around 3000 rows and I am trying to extract comments from this spreadsheet using the following code:

import requests
from apiclient import errors
from apiclient import discovery
from apiclient.discovery import build
from oauth2client.client import OAuth2WebServerFlow
import httplib2

CLIENT_ID = "xxxxxyyyy"
CLIENT_SECRET = "xxxxxxx"
OAUTH_SCOPE = "https://www.googleapis.com/auth/drive"
REDIRECT_URI = 'urn:ietf:wg:oauth:2.0:oob'
file-id = "zzzzzz"
def retrieve_comments(service, file_id):
  """Retrieve a list of comments.

  Args:
    service: Drive API service instance.
    file_id: ID of the file to retrieve comments for.
  Returns:
    List of comments.
  """
  try:
    comments = service.comments().list(fileId=file_id).execute()
    return comments.get('items', [])
  except errors.HttpError as error:
    print(f'An error occurred: {error}')
  return None
# ...

flow = OAuth2WebServerFlow(CLIENT_ID,CLIENT_SECRET,OAUTH_SCOPE)
flow.redirect_uri = REDIRECT_URI
authorize_url = flow.step1_get_authorize_url()
print("Go to the following link in your web browser "+ authorize_url)
code = input("Enter verfication code : ").strip()
credentials = flow.step2_exchange(code)

http = httplib2.Http()
http = credentials.authorize(http)

service = build('drive', 'v2', http=http)

comments = retrieve_comments(service, file-id)

However, the length of the list comments is only 20 whereas the spreadsheet surely contains more comments. Could someone explain which parameter I would need to tweak to retrieve all the comments in the spreadsheet? Thanks!


Solution

  • In the current stage, the default value of maxResults (Drive API v2) or pageSize (Drive API v3) of "Comments: list" of Drive API v3 is 20. I thought that this might be the reason for your current issue of However, the length of the list comments is only 20 whereas the spreadsheet surely contains more comments.. In this case, how about the following modification?

    From:

    comments = service.comments().list(fileId=file_id).execute()
    return comments.get('items', [])
    

    To:

    From your script, when you want to use Drive API v2, please modify it as follows.

    file_id = "###" # Please set your file ID.
    
    res = []
    page_token = None
    while True:
        obj = service.comments().list(fileId=file_id, pageToken=page_token, maxResults=100, fields="*").execute()
        if len(obj.get("items", [])) > 0:
            res = [*res, *obj.get("items", [])]
        page_token = obj.get("nextPageToken")
        if not page_token:
            break
    return res
    

    When you want to use Drive API v3, please modify it as follows.

    file_id = "###" # Please set your file ID.
    
    res = []
    page_token = None
    while True:
        obj = service.comments().list(fileId=file_id, pageToken=page_token, pageSize=100, fields="*").execute()
        if len(obj.get("comments", [])) > 0:
            res = [*res, *obj.get("comments", [])]
        page_token = obj.get("nextPageToken")
        if not page_token:
            break
    return res
    
    • In this modification, the all comments in the Spreadsheet is returned as an array.

    References: