Search code examples
python-3.xgoogle-sheetsopenpyxlgoogle-api-python-client

Extracting Comments from a Cell in Google Sheets


I am using gspread to interact with my Google spreadsheets and pandas to manipulate that data. I have set up a Google API account and have done all the authentication following this article. I am able to retrieve the data except for the comments from cells. In my Google Sheets dataset, I have certain columns and there are some comments that exist on certain cells. My end target is to convert the spreadsheet data into an array of objects where each object represents a single row record and if a comment exists in any of the cells then I need to extract that as well.

I have tried pygsheets but it's not supporting the comment feature and also not able to find anything in gpsread to extract comments.

I have tested openpyxl by giving the local path of my Google Work Sheet and it does extract comments but I need to figure out how it can be along with Google Sheet API.

Sample Code:

    with open("automate-0738852.json", 'r') as j:
         service_account_info = json.loads(j.read())

    credentials = service_account.Credentials.from_service_account_info(service_account_info)
    scope = ['https://spreadsheets.google.com/feeds','https://www.googleapis.com/auth/drive']
    creds_with_scope = credentials.with_scopes(scope)
    
    client = gspread.authorize(creds_with_scope)
    gc_client = pygsheets.authorize(custom_credentials=creds_with_scope)
    spreadsheet = client.open_by_url('https://docs.google.com/spreadsheets/d/spreadsheetID/edit?usp=sharing')
    worksheet = spreadsheet.get_worksheet(0)
    records_data = worksheet.get_all_records(head=6)
    records_df = pd.DataFrame.from_dict(records_data)

    # Removing the empty title header
    records_df.columns = records_df.columns.astype(str).str.strip()
    records_df.drop(columns=[''], errors='ignore', inplace=True)

Solution

  • About your following question,

    I have tested openpyxl by giving the local path of my Google Work Sheet and it does extract comments but I need to figure out how it can be along with Google Sheet API.

    Issue and workaround:

    In the current stage, a note and a comment can be added to a cell in Google Spreadsheet. From your question, I suppose that you are saying about the comment of a cell instead of the note.

    If my understanding is correct, the comments cannot be retrieved by Sheets API. They can be retrieved with Drive API. Ref But, unfortunately, in the current stage, although the comments can be retrieved by Drive API, the cell coordinates of the retrieved comments cannot be known because the range ID cannot correspond to the cell coordinate.

    From this situation, as the current workaround, when you want to retrieve the comments and other values, and metadata from Google Spreadsheet using Python, how about the following flow?

    1. Retrieve the values and metadata from Google Spreadsheet using "Method: spreadsheets.get" of Sheets API. If you want to retrieve only values, I think that "Method: spreadsheets.values.get" can be also used.

      • In the current stage, gspread has no method for using "Method: spreadsheets.get". But, the sample script can be seen at this thread.
    2. In order to retrieve the comments from Google Spreadsheet, export Google Spreadsheet as XLSX data, and retrieve the comments from XLSX data using openpyxl.

      • The sample script for exporting Google Spreadsheet as XLSX using the authorization of gspread can be seen at this thread.
    3. Merge both values.

    By this flow, I guess that your goal can be achieved.

    References: