Search code examples
pythongoogle-sheetsgoogle-sheets-apigspread

What is the most resource efficient way to duplicate and then update multiple worksheets in a single spreadsheet using gspread with python?


I am writing a python script that (along with some other things) duplicates a template spreadsheet on google sheets, duplicates a tab (worksheet) within that spreadsheet up to 25 times, and then updates those duplicated tabs with some information supplied by the script. The way I have accomplished this so far is by doing the following:

copied_spreadsheet = 'my new spreadsheet'

client.copy('(spreadsheet key))', title=copied_spreadsheet, copy_permissions=True)
copied_spreadsheet = client.open(spreadsheet_name)

To duplicate the template spreadsheet, then:

for idx, tab_name in enumerate(list_of_tabs_to_add):
     copied_spreadsheet.worksheet('template_tab').duplicate(insert_sheet_index=idx + 1, new_sheet_name=tab_name)
     supplied_information = ['stuff to add to the duplicated tab']
     copied_spreadsheet.worksheet(tab_name).update('A1:F3', supplied_information, value_input_option='USER_ENTERED')

This worked pretty well when I had to duplicate the template tab around 20 times, but when trying to do it 25 times, I've found myself occasionally running into a resource error where I exceed the quota for read requests per minute per user. The error looks something like this:

gspread.exceptions.APIError: {'code': 429, 'message': "Quota exceeded for quota metric 'Read requests' and limit 'Read requests per minute per user' of service 'sheets.googleapis.com' for consumer (etc...)

I've tried really hacky solutions like adding a time.sleep() command in between loops to try and act as a sort of buffer so I don't hit the quota as quickly, and that helps a bit, but the results can be a little hit and miss.

Apart from requesting a higher quota, is there anything I can do to make my code more efficient? I believe I am working with the default quotas, and I don't think what I'm doing is all that intensive, so my feeling is that I am just not coding this solution as efficiently as I could.

In my head I feel like the place I can get the most efficiency out of is the loop that duplicates and updates 25 times. Is there any way I can store the template worksheet into a variable, duplicate and modify that variable then upload the 25 copies with a single command instead of 25 times? I'm not very familiar with dataframes or the gspread_dataframes module, but does a potential solution exist with that library too, or is the way I'm currently going about it the best way to do it, necessitating a simple quota increase?


Solution

  • I believe your goal as follows.

    • You want to copy a sheet in Google Spreadsheet 25 times, and want to put the values of supplied_information to each copied sheet.
      • In this case, you want to reduce the process cost of the script.
    • You want to achieve this using gspread of python.
    • You have already been able to get and put values for Spreadsheet using Sheets API.

    In this case, I would like to propose the following flow.

    1. Retrieve the sheet ID of template_tab in the copied Google Spreadsheet.
    2. Create 2 requests for "batchUpdate" and "values.batchUpdate" for Sheets API.
    3. Request batchUpdate method using gspread.
    4. Request values.batchUpdate method using requests module.
      • Because, in gspread, in the current stage, the values.batchUpdate method is not included. So, I used this using the requests module by retrieving the access token from the authorization script of gspread.

    In this flow, 3 quotas of Sheets API are used. When this flow is reflected to your script, it becomes as follows. By the way, it seems that client.copy() returns a Spreadsheet instance.

    Modified script:

    Please set '(spreadsheet key))', list_of_tabs_to_add and supplied_information.

    client = gspread.authorize(credentials) # Please use your "credentials" here.
    
    copied_spreadsheet = 'my new spreadsheet'
    copied_spreadsheet = client.copy('(spreadsheet key))', title=copied_spreadsheet, copy_permissions=True)
    
    list_of_tabs_to_add = [###] # <--- Please set value here.
    
    # 1. Retrieve the sheet ID of `template_tab` in the copied Google Spreadsheet.
    srcSheetId = copied_spreadsheet.worksheet('template_tab').id
    
    # 2. Create 2 requests for "batchUpdate" and "values.batchUpdate" for Sheets API.
    requests1 = []
    requests2 = []
    for idx, tab_name in enumerate(list_of_tabs_to_add):
        temp = idx + 1
        requests1.append({"duplicateSheet": {"sourceSheetId": srcSheetId, "insertSheetIndex": temp, "newSheetName": tab_name}})
    
        supplied_information = [###] # <--- Please set value here.
    
        requests2.append({"range": tab_name, "values": supplied_information})
    
    # 3. Request batchUpdate method using gspread.
    res1 = copied_spreadsheet.batch_update({"requests": requests1})
    
    # 4. Request values.batchUpdate method using requests module.
    res2 = requests.post(
        'https://sheets.googleapis.com/v4/spreadsheets/' + copied_spreadsheet.id + '/values:batchUpdate',
        headers={"Authorization": "Bearer " + credentials.access_token, "Content-Type": "application/json"},
        data=json.dumps({"data": requests2, "valueInputOption": "USER_ENTERED"}),
    )
    
    • In this script, requests and json libraries are required to be included.

    References: