Search code examples
pythongoogle-sheetsformula

SUM data in google sheets based on name in first column


I have 24 sheets in a spreadsheet in Google sheets that look similar like this:

example

I have a total of 125 rows in 24 sheets and I need to make a "total" sheet where I sum the data from all sheets for each category.

The problem is that in each sheet the data is sorted differently and in each sheet only some categories are listed, not all.

I can't come up with an efficient solution to sum the data quickly, so I'm turning here. Has anyone solved this, or does anyone have an idea for an effective solution? Thank you very much in advance.

I searched on google, tried different solutions like SUM, SUMIF but everything would take a very long time or the solution didn't work.

I would like to find an efficient formula, at worst some idea how to solve it via export and python. Thank you all in advance


Solution

  • This might help you get in the right direction to get a simple but efficient formula:

    =QUERY({Sheet1!A:B; Sheet2!A:B; Sheet3!A:B; Sheet4!A:B},"Select Col1, sum(Col2) where Col1 is not Null AND Col2 is not Null group by Col1 label sum(Col2) 'Total'")
    

    Explanation:

    • Using ={Sheet1!A:B; Sheet2!A:B; Sheet3!A:B; Sheet4!A:B} separately gets all the values within those ranges under the same Array, you would need to make sure that all of those ranges have the same amount of columns.
    • Now that all these values are within the same array we could use QUERY to get values, this is how I tested it:

    Set Up:

    enter image description here

    End result:

    enter image description here


    The hard part would be to actually get all the Sheet names and add them to the formula, Based on the Python Quickstart from Google Workspace I created the following code that would print a String with all the sheet names concatenated with the appropriate range and you would only need to do it once:

    SAMPLE_SPREADSHEET_ID = 'A_GOOGLE_SPREADSHEET_ID'
    SAMPLE_RANGE_NAME = 'Sheet1!A2:E'
    rangeWithinSheet = '!A:B; '
    
    
    
    def main():
        
        try:
            service = build('sheets', 'v4', credentials=creds)
    
            # Call the Sheets API
            sheet = service.spreadsheets()
            request = service.spreadsheets().get(spreadsheetId=SAMPLE_SPREADSHEET_ID)
            spreadsheet = request.execute()
            sheets = spreadsheet.get("sheets", [])
            stringWithRanges = ''
            
            for sheet in sheets:
                stringWithRanges = stringWithRanges + sheet.get("properties").get("title") + rangeWithinSheet
                print(stringWithRanges)
    
        except HttpError as err:
            print(err)
    
    
    if __name__ == '__main__':
        main()
    

    Output in my case:

    Sheet1!A:B; Sheet2!A:B; Sheet3!A:B; Sheet4!A:B; Formula!A:B; 
    

    Update: I added a where condition to the Query as @Natsu recommended, now it only shows prints valid results.