Search code examples
pythonpandasnumpygspread

Is there a way to create a DataFrame from specific colour coded rows that are in within a Google Sheet? (gspread and pandas)


I have a Google Sheet with large amounts of rows that are colour coded. I was looking to create a new DataFrame based on coloured rows. Is it possible to select rows based on their colour? Or even if there would is a way select rows that are not color coded.

Can't really find anything on this, so really not sure if this is possible.


Solution

  • I don’t know how to do it with gspread (most likely it’s impossible), but it’s easy with google-api-python-client (which is a dependency of gspread)

    You need to pass the includeGridData parameter to spreadsheets().get() method. Here is a slightly modified example from the doc:

    data = (
        service.spreadsheets()
        .get(
            spreadsheetId=spreadsheet_id, 
            ranges=ranges, 
            includeGridData=True  # important,
            fields=",".join([  # specify only required fields to reduce response size
                "sheets.data.rowData.values.formattedValue",
                "sheets.data.rowData.values.effectiveFormat.backgroundColor",
            ])
        )
        .execute()
    )
    
    # now you should parse returned JSON according your needs, e.g.:
    
    def parse(data): # data type: https://developers.google.com/sheets/api/reference/rest/v4/spreadsheets#Spreadsheet
        white_color = {"red": 1, "green": 1, "blue": 1}
    
        for grid_data in data["sheets"][0]["data"]:
            # grid_data type: https://developers.google.com/sheets/api/reference/rest/v4/spreadsheets/sheets#GridData
            for row_data in grid_data["rowData"]:
                # row_data["values"] contains a list of cells (CellData), one per column
                # cell_data type: https://developers.google.com/sheets/api/reference/rest/v4/spreadsheets/cells#CellData
                row = []
                for cell_data in row_data["values"]:
                    value = cell_data["formattedValue"] # or cell_data["userEnteredValue"]
                    color = cell_data["effectiveFormat"]["backgroundColor"]
                    if color != white_color:
                        row.append(value)
                    else:
                        row.append(None)
                yield row
    
    pd.DataFrame(list(parse(data)))
    

    Upd: read multiple columns