Search code examples
pythongoogle-sheetsgoogle-sheets-apigspread

gspread - get specific columns by column name


I have a sheet with the struct below.

COLUMN1 | COLUMN2 | COLUMN3
0       | 3       | 1
1       | 3       | 
2       | 5       | 4
4       | 2       | 2

How can I return just the column 1 and 2 by its column name? Not by the index. Using the wks.get_all_values() as shown bellow, it will return all columns of the sheet.

My code:

ss = gs.open(args['name'])
wks = ss.worksheet(args['worksheet'])
data = wks.get_all_values()
headers = data.pop(0)
df = pd.Dataframe(data, columns=headers)

I was needing something like, wks.get_columns(['COLUMN1','COLUMN2'])

I hope someone can help me. Thanks


Solution

  • I believe your goal is as follows.

    • You want to retrieve the columns by the header values.
    • You want to achieve this using gspread for python.

    In this case, how about the following modification?

    Modified script:

    In this modification, the specific columns are retrieved from the values retrieved with wks.get_all_values(). In this case, one API call is used. For example, when the specific columns are retrieved after the header row was retrieved, 2 API calls are retrieved. So I proposed this method.

    headers = ["COLUMN1", "COLUMN3"] # Please set the header values you want to retrieve.
    
    ss = gs.open(args['name'])
    wks = ss.worksheet(args['worksheet'])
    values = wks.get_all_values()
    data = zip(*(e for e in zip(*values) if e[0] in headers))
    df = pd.DataFrame(data, columns=headers)
    
    • When this script is run using your sample sheet in your question, the following result is obtained.

        0  COLUMN1  COLUMN3
        1        0        1
        2        1
        3        2        4
        4        4        2
      
    • When you want to retrieve data as a list, please modify data = zip(*(e for e in zip(*values) if e[0] in headers)) to data = [list(f) for f in zip(*(e for e in zip(*values) if e[0] in headers))].