python google-sheets google-sheets-api gspread

gspread - get specific columns by column name

I have a sheet with the struct below.

COLUMN1 | COLUMN2 | COLUMN3
0       | 3       | 1
1       | 3       | 
2       | 5       | 4
4       | 2       | 2

How can I return just the column 1 and 2 by its column name? Not by the index. Using the wks.get_all_values() as shown bellow, it will return all columns of the sheet.

My code:

ss = gs.open(args['name'])
wks = ss.worksheet(args['worksheet'])
data = wks.get_all_values()
headers = data.pop(0)
df = pd.Dataframe(data, columns=headers)

I was needing something like, wks.get_columns(['COLUMN1','COLUMN2'])

I hope someone can help me. Thanks

Solution

I believe your goal is as follows.

You want to retrieve the columns by the header values.
You want to achieve this using gspread for python.

In this case, how about the following modification?

Modified script:

In this modification, the specific columns are retrieved from the values retrieved with wks.get_all_values(). In this case, one API call is used. For example, when the specific columns are retrieved after the header row was retrieved, 2 API calls are retrieved. So I proposed this method.

headers = ["COLUMN1", "COLUMN3"] # Please set the header values you want to retrieve.

ss = gs.open(args['name'])
wks = ss.worksheet(args['worksheet'])
values = wks.get_all_values()
data = zip(*(e for e in zip(*values) if e[0] in headers))
df = pd.DataFrame(data, columns=headers)

When this script is run using your sample sheet in your question, the following result is obtained.

  0  COLUMN1  COLUMN3
  1        0        1
  2        1
  3        2        4
  4        4        2

When you want to retrieve data as a list, please modify data = zip(*(e for e in zip(*values) if e[0] in headers)) to data = [list(f) for f in zip(*(e for e in zip(*values) if e[0] in headers))].