Search code examples
pythonpandasdataframegspread

Get all records from 2 columns, starting from specific row


Set-up

Via gspread I have access to a Google sheet containing data.

Normally, I use df = pd.DataFrame(wsheet.get_all_records()) to dump all data into a pandas dataframe.


Issue

I only need the data of 5 specific sequential columns, i.e. all columns including and between for example column 1 and 5 of the Google sheet.

Moreover, I only need the data starting from the 5th row in the Google sheet.

I've tried my regular df = pd.DataFrame(wsheet.get_all_records()) and then drop columns and rows in pandas. However, I think due to the markup I use in the first 4 rows in the Google sheet, the resulting dataframe has some oddities – adjusting in pandas gives strange results.


Question

Given the markup, I suspect it's easier to just dump all data incl. and between column 1 and 5 in a dataframe, starting from row 5.

But how do I do this?


Solution

    • You want to retrieve the values from the columns "A" and "E" after the row 5 from the Google Spreadsheet.
    • You want to achieve this using gspread with python.
    • You have already been able to get and put values for Spreadsheet using Sheets API.

    Modification points:

    • In this modification, at first, the values are retrieved with get_all_values() as a slice. And the retrieved values are processed and convert it to the dataframe.

    Modified script:

    When your script is modified, it becomes as follows. In this case, it supposes that wsheet can be used.

    From:

    df = pd.DataFrame(wsheet.get_all_records())
    

    To:

    v = [[e[0], e[4]] for e in wsheet.get_all_values()]
    df = pd.DataFrame(v[4:], columns=v[0])
    
    • In this case, df is the values retrieved the columns "A" and "E" after the row 5.

    Reference:

    Added:

    If you want to retrieve the values from the columns "A" to "E" after the row 5 from the Google Spreadsheet, how about the following modification?

    From:

    df = pd.DataFrame(wsheet.get_all_records())
    

    To:

    v = [e[0:5] for e in wsheet.get_all_values()]
    df = pd.DataFrame(v[4:], columns=v[0])