Search code examples
pythongoogle-sheetsamazon-redshiftgoogle-api-python-clientgoogle-python-api

Getting Google Sheets Data into Redshift


I'm trying to get data that lives within a Google Sheet into our Redshift database. I was able to follow the directions from this link: https://www.twilio.com/blog/2017/02/an-easy-way-to-read-and-write-to-a-google-spreadsheet-in-python.html

Is it possible to have it pull data from the most recently added google sheets within a folder (instead of just specifying a single sheet) and write to the Redshift table?

Here is what was used to read the google sheets data into Python:

import gspread
from oauth2client.service_account import ServiceAccountCredentials


# use creds to create a client to interact with the Google Drive API
scope = ['https://spreadsheets.google.com/feeds']
creds = ServiceAccountCredentials.from_json_keyfile_name('client_secret.json', scope)
client = gspread.authorize(creds)

# Find a workbook by name and open the first sheet
# Make sure you use the right name here.
sheet = client.open("Copy of Legislators 2017").sheet1

# Extract and print all of the values
list_of_hashes = sheet.get_all_records()
print(list_of_hashes)

Solution

  • You can use the Drive API to query for files added within a given timeframe that are of a specific type. All the search parameters and syntax for such a query are listed here.

    # Build the Drive service
    ...
    
    # Query for recent files, with stipulation that their mimetype contains "spreadsheet"
    query = "mimeType contains 'spreadsheet' and modifiedTime > '"
    query += someDateAsUTC_inRFC_3339_String + "'"
    
    # Execute the query
    request = drive.files.list(q=query, .... )
    resp = request.execute()
    nextPage = resp['nextPageToken']
    if resp['files']:
        # Call method to consume files
    while nextPage:
        request = drive.files.list_next(request, resp)
        if request:
            resp = request.execute()
            nextPage = resp['nextPageToken']
            if resp['files']:
                # Call method to consume files
        else
            break
    # Done