Search code examples
pythonapigoogle-sheetsjupyter-notebookxlsx

Converting xlsx to Google Sheets in Jupyter


I'm trying to open a xlsx file from Google Drive as a Google Sheets file in Jupyter.

from googleapiclient.discovery import build
from google.oauth2 import service_account

SERVICE_ACCOUNT_FILE = 'gs_credentials.json'
SCOPES = ['https://www.googleapis.com/auth/spreadsheets']

creds = None
creds = service_account.Credentials.from_service_account_file(
        SERVICE_ACCOUNT_FILE, scopes=SCOPES)

SAMPLE_SPREADSHEET_ID = 'someidhere'
RANGE = 'somerangehere'

service = build('sheets', 'v4', credentials=creds)

sheet = service.spreadsheets()
result = sheet.values().get(spreadsheetId=SAMPLE_SPREADSHEET_ID,
                            range=RANGE).execute()
values = result.get('values', [])

this script work for reading data from a Google Sheet, but i have one that's saved as xlsx. Is there a way to convert it? I found a couple answers that went something like:

service = build('drive', 'v3', credentials=creds)
service.files().copy(fileId=ID_OF_THE_EXCEL_FILE, body={"mimeType"="application/vnd.google-apps.spreadsheet"}).execute()

from How do I solve a Google Sheet API error in Python

but this doesn't seem work for me. (i think the syntax is broken... tried to fix it, but couldn't figure it out)


Solution

  • Syntax error:

    You are in the right direction on the converting xlsx to spreadsheet.Try changing the = to : in mimeType.

    Correct syntax should be body={"mimeType":"application/vnd.google-apps.spreadsheet"}

    After converting the xlsx file to google spreadsheet you can then use the spreadsheet service to obtain the values on the file. Please see sample code below.

    Your Code should look like this:

    from googleapiclient.discovery import build
    from google.oauth2 import service_account
    import pandas as pd
    
    SERVICE_ACCOUNT_FILE = 'gs_credentials.json'
    SCOPES = ['https://www.googleapis.com/auth/spreadsheets']
    
    creds = None
    creds = service_account.Credentials.from_service_account_file(
            SERVICE_ACCOUNT_FILE, scopes=SCOPES)
    
    SAMPLE_SPREADSHEET_ID = 'someidhere'
    RANGE = 'somerangehere'
    
    # Build the services
    service_drive = build('drive', 'v3', credentials=creds)
    service_sheets = build('sheets', 'v4', credentials=creds)
    
    
    # Converting xlsx file to google spreadsheet
    
    xlsx_file = service_drive.files().copy(fileId=SAMPLE_SPREADSHEET_ID, body= 
    {"mimeType":"application/vnd.google-apps.spreadsheet"}).execute()
    
    # extracting the new converted spreadsheet Id
    new_id = xlsx_file.get('id', [])
    
    # Getting the values using the spreadsheet service
    result =   service_sheets.spreadsheets().values().get(spreadsheetId=new_id,range = RANGE).execute()
    values = result.get('values', [])
    
    #converting values to dataframe using pandas
    df = pd.DataFrame(values)
    df
    

    My sample Code on Google Colaboratory:

    from googleapiclient.discovery import build
    from google.colab import auth
    from google.auth import default
    import pandas as pd
    
    auth.authenticate_user()
    creds, _ = default()
    
    # Edits starts here from your existing code
    # Build the services
    service_drive = build('drive', 'v3', credentials=creds)
    service_sheets = build('sheets', 'v4', credentials=creds)
    
    # xlsx file Id
    file_id = 'Id here'
    range = "Range here"
    
    # Converting xlsx file to google spreadsheet
    
    xlsx_file = service_drive.files().copy(fileId=file_id, body={"mimeType":"application/vnd.google-apps.spreadsheet"}).execute()
    
    # extracting the new converted spreadsheet Id
    new_id = xlsx_file.get('id', [])
    
    # Getting the values using the spreadsheet service
    result = service_sheets.spreadsheets().values().get(spreadsheetId=new_id,range = range).execute()
    values = result.get('values', [])
    
    #converting values to dataframe using pandas
    df = pd.DataFrame(values)
    df
    

    Note: Authentication may differ from you since I am running this code on google colab

    Output:

    enter image description here

    References:

    https://developers.google.com/drive/api/v3/reference/files/copy https://developers.google.com/sheets/api/guides/values