Tags: python, pandas, csv, google-sheets

How can I get the name of a Google spreadsheet from only its URL using pandas?


I just need the name of the file so that, when I create a CSV file from the imported sheet, I can give it a descriptive name. But I don't know how to get the name using the pd.read_excel() function or any other function.

Here is my code so far:

import pandas as pd

link = input("Enter link here:")

link = link.split('/')
sheet_id = link[5]
print(sheet_id)

# Load the spreadsheet, exported as .xlsx, into a pandas ExcelFile
xls = pd.ExcelFile(f"https://docs.google.com/spreadsheets/d/{sheet_id}/export?format=xlsx")

# Read the "Summary" sheet into a DataFrame
sheet_1 = pd.read_excel(xls, "Summary", header=0)
print(sheet_1)
print("------------")
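The `link.split('/')[5]` lookup only works when the pasted link starts with `https://`. A link without the scheme, such as `docs.google.com/spreadsheets/d/<ID>/edit`, has fewer segments and raises IndexError. A regular expression that looks for the segment after `/spreadsheets/d/` is a little more forgiving. This is a minimal sketch: the helper name `extract_sheet_id` is hypothetical, and the allowed ID characters (letters, digits, `_`, `-`) are an assumption based on typical Sheets IDs.

```python
import re

# Sheets URLs look like https://docs.google.com/spreadsheets/d/<ID>/edit#gid=0;
# the ID is assumed to be the path segment right after "/spreadsheets/d/".
_SHEET_ID_RE = re.compile(r"/spreadsheets/d/([a-zA-Z0-9_-]+)")


def extract_sheet_id(url: str) -> str:
    """Return the spreadsheet ID from a Google Sheets URL, or raise ValueError."""
    match = _SHEET_ID_RE.search(url)
    if match is None:
        raise ValueError(f"Not a Google Sheets URL: {url!r}")
    return match.group(1)
```

With this helper, `sheet_id = extract_sheet_id(link)` could replace the two `link`/`sheet_id` lines and the rest of the script stays the same.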

Solution

  • I believe your goal is as follows.

    • You want to retrieve the title of the Google Spreadsheet that is exported from f"https://docs.google.com/spreadsheets/d/{sheet_id}/export?format=xlsx".
    • In your situation, this Spreadsheet has already been publicly shared, so the export URL can be accessed without signing in.

    If my understanding is correct, how about the following sample script?

    Unfortunately, I couldn't find a way to get the title of the Google Spreadsheet from the ExcelFile object. So, in this answer, I would like to propose another approach.

    Sample script:

    In this sample, the filename of the Google Spreadsheet is retrieved from the Content-Disposition header of the export response.

    import pandas as pd
    import requests
    
    link = input("Enter link here:")
    
    link = link.split('/')
    sheet_id = link[5]
    print(sheet_id)
    
    r = requests.get(f"https://docs.google.com/spreadsheets/d/{sheet_id}/export?format=xlsx")
    r.raise_for_status()  # stop early on HTTP errors
    # The header is expected to contain: filename*=UTF-8''<title>.xlsx
    print(r.headers['content-disposition'])
    filename = r.headers['content-disposition'].split("filename*=UTF-8''")[1].replace(".xlsx", "")
    print(filename)
    

    The filename in that header is percent-encoded (for example, a space appears as %20). To decode it, use unquote as follows.

    import pandas as pd
    import requests
    from urllib.parse import unquote
    
    link = input("Enter link here:")
    
    link = link.split('/')
    sheet_id = link[5]
    print(sheet_id)
    
    r = requests.get(f"https://docs.google.com/spreadsheets/d/{sheet_id}/export?format=xlsx")
    print(r.headers['content-disposition'])
    # unquote() turns percent-escapes such as %20 back into the original characters
    filename = unquote(r.headers['content-disposition'].split("filename*=UTF-8''")[1].replace(".xlsx", ""))
    print(filename)
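The one-liners above assume the header always contains `filename*=UTF-8''`. If the response is not the exported workbook (for example, if the sheet is not shared and Google may return an HTML page instead), the header lookup or the `[1]` index raises an error. Also, `.replace(".xlsx", "")` removes `.xlsx` anywhere in the title, not only at the end. Below is a slightly more defensive sketch that prefers the RFC 5987 `filename*` form, falls back to a plain `filename="..."`, and strips only the final extension. The helper names are hypothetical, and the regexes cover only simple header shapes, not every variant RFC 6266 allows.

```python
import os
import re
from typing import Optional
from urllib.parse import unquote

# filename*=<charset>'<language>'<percent-encoded value>  (RFC 5987 style)
_EXT_RE = re.compile(r"filename\*\s*=\s*([^']*)'[^']*'([^;]+)", re.IGNORECASE)
# filename="<value>"  or  filename=<token>
_PLAIN_RE = re.compile(r'filename\s*=\s*(?:"([^"]*)"|([^;]+))', re.IGNORECASE)


def filename_from_content_disposition(header: str) -> Optional[str]:
    """Return the file name from a Content-Disposition header, or None."""
    m = _EXT_RE.search(header)
    if m:
        # Decode the percent-escapes using the declared charset (default UTF-8)
        charset = m.group(1) or "utf-8"
        return unquote(m.group(2).strip(), encoding=charset)
    m = _PLAIN_RE.search(header)
    if m:
        # Quoted form in group 1, bare token in group 2
        return m.group(1) if m.group(1) is not None else m.group(2).strip()
    return None


def title_from_content_disposition(header: str) -> Optional[str]:
    """Return the file name without its final extension, e.g. 'Report' for 'Report.xlsx'."""
    name = filename_from_content_disposition(header)
    return os.path.splitext(name)[0] if name else None
```

With the response object from the script, `title_from_content_disposition(r.headers.get('content-disposition', ''))` would return the title, or None when there is no usable header.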
    

    Note:

    • This approach assumes that your Google Spreadsheet has already been publicly shared; otherwise the export URL will not return the workbook. Please be careful about this.

    • By the way, if you have an API key that can use the Drive API, the following script can also be used. It reads the file name from the file's metadata instead of downloading the file.

      import requests

      api_key = "###"  # Please set your API key.
      spreadsheet_id = "###"  # Please set your Spreadsheet ID.
      params = {"key": api_key, "fields": "name"}  # return only the file name
      r = requests.get(f"https://www.googleapis.com/drive/v3/files/{spreadsheet_id}", params=params)
      r.raise_for_status()
      filename = r.json().get("name", "")
      print(filename)
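Whichever method you use to obtain the title (the `filename` variable in the scripts above), the remaining step from the question is turning it into a CSV file name. Titles can contain characters such as `/` or `:` that are not allowed in file names, so a small sanitizing step helps. This is only one way to do it: the helper `csv_name_for`, the `"<title> - <sheet>.csv"` pattern, and the set of replaced characters (the ones Windows rejects) are assumptions for illustration.

```python
import re

# Characters Windows does not allow in file names ("/" is also the Unix path separator).
_INVALID_CHARS = re.compile(r'[\\/:*?"<>|]+')


def csv_name_for(title: str, sheet_name: str) -> str:
    """Build a CSV file name like 'Budget 2024 - Summary.csv' from a title and sheet name."""
    safe_title = _INVALID_CHARS.sub("_", title).strip() or "spreadsheet"
    safe_sheet = _INVALID_CHARS.sub("_", sheet_name).strip() or "sheet"
    return f"{safe_title} - {safe_sheet}.csv"
```

For example, `sheet_1.to_csv(csv_name_for(filename, "Summary"), index=False)` would write the question's DataFrame. Since `r.content` already holds the exported workbook, `pd.read_excel(io.BytesIO(r.content), sheet_name="Summary")` could also load it without requesting the file a second time (this needs `import io`).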