
Read CSV file from Google Drive or any cloud service with Python Pandas


I am trying to read a CSV file from my private Google Drive. The file's sharing permission is set to "Anyone with the link". Here is the link: https://drive.google.com/file/d/12txcYHcO8aiwO9f948_nsaIE3wBGAuJa/view?usp=sharing

and here is a sample of the file:

email   first_name  last_name
        
uno@gmail.com   Luca    Rossi
due@gmail.com   Daniel  Bianchi
tre@gmail.com   Gabriel Domeneghetti
qua@gmail.com   Christian   Bona
cin@gmail.com   Simone  Marsango

I need to read this file so that my program can parse the data. I tried several approaches, including every suggestion in this question: "Pandas: How to read CSV file from google drive public?". This is the code I have so far:

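from io import StringIO

import pandas as pd
import requests

csv_file_url = 'the file URL as copied in the drive UI'

# The file ID is the second-to-last path segment of the sharing URL
file_id = csv_file_url.split('/')[-2]
dwn_url = 'https://drive.google.com/uc?export=download&id=' + file_id
# Download the file and hand its text to pandas through an in-memory buffer
url2 = requests.get(dwn_url).text
csv_raw = StringIO(url2)
df = pd.read_csv(csv_raw)
print(df.head())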

That should work, but it prints only this table:

   ÿþe  Unnamed: 1  Unnamed: 2
0  NaN         NaN         NaN
1  NaN         NaN         NaN
2  NaN         NaN         NaN
3  NaN         NaN         NaN
4  NaN         NaN         NaN

I think it is only a formatting issue, but I don't know how to fix it. If you know how, please help.


Solution

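  • Your data is UTF-16 encoded. The ÿþ at the start of the first column name is the UTF-16 byte order mark (bytes FF FE) decoded with a single-byte encoding. You can read the file by specifying the encoding: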

    pd.read_csv(dwn_url, encoding='utf16')
    

    Result:

               email first_name     last_name
    0            NaN        NaN           NaN
    1  uno@gmail.com       Luca         Rossi
    2  due@gmail.com     Daniel       Bianchi
    3  tre@gmail.com    Gabriel  Domeneghetti
    4  qua@gmail.com  Christian          Bona
    5  cin@gmail.com     Simone      Marsango
    

    (read_csv can read directly from a URL, so there is no need for requests and StringIO.)
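
To try this without touching Google Drive, here is a minimal offline sketch. It assumes the file is comma-separated and little-endian UTF-16 with a byte order mark. The `sample_text` contents (including the `,,` line standing in for the empty row) and the variable names are made up for illustration, not taken from the real file. It shows where the ÿþ comes from, reads the bytes with the right encoding, and drops the all-NaN row that shows up as row 0 in the result.

```python
import codecs
from io import BytesIO

import pandas as pd

# Hypothetical stand-in for the downloaded file (assumed comma-separated).
# The ",," line mimics the empty row that appears as row 0 in the result.
sample_text = (
    "email,first_name,last_name\n"
    ",,\n"
    "uno@gmail.com,Luca,Rossi\n"
    "due@gmail.com,Daniel,Bianchi\n"
)

# Encode as little-endian UTF-16 with an explicit byte order mark (FF FE).
raw_bytes = codecs.BOM_UTF16_LE + sample_text.encode("utf-16-le")

# Decoding the first bytes as Latin-1 reproduces the garbled header prefix.
wrong_prefix = raw_bytes[:3].decode("latin-1")

# Passing the encoding lets pandas decode the bytes correctly.
df = pd.read_csv(BytesIO(raw_bytes), encoding="utf-16")

# Drop rows in which every column is NaN, then renumber the index.
df_clean = df.dropna(how="all").reset_index(drop=True)

print(repr(wrong_prefix))
print(df_clean)
```

With the real file, the same `dropna(how="all")` call can be chained onto `pd.read_csv(dwn_url, encoding='utf16')`.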