Search code examples
pythonpandasgoogle-sheetsgoogle-drive-apigoogle-apps

Getting Google Spreadsheet CSV into A Pandas Dataframe


I uploaded a file to Google spreadsheets (to make a publically accessible example IPython Notebook, with data) I was using the file in it's native form could be read into a Pandas Dataframe. So now I use the following code to read the spreadsheet, works fine but just comes in as string,, and I'm not having any luck trying to get it back into a dataframe (you can get the data)

import requests
r = requests.get('https://docs.google.com/spreadsheet/ccc?key=0Ak1ecr7i0wotdGJmTURJRnZLYlV3M2daNTRubTdwTXc&output=csv')
data = r.content

The data ends up looking like: (1st row headers)

',City,region,Res_Comm,mkt_type,Quradate,National_exp,Alabama_exp,Sales_exp,Inventory_exp,Price_exp,Credit_exp\n0,Dothan,South_Central-Montgomery-Auburn-Wiregrass-Dothan,Residential,Rural,1/15/2010,2,2,3,2,3,3\n10,Foley,South_Mobile-Baldwin,Residential,Suburban_Urban,1/15/2010,4,4,4,4,4,3\n12,Birmingham,North_Central-Birmingham-Tuscaloosa-Anniston,Commercial,Suburban_Urban,1/15/2010,2,2,3,2,2,3\n

The native pandas code that brings in the disk resident file looks like:

df = pd.io.parsers.read_csv('/home/tom/Dropbox/Projects/annonallanswerswithmaster1012013.csv',index_col=0,parse_dates=['Quradate'])

A "clean" solution would be helpful to many to provide an easy way to share datasets for Pandas use! I tried a bunch of alternative with no success and I'm pretty sure I'm missing something obvious again.

Just a Update note The new Google spreadsheet has a different URL pattern Just use this in place of the URL in the above example and or the below answer and you should be fine here is an example:

https://docs.google.com/spreadsheets/d/177_dFZ0i-duGxLiyg6tnwNDKruAYE-_Dd8vAQziipJQ/export?format=csv&id

see solution below from @Max Ghenis which just used pd.read_csv, no need for StringIO or requests...


Solution

  • You can use read_csv() on a StringIO object:

    from io import BytesIO
    
    import requests
    import pandas as pd
    
    r = requests.get('https://docs.google.com/spreadsheet/ccc?key=0Ak1ecr7i0wotdGJmTURJRnZLYlV3M2daNTRubTdwTXc&output=csv')
    data = r.content
        
    In [10]: df = pd.read_csv(BytesIO(data), index_col=0,parse_dates=['Quradate'])
    
    In [11]: df.head()
    Out[11]: 
              City                                            region     Res_Comm  \
    0       Dothan  South_Central-Montgomery-Auburn-Wiregrass-Dothan  Residential   
    10       Foley                              South_Mobile-Baldwin  Residential   
    12  Birmingham      North_Central-Birmingham-Tuscaloosa-Anniston   Commercial   
    38       Brent      North_Central-Birmingham-Tuscaloosa-Anniston  Residential   
    44      Athens                 North_Huntsville-Decatur-Florence  Residential   
    
              mkt_type            Quradate  National_exp  Alabama_exp  Sales_exp  \
    0            Rural 2010-01-15 00:00:00             2            2          3   
    10  Suburban_Urban 2010-01-15 00:00:00             4            4          4   
    12  Suburban_Urban 2010-01-15 00:00:00             2            2          3   
    38           Rural 2010-01-15 00:00:00             3            3          3   
    44  Suburban_Urban 2010-01-15 00:00:00             4            5          4   
    
        Inventory_exp  Price_exp  Credit_exp  
    0               2          3           3  
    10              4          4           3  
    12              2          2           3  
    38              3          3           2  
    44              4          4           4