What is the equivalent of header=0
in pandas
, which recognises the first line as a heading in gspread
?
pandas import statement (correct)
import pandas as pd
# gcp / google sheets URL
df_URL = "https://docs.google.com/spreadsheets/d/1wKtvNfWSjPNC1fNmTfUHm7sXiaPyOZMchjzQBt1y_f8/edit?usp=sharing"
raw_dataset = pd.read_csv(df_URL, na_values='?',sep=';'
, skipinitialspace=True, header=0, index_col=None)
Using the gspread function, so far I import the data, change the first line to the heading then delete the first line after but this recognises everything in the DataFrame as a string. I would like to recognise the first line as a heading right away in the import statement.
gspread import statement that needs header=True equivalent
import pandas as pd
from google.colab import auth
auth.authenticate_user()
import gspread
from oauth2client.client import GoogleCredentials
# gcp / google sheets url
df_URL = "https://docs.google.com/spreadsheets/d/1wKtvNfWSjPNC1fNmTfUHm7sXiaPyOZMchjzQBt1y_f8/edit?usp=sharing"
# importing the data from Google Drive setup
gc = gspread.authorize(GoogleCredentials.get_application_default())
# read data and put it in dataframe
g_sheets = gc.open_by_url(df_URL)
df = pd.DataFrame(g_sheets.get_worksheet(0).get_all_values())
# change first row to header
df = df.rename(columns=df.iloc[0])
# drop first row
df.drop(index=df.index[0], axis=0, inplace=True)
Looking at the API documentation, you probably want to use:
df = pd.DataFrame(g_sheets.get_worksheet(0).get_all_records(head=1))
The .get_all_records
method returns a dictionary of with the column headers as the keys and a list of column values as the dictionary values. The argument head=<int>
determines which row to use as keys; rows start from 1 and follow the numeration of the spreadsheet.
Since the values returned by .get_all_records()
are lists of strings, the data frame constructor, pd.DataFrame
, will return a data frame that is all strings. To convert it to floats, we need to replace the empty strings, and the the dash-only strings ('-'
) with NA-type values, then convert to float
.
Luckily pandas DataFrame has a convenient method for replacing values .replace
. We can pass it mapping from the string we want as NAs to None, which gets converted to NaN.
import pandas as pd
data = g_sheets.get_worksheet(0).get_all_records(head=1)
na_strings_map= {
'-': None,
'': None
}
df = pd.DataFrame(data).replace(na_strings_map).astype(float)