I just got a csv file that I want to load the dataset as dataframe using pandas. However, I'm kinda confused this data format.
Here is the sample of data for two lines:
Name=John, Gender=M, BloodType=A, Location=New York, Age=18
Name=Mary, Gender=F, BloodType=AB, Location=Seatle, Age=30
How do I load this dataset into dataframe with columns (Name, Gender, BloodType,...etc)?
I will appreciate that someone gives me tips to start with!
Use read_csv
with header=None
first:
import pandas as pd
temp=u"""Name=John,Gender=M,BloodType=A,Location=New York,Age=18
Name=Mary,Gender=F,BloodType=AB,Location=Seatle,Age=30"""
#after testing replace 'pd.compat.StringIO(temp)' to 'filename.csv'
df = pd.read_csv(pd.compat.StringIO(temp), header=None)
print (df)
0 1 2 3 4
0 Name=John Gender=M BloodType=A Location=New York Age=18
1 Name=Mary Gender=F BloodType=AB Location=Seatle Age=30
Then DataFrame.apply
with Series.str.split
and select second lists, last change columns names:
df1 = df.apply(lambda x: x.str.split('=').str[1])
df1.columns = df.iloc[0].str.split('=').str[0].rename(None)
#if necessary
df1['Age'] = df1['Age'].astype(int)
print (df1)
Name Gender BloodType Location Age
0 John M A New York 18
1 Mary F AB Seatle 30