Search code examples
python-3.xpandasopencsv

How to read csv file into dataframe using pandas


I just got a csv file that I want to load the dataset as dataframe using pandas. However, I'm kinda confused this data format.

Here is the sample of data for two lines:

Name=John, Gender=M, BloodType=A, Location=New York, Age=18
Name=Mary, Gender=F, BloodType=AB, Location=Seatle, Age=30

How do I load this dataset into dataframe with columns (Name, Gender, BloodType,...etc)?

I will appreciate that someone gives me tips to start with!


Solution

  • Use read_csv with header=None first:

    import pandas as pd
    
    temp=u"""Name=John,Gender=M,BloodType=A,Location=New York,Age=18
    Name=Mary,Gender=F,BloodType=AB,Location=Seatle,Age=30"""
    #after testing replace 'pd.compat.StringIO(temp)' to 'filename.csv'
    df = pd.read_csv(pd.compat.StringIO(temp), header=None)
    print (df)
               0         1             2                  3       4
    0  Name=John  Gender=M   BloodType=A  Location=New York  Age=18
    1  Name=Mary  Gender=F  BloodType=AB    Location=Seatle  Age=30
    

    Then DataFrame.apply with Series.str.split and select second lists, last change columns names:

    df1 = df.apply(lambda x: x.str.split('=').str[1])
    df1.columns = df.iloc[0].str.split('=').str[0].rename(None)
    #if necessary
    df1['Age'] = df1['Age'].astype(int)
    print (df1)
       Name Gender BloodType  Location  Age
    0  John      M         A  New York   18
    1  Mary      F        AB    Seatle   30