pandas, sklearn-pandas

How to put the first value in one column and the remaining values into another column?


    ROCO2_CLEF_00001.jpg,C3277934,C0002978
    ROCO2_CLEF_00002.jpg,C3265939,C0002942,C2357569

I want to build a pandas DataFrame from a CSV file like the one above. The first entry of each row (the filename) should go into a column named "filenames", and all remaining entries of that row should go into a second column named "class". How can I do this?


Solution

  • if your file does not have a fixed number of commas per row, you can read the lines yourself and split each one at its first comma:

    import pandas as pd
    
    csv_path = 'test_csv.csv'
    
    # read all rows; the context manager closes the file afterwards
    with open(csv_path) as f:
        raw_data = f.readlines()
    
    # clean rows: strip whitespace/newlines, drop stray single quotes, skip blank lines
    raw_data = [x.strip().replace("'", "") for x in raw_data if x.strip()]
    
    # split each row at the first comma only: [filename, remaining entries]
    raw_data = [[name, rest] for name, _, rest in (x.partition(",") for x in raw_data)]
    
    # build the pandas DataFrame
    column_names = ["filenames", "class"]
    temp_df = pd.DataFrame(data=raw_data, columns=column_names)
    
    print(temp_df)
    
                  filenames                       class
    0  ROCO2_CLEF_00001.jpg           C3277934,C0002978
    1  ROCO2_CLEF_00002.jpg  C3265939,C0002942,C2357569
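
  • as an alternative sketch (not part of the original answer), the same split at the first comma can probably be done with pandas' own string methods. The example below assumes every row contains at least one comma, and it uses io.StringIO with the two sample rows from the question as a stand-in for the real file:

```python
import io

import pandas as pd

# Sample rows copied from the question; io.StringIO stands in for the real file
# (an assumption made only so that the example is self-contained).
sample = io.StringIO(
    "ROCO2_CLEF_00001.jpg,C3277934,C0002978\n"
    "ROCO2_CLEF_00002.jpg,C3265939,C0002942,C2357569\n"
)

# Keep each non-empty line as a single string in a Series.
lines = pd.Series([line.strip() for line in sample if line.strip()])

# n=1 splits only at the first comma, so each row yields two pieces;
# expand=True spreads those pieces across DataFrame columns.
df = lines.str.split(",", n=1, expand=True)
df.columns = ["filenames", "class"]

print(df)
```

    With a real file, the list of lines would come from open(csv_path) instead of io.StringIO. If some rows might contain no comma at all, the explicit loop above is the safer choice, since it always produces two values per row.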