Search code examples
pythonpandaslistdataframeexport-to-csv

How to make a dataframe from a list of strings?


I have multiple files .csv, containing the results of training/validating process. One file per model. Each line in a file contains the following information: Epoch,loss_train,acc_train,loss_val,acc_val,time. Each filename contains information on model parameters.

I need to construct a dataframe that contains the last line of each file and the filename. Using readlines() in Python, I managed to get a list of strings with the desired information. Example of the part of the list:

'"0_0_2_200_0.4.csv",66,67,0.42319968342781067,0.8733666720438781,0.9848468899726868,0.7532656023222061,0.2503340244293213\n', '"0_0_2_200_0.5.csv",74,75,0.41233333945274353,0.8760283916760768,0.9206098318099976,0.7656023222060958,0.2535388469696045\n',

How can I put this list into dataframe?

To transform this list of lists into dataframe, I tried:

df = pd.DataFrame.from_records(results, columns = ["filename", "row_number", "Epoch", "loss_train", "acc_train","loss_val", "acc_val", "time"])

I've got the following error: ValueError: 8 columns passed, passed data had 124 columns. I tried:

df2 = pd.DataFrame(results,
                   columns=["filename", "row_number", "Epoch", "loss_train", "acc_train","loss_val", "acc_val", "time"])

The error is: ValueError: Shape of passed values is (110, 1), indices imply (110, 8).

df3 = pd.DataFrame(results)

results into a dataframe with only one column. I tried to write the list into a csv file:

file = open('final_results.csv', 'w+', newline ='') 
with file:     
    write = csv.writer(file, delimiter=',') 
    write.writerows(results)

But in the resulting file, all the characters are split into different columns:

"""",0,_,0,_,2,_,2,0,0,_,0,.,4,.,c,s,v,"""",",",6,6,",",6,7,",",0,.,4,2,3,1,9,9,6,8,3,4,2,7,8,1,0,6,7,",",0,.,8,7,3,3,6,6,6,7,2,0,4,3,8,7,8,1,",",0,.,9,8,4,8,4,6,8,8,9,9,7,2,6,8,6,8,",",0,.,7,5,3,2,6,5,6,0,2,3,2,2,2,0,6,1,",",0,.,2,5,0,3,3,4,0,2,4,4,2,9,3,2,1,3,"

Solution

  • With the list of strings you provided:

    results = [
        '"0_0_2_200_0.4.csv",66,67,0.42319968342781067,0.8733666720438781,0.9848468899726868,0.7532656023222061,0.2503340244293213\n',
        '"0_0_2_200_0.5.csv",74,75,0.41233333945274353,0.8760283916760768,0.9206098318099976,0.7656023222060958,0.2535388469696045\n',
    ]
    

    Here is one way to do it using Python str.strip and str.split:

    import pandas as pd
    
    df = pd.DataFrame(
        [x.strip("\n").split(",") for x in results],
        columns=[
            "filename",
            "row_number",
            "Epoch",
            "loss_train",
            "acc_train",
            "loss_val",
            "acc_val",
            "time",
        ],
    )
    

    Then:

                  filename row_number Epoch           loss_train  \
    0  "0_0_2_200_0.4.csv"         66    67  0.42319968342781067   
    1  "0_0_2_200_0.5.csv"         74    75  0.41233333945274353   
    
                acc_train            loss_val             acc_val  \
    0  0.8733666720438781  0.9848468899726868  0.7532656023222061   
    1  0.8760283916760768  0.9206098318099976  0.7656023222060958   
    
                     time  
    0  0.2503340244293213  
    1  0.2535388469696045