Create a multi-column pandas DataFrame from a list


I can't figure out how to create a multi-column pandas DataFrame from a list. Some lines begin with the character ">", and I want those lines to be the column headers. The number of lines after each header varies.

My list:

>header
a
b
>header2
c
d
e
f
>header3
g
h
i

Dataframe I want to create:

>header    >header2   >header3
a           c          g
b           d          h
            e          i
            f

Solution

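  • Iterate through the lines and treat each line that starts with '>' as a header, collecting the lines that follow it into that header's list. The remaining challenge is building a DataFrame from a dictionary of lists of unequal length: from_dict with orient='index' makes each list a row and pads the shorter ones with None, and .T then transposes those rows into columns.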

    import pandas as pd

    # The given list
    lines = [">header", "a", "b", ">header2", "c", "d", "e", "f", ">header3", "g", "h", "i"]

    # Iterate through the lines and create a sublist for each header
    data = {}
    column = ''
    for line in lines:
        if line.startswith('>'):
            # A header line starts a new column
            column = line
            data[column] = []
            continue
        # Any other line belongs to the most recent header
        data[column].append(line)

    # Each list becomes a row padded with None; .T turns the rows into columns
    df = pd.DataFrame.from_dict(data, orient='index').T
    

    Output:

      >header >header2 >header3
    0       a        c        g
    1       b        d        h
    2    None        e        i
    3    None        f     None
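
As a variation on the answer above, here is a minimal sketch that wraps each column's list in a pd.Series so that pandas pads the shorter columns itself, with no transpose needed. The function name lines_to_frame is hypothetical, and skipping any lines that appear before the first header is an assumption, not something the question specifies. The final fillna("") call is optional and only there to get the blank cells shown in the desired table.

```python
import pandas as pd


def lines_to_frame(lines):
    """Group each line under the most recent '>' header, one column per header."""
    data = {}
    column = None
    for line in lines:
        if line.startswith(">"):
            # A header line starts a new column
            column = line
            data[column] = []
        elif column is not None:
            # Assumption: lines seen before the first header are skipped
            data[column].append(line)
    # Series are aligned on their index, so shorter columns are padded with NaN
    return pd.DataFrame(
        {name: pd.Series(values, dtype="object") for name, values in data.items()}
    )


lines = [">header", "a", "b", ">header2", "c", "d", "e", "f", ">header3", "g", "h", "i"]
df = lines_to_frame(lines)

# Optional: replace the missing values with empty strings for display
blank = df.fillna("")
print(blank)
```

Here the missing cells are NaN rather than None before the fillna step, which is usually easier to work with in later pandas operations such as dropna or count.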