I can't figure out how to create a multi-column pandas DataFrame from a list. Some lines contain the character ">" at the beginning, and I want those lines to become the column headers. The number of lines after each header is not the same.
My list:
>header
a
b
>header2
c
d
e
f
>header3
g
h
i
Dataframe I want to create:
>header1 >header2 >header3
a        c        g
b        d        h
         e        i
         f
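Iterate through the lines and treat every line that starts with '>' as a new header. The remaining challenge is building a DataFrame from a dictionary of lists of unequal length: pd.DataFrame(data) raises a ValueError when the lists differ in length, so build the frame with orient='index' (one row per header, shorter rows padded with None) and then transpose it.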
import pandas as pd

# The given list
lines = [">header", "a", "b", ">header2", "c", "d", "e", "f", ">header3", "g", "h", "i"]

# Iterate through the lines and create a sublist for each header
data = {}
column = ''
for line in lines:
    if line.startswith('>'):
        # A header line starts a new column
        column = line
        data[column] = []
        continue
    # Any other line belongs to the most recent header
    data[column].append(line)

# Create the DataFrame: one row per header, then transpose so headers become columns
df = pd.DataFrame.from_dict(data, orient='index').T
Output:
  >header >header2 >header3
0       a        c        g
1       b        d        h
2    None        e        i
3    None        f     None
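If the None placeholders are unwanted, one possible variation is to wrap each list in a pd.Series, so pandas aligns the columns on the index and pads the shorter ones with NaN, and then replace the padding with empty strings to match the layout asked for in the question. This is only a sketch: the helper name lines_to_frame and its marker parameter are made up for illustration, and it assumes that lines appearing before the first header should be skipped and that header names are unique (a repeated header would overwrite the earlier column).

```python
import pandas as pd

lines = [">header", "a", "b", ">header2", "c", "d", "e", "f", ">header3", "g", "h", "i"]


def lines_to_frame(lines, marker=">"):
    """Hypothetical helper: group lines under the most recent header line."""
    data = {}
    column = None
    for line in lines:
        if line.startswith(marker):
            # A header line starts a new column
            column = line
            data[column] = []
        elif column is not None:
            # Assumption: lines before the first header are ignored
            data[column].append(line)
    # Each Series carries its own index, so columns of different
    # lengths are aligned and the shorter ones are padded with NaN
    return pd.DataFrame(
        {name: pd.Series(values, dtype="object") for name, values in data.items()}
    )


# Replace the NaN padding with empty strings
df = lines_to_frame(lines).fillna("")
print(df)
```

Whether empty strings are the right filler depends on what happens next: they read well when printed, but keeping NaN (dropping the fillna call) is usually easier to work with for later filtering or counting.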