I am reading in a text file in which each line contains multiple values. I parse each line according to my requirements using a function, parse:
    def parse(line):
        ......
        ......
        return line[0], line[2], line[5]
I want to create a DataFrame with each line as a row and the three returned values as columns:
    df = pd.DataFrame()
    with open('data.txt') as f:
        for line in f:
            df.append(list(parse(line)))
When I run the above code, I get all the values in a single column. Is it possible to get them in a proper tabular format?
You shouldn't .append to a DataFrame in a loop; that is very inefficient anyway. Do something like:
    import pandas as pd

    colnames = ['col1', 'col2', 'col3']  # or whatever you want
    with open('data.txt') as f:
        df = pd.DataFrame([parse(line) for line in f], columns=colnames)
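To make the idea concrete, here is a minimal self-contained sketch. The body of parse in the question is elided, so the version below is only a hypothetical stand-in that splits on whitespace and keeps fields 0, 2 and 5. The sample text is made up, and io.StringIO stands in for open('data.txt') so the snippet runs without a file.

```python
import io

import pandas as pd


def parse(line):
    # Hypothetical stand-in for the question's parse(): split the line on
    # whitespace and keep fields 0, 2 and 5. The real body is not shown.
    fields = line.split()
    return fields[0], fields[2], fields[5]


# In-memory stand-in for open('data.txt'); the contents are invented.
sample = io.StringIO(
    "a1 b1 c1 d1 e1 f1\n"
    "a2 b2 c2 d2 e2 f2\n"
)

colnames = ['col1', 'col2', 'col3']
# Each tuple returned by parse() becomes one row; the tuple's three
# elements land in the three named columns.
df = pd.DataFrame([parse(line) for line in sample], columns=colnames)
print(df)
```

Building the full list of rows first and calling the constructor once avoids copying the frame on every iteration, which is what makes the loop-and-append pattern slow.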
Note that the fundamental problem is that pd.DataFrame.append expects another DataFrame and appends the rows of that other DataFrame. It interprets a flat list as a bunch of single-value rows. So if you structure your list as a list of "rows", it works as intended. But you shouldn't be using .append here anyway:
    In [6]: df.append([1,2,3])
    Out[6]:
       0
    0  1
    1  2
    2  3

    In [7]: df = pd.DataFrame()

    In [8]: df.append([[1, 2, 3]])
    Out[8]:
       0  1  2
    0  1  2  3
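One caveat: DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so the session above only runs on older versions. The same flat-list versus list-of-rows distinction can be seen with the constructor itself, which is a rough sketch of what .append did with a list argument. For the cases where frames really do have to be combined, pd.concat is the usual replacement. The variable names below are illustrative only.

```python
import pandas as pd

# A flat list is read as three rows with one value each.
flat = pd.DataFrame([1, 2, 3])

# A list containing one inner list is read as a single row of three values.
nested = pd.DataFrame([[1, 2, 3]])

# If frames must be combined, collect them and concatenate once,
# rather than growing a frame inside a loop.
pieces = [pd.DataFrame([[1, 2, 3]]), pd.DataFrame([[4, 5, 6]])]
combined = pd.concat(pieces, ignore_index=True)

print(flat.shape, nested.shape, combined.shape)
```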