python pandas dataframe data-cleaning

How to create a DataFrame from custom values


I am reading in a text file in which each line contains multiple values. I parse each line according to my requirements using a function parse.

def parse(line):
    ......
    ......
    return line[0],line[2],line[5]

I want to create a DataFrame with each line as a row and the three returned values as columns.

df = pd.DataFrame()

with open('data.txt') as f:
    for line in f:
       df.append(line(parse(line)))

When I run the above code, I get all values in a single column. Is it possible to get them in a proper tabular format?


Solution

  • You shouldn't call .append on a DataFrame in a loop: each call returns a new DataFrame, which is very inefficient. Do something like:

    colnames = ['col1','col2','col3'] # or whatever you want
    with open('data.txt') as f:
        df = pd.DataFrame([parse(l) for l in f], columns=colnames)
    

    Note that the fundamental problem is that pd.DataFrame.append expects another DataFrame and appends the rows of that DataFrame. It interprets a flat list as a sequence of single-value rows. If you structure your list as a list of rows (a list of lists), it works as intended. But you shouldn't be using .append here anyway:

    In [6]: df.append([1,2,3])
    Out[6]:
       0
    0  1
    1  2
    2  3
    
    In [7]: df = pd.DataFrame()
    
    In [8]: df.append([[1, 2, 3]])
    Out[8]:
       0  1  2
    0  1  2  3
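
For reference, here is a minimal, self-contained sketch of the list-of-rows approach from the answer. The body of parse and the sample data are assumptions made purely for illustration (the question elides the real parsing logic), and io.StringIO stands in for open('data.txt'). Building the DataFrame in one call also sidesteps DataFrame.append, which was removed in pandas 2.0.

```python
import io

import pandas as pd


def parse(line):
    # Hypothetical stand-in for the question's parse(): split on whitespace
    # and keep fields 0, 2 and 5. The real parsing logic is not shown.
    fields = line.split()
    return fields[0], fields[2], fields[5]


# In-memory stand-in for open('data.txt'); the contents are made up.
sample = io.StringIO(
    "a1 b1 c1 d1 e1 f1\n"
    "a2 b2 c2 d2 e2 f2\n"
)

colnames = ["col1", "col2", "col3"]  # or whatever you want

# Each parsed tuple becomes one row; the DataFrame is built in a single call.
rows = [parse(line) for line in sample]
df = pd.DataFrame(rows, columns=colnames)
print(df)
```

Because every element of rows is a tuple of three values, pandas treats each one as a row with three columns, which is the tabular layout the question asks for.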