python, pandas, dataframe, text, separator

Load a .txt file into a pandas DataFrame with separator lines between text blocks


I have a text file that contains text like this:

--------------------------------
I hate apples and love oranges.
He likes to ride bike.
--------------------------------

--------------------------------
He is a man of honour. 
She loves to travel.
--------------------------------

I want to load this text file into a pandas DataFrame so that each row contains only the content between a pair of separator lines. For example:

Row 1 should be like: I hate apples and love oranges. He likes to ride bike.

Row 2 should be like: He is a man of honour. She loves to travel.


Solution

  • It looks like you need to pre-process the text first: collect the non-empty lines between separator lines and join each group into one string.

    Try:

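    import pandas as pd

    filename = "data.txt"  # assumed path; replace with your own file

    res = []   # one joined string per block
    temp = []  # lines of the block currently being read
    with open(filename) as infile:
        for line in infile:
            val = line.strip()
            if not val:
                continue  # skip blank lines
            if val.startswith("-"):
                # Separator line: store the finished block, if any.
                if temp:
                    res.append(" ".join(temp))
                    temp = []
            else:
                temp.append(val)
    # Keep a trailing block if the file does not end with a separator.
    if temp:
        res.append(" ".join(temp))

    df = pd.DataFrame(res, columns=["Test"])
    print(df)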
    

    Output:

                                                    Test
    0  I hate apples and love oranges. He likes to ri...
    1        He is a man of honour. She loves to travel.
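
As an alternative to the line-by-line loop, the same pre-processing could be sketched with a regular-expression split over the whole text. This is only a sketch under assumptions not stated in the question: the helper name `blocks_to_frame` is hypothetical, a separator is taken to be any line consisting solely of three or more dashes, and the file is small enough to read into memory at once.

```python
import re

import pandas as pd


def blocks_to_frame(text, column="Test"):
    """Build a one-column DataFrame with one row per separator-delimited block."""
    # Assumption: a separator is a line made up only of 3+ dashes
    # (optionally followed by spaces or tabs).
    chunks = re.split(r"^-{3,}[ \t]*$", text, flags=re.MULTILINE)
    rows = []
    for chunk in chunks:
        # Join the non-empty lines of the block with single spaces.
        joined = " ".join(
            line.strip() for line in chunk.splitlines() if line.strip()
        )
        if joined:  # skip the empty pieces between adjacent separators
            rows.append(joined)
    return pd.DataFrame(rows, columns=[column])


# Sample text copied from the question.
sample = """--------------------------------
I hate apples and love oranges.
He likes to ride bike.
--------------------------------

--------------------------------
He is a man of honour.
She loves to travel.
--------------------------------
"""

df = blocks_to_frame(sample)
print(df)
```

For a real file, the text could be read with `open(filename).read()` and passed to the helper. Unlike the `startswith("-")` check, this pattern does not treat a content line that merely begins with a dash as a separator.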