Search code examples
pythonpython-3.xpandasdataframedata-cleaning

Python pandas to remove header and footer records if detected before converting to dataframe


I am trying to clean my data before processing it. Below is a sample of my data in .txt file format

Input

20210320
Name|email|age|books|return-date
yuzu|[email protected]|23|2020-09-23
apple|[email protected]|27|2021-01-34
Total generated records: 2 rows 
Exported on 2021-03-20

Output - will be converted to dataframe by splitting with |

Name|email|age|books|return-date
yuzu|[email protected]|23|2020-09-23
apple|[email protected]|27|2021-01-34

How do i remove the header and footer records using python? As i will be processing multiple files and the header and footer might not always appear, the codes need to be able to detect and remove if it is in the text file.

Any help is appreciated!


Solution

  • Considering the fact that required rows contain '|' delimiter, you can use this code:

    f = open("filename.txt",'r',encoding = 'utf-8')
    for line in f:
        if '|' not in line:
            continue
        else:
            print(line)
    

    Output

    Name|email|age|books|return-date
    
    yuzu|[email protected]|23|2020-09-23
    
    apple|[email protected]|27|2021-01-34
    

    If you want to remove the new line after each line you can use print(line.strip())

    Output

    Name|email|age|books|return-date
    yuzu|[email protected]|23|2020-09-23
    apple|[email protected]|27|2021-01-34