Tags: python, pandas, dataframe, csv, text

How to read a text file and make it a dataframe using pandas


I want to read the files in this folder (uwyo) into a data frame, skipping the rows that sit between the blocks of observation data. Each block of observations starts at the keyword 'pressure', and I want to read every one of them.

My plan was to read the files with pandas and then search for the word 'pressure', but I got the following error.

import pandas as pd
import glob
import numpy as np
import matplotlib.dates as mdates
import matplotlib.pyplot as plt

dfs = []
for fname in glob.glob('*.txt'):
    df = pd.read_csv(fname,delimiter='\s+',header=None)

ParserError: Error tokenizing data. C error: Expected 9 fields in line 5, saw 11

Is there an efficient way to do this? I want to skip the station information and all those texts present in between.


Solution

  • Try it as: df = pd.read_csv(fname, sep=r'\s+', on_bad_lines='skip', skiprows=4) (on_bad_lines needs pandas 1.3 or later)

    This will still read a lot of junk rows, though. Also, because the columns are split on whitespace, a missing value in the txt file shifts the values after it into the wrong columns.

    I would recommend identifying the timestamp of each observation period and adding it as a column, and identifying and removing the metadata that sits between the periods.

    This will require some pre-processing of the data :D

    Edit:

    Sorry, I forgot to add this as a second line after the read above (it filters out most of the junk): df = df[pd.to_numeric(df['PRES'], errors='coerce').notnull()]
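The pre-processing suggested in this answer could look roughly like the sketch below. It is not the answerer's code, and it rests on assumptions about the file layout that the question does not confirm: every block begins with a header line whose first name is PRES, the column names are right-aligned over fixed-width columns, data rows begin with a number, and a block ends at the first later line that does not. The function name read_observation_blocks and the extra "block" column are hypothetical.

```python
import re
import pandas as pd


def _is_number(token):
    """True if `token` parses as a float (used to spot data rows)."""
    try:
        float(token)
        return True
    except ValueError:
        return False


def read_observation_blocks(text, keyword="PRES"):
    """Collect the data rows under every header line that starts with `keyword`.

    Hypothetical helper: the layout it expects is an assumption.
    """
    lines = text.splitlines()
    frames = []
    i = 0
    while i < len(lines):
        if not lines[i].strip().startswith(keyword):
            i += 1
            continue
        # Assumed: names are right-aligned, so each name's end is a column edge.
        tokens = list(re.finditer(r"\S+", lines[i]))
        names = [m.group() for m in tokens]
        ends = [m.end() for m in tokens]
        starts = [0] + ends[:-1]
        i += 1
        rows = []
        while i < len(lines):
            parts = lines[i].split()
            if parts and _is_number(parts[0]):
                # Slice by position so a blank field stays in its own column.
                rows.append([lines[i][s:e].strip() for s, e in zip(starts, ends)])
            elif rows:
                break  # first non-numeric line after the data ends the block
            i += 1
        if rows:
            block = pd.DataFrame(rows, columns=names)
            # Blank or non-numeric cells become NaN instead of shifting columns.
            block = block.apply(pd.to_numeric, errors="coerce")
            block["block"] = len(frames)  # which observation period the row is from
            frames.append(block)
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
```

Inside the glob loop from the question, this would be called on each file's contents, for example read_observation_blocks(open(fname).read()), and the results appended to dfs. Slicing by character position instead of splitting on whitespace is what keeps a missing value from pulling later values into the wrong column.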