I want to read the files in this folder (uwyo) into a pandas DataFrame, skipping the rows that sit between the blocks of observation data. Each block of observations I want to read starts at the line with the keyword pressure.
My plan was to read each file with pandas and then search for the word 'pressure', but I got the following error:
import pandas as pd
import glob
import numpy as np
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
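dfs = []
for fname in glob.glob('*.txt'):
    # Split each line on runs of whitespace; this is the call that raises the error below
    df = pd.read_csv(fname, delimiter=r'\s+', header=None)
    dfs.append(df)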
ParserError: Error tokenizing data. C error: Expected 9 fields in line 5, saw 11
Is there an efficient way to do this? I want to skip the station information and all the other text between the observation blocks.
Try reading it as: pd.read_csv(fname, sep=r'\s+', on_bad_lines='skip', skiprows=4) (the on_bad_lines parameter needs pandas 1.3 or newer).
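This will still read a lot of junk from the file, though: on_bad_lines='skip' only drops lines with more fields than the header, so metadata lines with the same number of fields or fewer are kept. Also, because each row is split on whitespace, a missing value in the txt file shifts the remaining values of that row into the wrong columns.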
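Both caveats can be seen by running the same call on a small in-memory file. The SAMPLE text below is made up for illustration and only loosely mimics a sounding file; it is not taken from the uwyo folder.

```python
import io

import pandas as pd

# Hypothetical file: 4 preamble lines, a header row, data rows, and two
# metadata lines. The last data row is missing its middle (HGHT) value.
SAMPLE = """\
Station title line
-----
units line here
-----
PRES HGHT TEMP
1000.0 111 25.0
925.0 778 20.4
Station identifier: XXX Station number: 00000
Observation time: 200101/0000
850.0 14.2
"""

# skiprows=4 drops the preamble, so 'PRES HGHT TEMP' becomes the header.
# on_bad_lines='skip' drops the 6-field metadata line, but the 3-field
# 'Observation time: ...' line is kept as a junk row, and '14.2' lands
# under HGHT with TEMP left as NaN.
df = pd.read_csv(io.StringIO(SAMPLE), sep=r"\s+", on_bad_lines="skip", skiprows=4)
print(df)
```

Because the junk row is kept, every column ends up with object (string) dtype rather than float.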
I would recommend identifying the timestamp of each observation period and adding it as a column, and identifying and removing the metadata between the periods of observations.
This will require some preprocessing of the data :D
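One possible shape for that preprocessing is sketched below, under assumptions that should be checked against the actual files: each observation block begins with a header line whose first word is PRES, followed by a units line and a dashed line; data rows are fixed-width with 7 characters per column; and the block's title (station/time) is the nearest earlier line that is neither blank nor made of dashes. The names `read_blocks`, `read_folder`, `HEADER_KEY` and `COL_WIDTH` are hypothetical. Reading each block with pd.read_fwf instead of splitting on whitespace keeps a missing value as NaN in its own column.

```python
import glob
import io

import pandas as pd

# Hypothetical settings; check them against the actual files.
HEADER_KEY = "PRES"  # first word of the column-header line of each block
COL_WIDTH = 7        # assumed fixed width of every column


def _is_data_row(line: str) -> bool:
    """Treat a line as data if its first token parses as a number."""
    tokens = line.split()
    if not tokens:
        return False
    try:
        float(tokens[0])
    except ValueError:
        return False
    return True


def read_blocks(text: str) -> pd.DataFrame:
    """Collect every observation block in `text` into one DataFrame."""
    lines = text.splitlines()
    frames = []
    title = None
    i = 0
    while i < len(lines):
        stripped = lines[i].strip()
        if stripped.split()[:1] == [HEADER_KEY]:
            names = stripped.split()
            # Assumed layout: header, units line, dashed line, then data rows.
            j = i + 3
            rows = []
            while j < len(lines) and _is_data_row(lines[j]):
                rows.append(lines[j])
                j += 1
            if rows:
                # Fixed-width parsing keeps a blank field as NaN in its own column.
                block = pd.read_fwf(
                    io.StringIO("\n".join(rows)),
                    widths=[COL_WIDTH] * len(names),
                    header=None,
                    names=names,
                )
                block["title"] = title
                frames.append(block)
            i = j
            continue
        # Remember the latest non-blank, non-dashed line; with the assumed
        # layout this is the title line just above the header. Metadata
        # lines are never added to a frame.
        if stripped and set(stripped) != {"-"}:
            title = stripped
        i += 1
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()


def read_folder(pattern: str = "*.txt") -> pd.DataFrame:
    """Apply read_blocks to every file matching `pattern` and stack the results."""
    dfs = []
    for fname in sorted(glob.glob(pattern)):
        with open(fname) as fh:
            df = read_blocks(fh.read())
        df["file"] = fname
        dfs.append(df)
    return pd.concat(dfs, ignore_index=True) if dfs else pd.DataFrame()
```

The title column holds the raw title text; turning it into a proper timestamp column with pd.to_datetime depends on the exact title format, so that step is left out here.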
Edit:
Sorry, I forgot to add this as a second line after the read_csv call above (it filters out most of the junk rows): df = df[pd.to_numeric(df['PRES'], errors='coerce').notnull()]
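A quick check of that filter on a small hand-made frame; the values are hypothetical and mimic what the read above produces when a metadata row slips through. The final conversion back to numbers is an extra step, not part of the one-liner.

```python
import pandas as pd

# Hypothetical frame as produced by the read_csv call: numeric rows plus a
# metadata row that survived on_bad_lines='skip' (all values are strings).
df = pd.DataFrame({
    "PRES": ["1000.0", "925.0", "Observation", "850.0"],
    "HGHT": ["111", "778", "time:", "14.2"],
})

# errors='coerce' turns non-numeric PRES values into NaN, and notnull()
# keeps only the rows where PRES parsed as a number.
df = df[pd.to_numeric(df["PRES"], errors="coerce").notnull()]

# The surviving columns are still strings, so convert them afterwards.
df = df.apply(pd.to_numeric, errors="coerce")
print(df)
```

The filter only removes non-numeric rows; a row whose values were shifted by a missing field (the 14.2 under HGHT here) still passes.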