I have a load of ASC files to extract data from. The issue I am having is that some of the columns have empty rows where there is no data, when I load these files into a df - it populates the first columns with all the data and just adds nans to the end... like this:
a| b| c
1 | 2 | nan
when I want it to be:
a | b | c
1 |nan|2
(I can't figure out how to make a table here to save my life) but where there is no data I want it to preserve the space. Part of my code says the separator is any space with more than two white spaces so I can preserve the headers that have one space within them, I think this is causing the issue but I am not sure how to fix it. I've tried using astropy.io to open the files and determine the delimiter but I get the error that the number of columns doesn't match the data columns.
here's an image of the general look of the files I have so you can see the lack of char delimiters and empty columns.
starting_words = ['Core no.', 'Core No.','Core','Core no.']
data = []
file_paths = []
for file in filepaths:
with open(file) as f:
for i, l in enumerate(f):
if l.startswith(tuple(starting_words)):
df = (pd.read_csv(file,sep = '\\s{2,}', engine = 'python', skiprows = i))
file_paths.append((file.stem + file.suffix))
df.insert(0,'Filepath', file)
data += [df]
break
this is the script that I've used to open the files and keep the header words together, I never got the astropy stuff to run - I either get the columns dont match error or it could not determine the file format.Also, this code has the skiprows part because the files all have random notes at the top that I don't want in my dataframe.
Your data looks well behaved, you could try to make use of the Pandas fwf
to read the files with fixed-width formatted lines. If the inference from the fwf
is not good enough for you, you can manually describe the extents of the fixed-width fields of each line using the parameter colspecs
.
Sample
Core no. Depth Depth Perm Porosity Saturations Oil
ft m mD % %
1 5516.0 1681.277 40.0 1.0
2 5527.0 1684.630 39.0 16.0
3 5566.0 1696.517 508 37.0 4.0
5571.0 1698.041 105 33.0 8.0
6 5693.0 1735.226 44.0 16.0
5702.0 1737.970 4320 35.0 31.0
9 5686.0 1733.093 2420 33.0 26.0
df = pd.read_fwf('sample.txt', skiprows=2, header=None)
df.columns=['Core no.', 'Depth ft', 'Depth m' , 'Perm mD', 'Porosity%', 'Saturations Oil%']
print(df)
Output from df
Core no. Depth ft Depth m Perm mD Porosity% Saturations Oil%
0 1.0 5516.0 1681.277 NaN 40.0 1.0
1 2.0 5527.0 1684.630 NaN 39.0 16.0
2 3.0 5566.0 1696.517 508.0 37.0 4.0
3 NaN 5571.0 1698.041 105.0 33.0 8.0
4 6.0 5693.0 1735.226 NaN 44.0 16.0
5 NaN 5702.0 1737.970 4320.0 35.0 31.0
6 9.0 5686.0 1733.093 2420.0 33.0 26.0