I have a text file that looks like this:
UWI/API WELLNAME WELLNO LABEL OPER PRODFM SURFLAT SURFLON BOTLAT BOTLON PERFTOP PERFBASE PERFSOURCE WELL WELL WELL WELL WELL WELL WELL SYM
ELEV_KB TD COMP_DATE SPUD_DATE INJLIQ INJGAS INJWTR
49025000510000 CH039068 3-A 03WC2NE01 STANOLIND O&G CO 603WLCK2 43.384316 -106.275540 0.000000 0.000000 1525.0 1815.0 4847.00 1815.00 03/20/1925 01/01/1925 0.00 857224.00 6538652.00CO2-INJ
49025000540000 CH036906A 8-A 08WC2NE01 STANOLIND O&G CO 43.383111 -106.278775 0.000000 0.000000 0.0 0.0 1842.00 10/07/1926 01/01/1926 0.00 757266.00 2287490.00CO2-INJ
49025000550000 CH03906B 12-A 12WC2NE01 STANOLIND O&G CO 603WLCK 43.383171 -106.272200 0.000000 0.000000 1448.0 1703.0 4896.00 1916.00 07/23/1924 01/01/1924 0.00 1157524.00 9007039.00CO2-INJ
49025000550000 1448.0 1451.0
49025000550000 1700.0 1703.0
49025000580000 CH03906B 19-A 19WC2NE01 STANOLIND O&G CO 43.380695 -106.280337 0.000000 0.000000 0.0 0.0 1833.00 08/27/1924 01/01/1924 0.00 3602831.00 8002146.00CO2-INJ
49025000580000 1565.0 1833.0
49025000590000 CH03906B 23-AX 23WC2NE01 STANOLIND O&G CO 603WLCK2 43.380810 -106.273800 0.000000 0.000000 1842.0 1916.0 1927.00 10/19/1923 01/01/1923 0.00 2673371.00 9392905.00CO2-INJ
49025000590000 0.0 0.0
When I run this code, the output puts all of row one into one column. I want the first column to be "UWI/API", the second column to be "WELLNAME", ect... As you can see, the data is not separated by a comma, just by spaces. The data is sort of messed up? It is not aligned. The headers should be "UWI/API" to "INJWTR."
Here is my code:
import pandas as pd
df = pd.read_csv(r'C:/Users/mmcintyre/Documents/Teresa CO2 Injectors_Spaces.txt',sep=r'\s{2,}')
df.to_csv(r'C:/Users/mmcintyre/Documents/Teresa.csv',index=None)
And the output is this:https://sru365edu-my.sharepoint.com/:x:/g/personal/mam1064_sru_edu/Ebq0tpHveAJBu710UcDt82IB23gmRs6nNrTE-MtDUgkU6Q?e=byhjuL
I highlighted the rows that aren't in the right position. I'm not sure if this even fixable.
Thanks! Morgan
This data seems majorly difficult to parse. I tried a number of settings with a bunch of csv parsers, but I don't think there is a clean way to parse data this messy. I went for the dirty approach. If you only need it for this file, the following should work, if you need to do this regularly, I make no promises this will always work cleanly.
#! /usr/bin/env python
import csv
data = []
with open('Teresa CO2 Injectors_Spaces.TXT') as file_handler:
header = file_handler.readline().strip().split()
header2 = ['' for i in range(13)] + file_handler.readline().strip().split() + ['']
data.append(header)
data.append(header2)
for row in file_handler.readlines():
tempRow = []
UWIAPI = row[0:20]
WELLNAME = row[20:40]
WELLNO = row[40:60]
LABEL = row[60:80]
OPER = row[80:100]
PRODFM = row[100:120]
SURFLAT = row[120:132]
SURFLON = row[132:144]
BOTLAT = row[144:156]
BOTLON = row[156:168]
PERFTOP = row[168:176]
PERFBASE = row[176:184]
PERFSOURCE = row[184:200]
ELEV_KB = row[200:220]
TD = row[218:228]
COMP_DATE = row[228:240]
SPUD_DATE = row[240:252]
INJLIQ = row[252:264]
INJGAS = row[264:276]
INJWTR = row[276::]
SYM = ''
tempRow = [
UWIAPI.strip(),
WELLNAME.strip(),
WELLNO.strip(),
LABEL.strip(),
OPER.strip(),
PRODFM.strip(),
SURFLAT.strip(),
SURFLON.strip(),
BOTLAT.strip(),
BOTLON.strip(),
PERFTOP.strip(),
PERFBASE.strip(),
PERFSOURCE.strip(),
ELEV_KB.strip(),
TD.strip(),
COMP_DATE.strip(),
SPUD_DATE.strip(),
INJLIQ.strip(),
INJGAS.strip(),
INJWTR.strip(),
SYM,
]
data.append(tempRow)
with open('example.csv', 'w+') as fh:
writer = csv.writer(fh, delimiter=',')
for row in data:
writer.writerow(row)