Search code examples
pythonpandascsvexport-to-csv

How to separate columns of .txt file? .txt to .csv pandas script making all of the columns into one?


I have a text file that looks like this:

      UWI/API             WELLNAME             WELLNO              LABEL                OPER               PRODFM         SURFLAT     SURFLON      BOTLAT      BOTLON   PERFTOP PERFBASE     PERFSOURCE         WELL        WELL        WELL        WELL        WELL        WELL        WELL      SYM   
                                                                                                                                                                                                              ELEV_KB        TD      COMP_DATE   SPUD_DATE     INJLIQ      INJGAS      INJWTR           
49025000510000      CH039068            3-A                 03WC2NE01           STANOLIND O&G CO    603WLCK2               43.384316 -106.275540    0.000000    0.000000  1525.0  1815.0                         4847.00     1815.00  03/20/1925  01/01/1925        0.00   857224.00  6538652.00CO2-INJ 
49025000540000      CH036906A           8-A                 08WC2NE01           STANOLIND O&G CO                           43.383111 -106.278775    0.000000    0.000000     0.0     0.0                                     1842.00  10/07/1926  01/01/1926        0.00   757266.00  2287490.00CO2-INJ 
49025000550000      CH03906B            12-A                12WC2NE01           STANOLIND O&G CO    603WLCK                43.383171 -106.272200    0.000000    0.000000  1448.0  1703.0                         4896.00     1916.00  07/23/1924  01/01/1924        0.00  1157524.00  9007039.00CO2-INJ 
49025000550000                                                                                                                                                            1448.0  1451.0                                                                                                    
49025000550000                                                                                                                                                            1700.0  1703.0                                                                                                    
49025000580000      CH03906B            19-A                19WC2NE01           STANOLIND O&G CO                           43.380695 -106.280337    0.000000    0.000000     0.0     0.0                                     1833.00  08/27/1924  01/01/1924        0.00  3602831.00  8002146.00CO2-INJ 
49025000580000                                                                                                                                                            1565.0  1833.0                                                                                                    
49025000590000      CH03906B            23-AX               23WC2NE01           STANOLIND O&G CO    603WLCK2               43.380810 -106.273800    0.000000    0.000000  1842.0  1916.0                                     1927.00  10/19/1923  01/01/1923        0.00  2673371.00  9392905.00CO2-INJ 
49025000590000                                                                                                                                                               0.0     0.0                                                                

When I run this code, the output puts all of row one into one column. I want the first column to be "UWI/API", the second column to be "WELLNAME", ect... As you can see, the data is not separated by a comma, just by spaces. The data is sort of messed up? It is not aligned. The headers should be "UWI/API" to "INJWTR."

Here is my code:

import pandas as pd

df = pd.read_csv(r'C:/Users/mmcintyre/Documents/Teresa CO2 Injectors_Spaces.txt',sep=r'\s{2,}')

df.to_csv(r'C:/Users/mmcintyre/Documents/Teresa.csv',index=None)

And the output is this:https://sru365edu-my.sharepoint.com/:x:/g/personal/mam1064_sru_edu/Ebq0tpHveAJBu710UcDt82IB23gmRs6nNrTE-MtDUgkU6Q?e=byhjuL

I highlighted the rows that aren't in the right position. I'm not sure if this even fixable.

Thanks! Morgan


Solution

  • This data seems majorly difficult to parse. I tried a number of settings with a bunch of csv parsers, but I don't think there is a clean way to parse data this messy. I went for the dirty approach. If you only need it for this file, the following should work, if you need to do this regularly, I make no promises this will always work cleanly.

    #! /usr/bin/env python
    
    import csv
    
    data = []
    
    with open('Teresa CO2 Injectors_Spaces.TXT') as file_handler:
    
        header = file_handler.readline().strip().split()
        header2 = ['' for i in range(13)] + file_handler.readline().strip().split() + ['']
    
        data.append(header)
        data.append(header2)
    
        for row in file_handler.readlines():
    
            tempRow = []
    
            UWIAPI = row[0:20]
            WELLNAME = row[20:40]
            WELLNO = row[40:60]
            LABEL = row[60:80]
            OPER = row[80:100]
            PRODFM = row[100:120]
            SURFLAT = row[120:132]
            SURFLON = row[132:144]
            BOTLAT = row[144:156]
            BOTLON = row[156:168]
            PERFTOP = row[168:176]
            PERFBASE = row[176:184]
            PERFSOURCE = row[184:200]
            ELEV_KB = row[200:220]
            TD = row[218:228]
            COMP_DATE = row[228:240]
            SPUD_DATE = row[240:252]
            INJLIQ = row[252:264]
            INJGAS = row[264:276]
            INJWTR = row[276::]
            SYM = ''
    
            tempRow = [
                UWIAPI.strip(),
                WELLNAME.strip(),
                WELLNO.strip(),
                LABEL.strip(),
                OPER.strip(),
                PRODFM.strip(),
                SURFLAT.strip(),
                SURFLON.strip(),
                BOTLAT.strip(),
                BOTLON.strip(),
                PERFTOP.strip(),
                PERFBASE.strip(),
                PERFSOURCE.strip(),
                ELEV_KB.strip(),
                TD.strip(),
                COMP_DATE.strip(),
                SPUD_DATE.strip(),
                INJLIQ.strip(),
                INJGAS.strip(),
                INJWTR.strip(),
                SYM,
            ]
    
            data.append(tempRow)
    
    with open('example.csv', 'w+') as fh:
    
        writer = csv.writer(fh, delimiter=',')
    
        for row in data:
    
            writer.writerow(row)