python pandas las

Reading LAS file using lasio library doesn't handle "-" in datetime column


I have a LAS file that I am trying to read in Python with the lasio library. One of its columns, TIME, is in the following format: 00:00:00.22-04-23

Sample of data copied from las file:

TIME               col1 col2
00:00:00.22-06-23  1010  20
00:00:05.22-06-23  1020  25
00:00:10.22-06-23  1015  32

My code to read the data:

import lasio
df = lasio.read(file_path).df().reset_index()

This returns the df in the following format:

TIME               col1 col2 UNKNOWN:1  UNKNOWN:2
00:00:00.22         -06 -23    1010       20
00:00:05.22         -06 -23    1020       25
00:00:10.22         -06 -23    1015       32

As you can see, the TIME column has been split into three columns at each "-", and the values of col1 and col2 have been shifted into UNKNOWN:1 and UNKNOWN:2 (columns that lasio presumably creates while reading). I need the TIME column returned in its original form, without shifting the values of col1 and col2, so that I can strip, split, and manipulate TIME with pandas once it is in a dataframe.

Any advice is appreciated.


Solution

  • You can try reading the file with pd.read_csv and a whitespace delimiter, so that "-" is not treated as a separator. For example:

    import pandas as pd

    df = pd.read_csv('your_file.txt', sep=r"\s+", engine="python")
    print(df)
    

    Prints:

                    TIME  col1  col2
    0  00:00:00.22-06-23  1010    20
    1  00:00:05.22-06-23  1020    25
    2  00:00:10.22-06-23  1015    32
    

    EDIT: With the updated file, where a header precedes the ~A data section:

    import re
    import pandas as pd
    from io import StringIO
    
    with open('your_file.txt', 'r') as f_in:
        # Drop everything up to and including the "~A" marker, keeping only the data table
        data = re.sub(r'\A.*~A', '', f_in.read(), count=1, flags=re.S)
        df = pd.read_csv(StringIO(data), sep=r"\s+", engine="python")
    
    print(df)
    

    Prints:

                    TIME     col1  col2   col3
    0  00:00:00.23-04-23  1977.47   160  160.5
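
Once TIME comes through as a single string, it still needs parsing. Below is a minimal standard-library sketch of both steps: stripping the header up to ~A with the same regex, then turning TIME into a datetime. The sample text, the helper names strip_header and parse_rows, and the format string are all assumptions for illustration. In particular, the sketch assumes the value means HH:MM:SS followed by DD-MM-YY, and that the column names sit on the ~A line itself, so check both against your file.

```python
import re
from datetime import datetime

# Hypothetical miniature LAS file; a real file carries more header sections.
SAMPLE_LAS = """~Version
VERS. 2.0 : CWLS LOG ASCII STANDARD
~Curve
TIME. : time stamp
col1. : first curve
col2. : second curve
~A TIME col1 col2
00:00:00.22-06-23  1010  20
00:00:05.22-06-23  1020  25
"""

# Assumed layout: HH:MM:SS, a dot, then DD-MM-YY.
TIME_FORMAT = "%H:%M:%S.%d-%m-%y"


def strip_header(text):
    """Drop everything up to and including the last '~A' marker."""
    return re.sub(r"\A.*~A", "", text, count=1, flags=re.S)


def parse_rows(text):
    """Split the data section on whitespace only and parse the TIME field."""
    lines = strip_header(text).strip().splitlines()
    names = lines[0].split()
    rows = []
    for line in lines[1:]:
        row = dict(zip(names, line.split()))
        row["TIME"] = datetime.strptime(row["TIME"], TIME_FORMAT)
        rows.append(row)
    return rows


rows = parse_rows(SAMPLE_LAS)
print(rows[0]["TIME"])  # 2023-06-22 00:00:00
```

With the dataframe from the answer above, the same format string should work column-wise: pd.to_datetime(df["TIME"], format="%H:%M:%S.%d-%m-%y").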