Handle NaN values (zero value) in datetime.strptime in a converter used in pd.read_fwf


I have a fixed-width source file. It looks like this:

00;12345678;03122019
01;12345678;00000000

My code so far is as follows:

import pandas as pd
from datetime import datetime

col_lengths = {'Column1': range(0, 2), 
               'Column2': range(3, 11), 
               'Datecolumn': range(12, 20),
              }
col_lengths = {k: set(v) for k, v in col_lengths.items()}

pd.read_fwf(r'D:\test.txt', colspecs=[(min(x), max(x)+1) for x in col_lengths.values()], header=None, names=col_lengths.keys(), 
           converters={'Column1':lambda x : str(x),
                       'Column2':lambda x : str(x),
                       'Datecolumn': lambda s: datetime.strptime(s, '%d%m%Y'),
                      }
           )

The problem is that the second line of my file raises an error:

ValueError: time data '00000000' does not match format '%d%m%Y'

What is a good way to handle this? I have no experience with it so far. The only alternative I can think of is to read the column as a string and convert it later in the pandas DataFrame. If there is an elegant way to handle it directly in the converter, I would prefer that. Can it be handled on the fly here, and would that be a good idea?


Solution

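  • You could use pandas.to_datetime as the converter. With errors='coerce', values that do not match the format (such as '00000000') become NaT instead of raising: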

    #converters={...
                 'Datecolumn': lambda s: pd.to_datetime(s, format='%d%m%Y', errors='coerce'),
    #            }
    

    Output:

      Column1   Column2 Datecolumn
    0      00  12345678 2019-12-03
    1      01  12345678        NaT
    
    Alternatively, read the column as str and convert it afterwards on the resulting DataFrame (called df here):

    df['Datecolumn'] = pd.to_datetime(df['Datecolumn'], format='%d%m%Y', errors='coerce')
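
For anyone who wants to try both variants without a file on disk, here is a minimal sketch that feeds the two sample lines from the question through io.StringIO. The names SAMPLE, COLSPECS, NAMES, read_with_converter and read_then_convert are made up for this example and do not come from the original post. The column ranges are written out as half-open (start, end) tuples, which should be equivalent to the range-based colspecs in the question.

```python
import io

import pandas as pd

# In-memory copy of the two sample lines from the question (hypothetical name).
SAMPLE = "00;12345678;03122019\n01;12345678;00000000\n"

# Half-open (start, end) character ranges for the three fields.
COLSPECS = [(0, 2), (3, 11), (12, 20)]
NAMES = ["Column1", "Column2", "Datecolumn"]


def read_with_converter(text):
    """Parse the date column while reading; non-matching values become NaT."""
    return pd.read_fwf(
        io.StringIO(text),
        colspecs=COLSPECS,
        header=None,
        names=NAMES,
        converters={
            "Column1": str,
            "Column2": str,
            "Datecolumn": lambda s: pd.to_datetime(
                s, format="%d%m%Y", errors="coerce"
            ),
        },
    )


def read_then_convert(text):
    """Read every column as a string, then convert the date column in one call."""
    df = pd.read_fwf(
        io.StringIO(text),
        colspecs=COLSPECS,
        header=None,
        names=NAMES,
        dtype=str,
    )
    df["Datecolumn"] = pd.to_datetime(
        df["Datecolumn"], format="%d%m%Y", errors="coerce"
    )
    return df


df_converter = read_with_converter(SAMPLE)
df_post = read_then_convert(SAMPLE)
print(df_converter)
print(df_post)
```

The second variant converts the whole column in a single vectorized call, so it may be the better choice for large files. The converter variant calls pd.to_datetime once per row, and the resulting column dtype may depend on the pandas version.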