Search code examples
pandasdatetimekeyerror

Pandas KeyError Date when date is a string


I have a tab separated .txt file with several hundred thousand rows. One of the columns is 'date' and the date format is '14JAN2020' which is non-standard. I'm trying to convert this to a standard datetime. My code so far:

import pandas as pd
import datetime as dt
import numpy as np
import matplotlib.pyplot as plt

with open('C:/Path/to/file/myfile.txt') as completionFile:
     completionFile.read()

df = pd.read_csv('C:/Path/to/file/myfile.txt', sep='\t', header=0)
df.head()

job-id | employee_id | date | job_type | job_time

1234 | ABCD | 14JAN2020 | foo | bar

df["date"] = df['date'].str.replace(r'^((?:\D*\d){2})', r'\1-')
df["date"] = df['date'].str.replace(r'^((?:[^a-zA-Z0-9]*[a-zA-Z0-9]){5})(?=.+)', r'\1-')
df["date"] =pd.to_datetime(df['date'])

When I do this, I get KeyError: 'date'. I have 'date' as a column header and not as an index so struggling with why I'm getting this error here


Solution

  • Coerce the date to datetime using pd.to_datetime and state the datetime format.

       print(df)
        job-id employee_id       date        job_type job_time
    0    1234        ABCD        14JAN2020      foo      bar
    
    
    
    
    df['date']=pd.to_datetime(df['date'])
    #df['date']=pd.to_datetime(df['date'], format='%Y%m%d')
    print(df)
         job-id employee_id       date           job_type job_time
    0    1234        ABCD         2020-01-14      foo      bar