python, pandas, datetime, valueerror

Specified time data doesn't match format


I have this code to convert the dates into the format I want:

df['issue_d'] = df['issue_d'].replace({'Jan-':'1-', 'Feb-':'2-', 'Mar-': '3-', 'Apr-': '4-', 'May-': '5-', 'Jun-': '6-', 'Jul-': '7-', 'Aug-':'8-', 'Sep-': '9-', 'Oct-': '10-', 'Nov-': '11-', 'Dec-': '12-'}, regex=True).apply(lambda x:dt.strptime('01-'+x,'%d-%m-%y').date())
df['issue_d'] = pd.to_datetime(df['issue_d'],  format = '%Y-%m-%d')

But when I run it, this error appears:

ValueError                                Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_25672/2570429248.py in <module>
----> 1 df['issue_d'] = df['issue_d'].replace({'Jan-':'1-', 'Feb-':'2-', 'Mar-': '3-', 'Apr-': '4-', 'May-': '5-', 'Jun-': '6-', 'Jul-': '7-', 'Aug-':'8-', 'Sep-': '9-', 'Oct-': '10-', 'Nov-': '11-', 'Dec-': '12-'}, regex=True).apply(lambda x:dt.strptime('01-'+x,'%d-%m-%y').date())
  2 df['issue_d'] = pd.to_datetime(df['issue_d'],  format = '%Y-%m-%d')

~\anaconda3\lib\site-packages\pandas\core\series.py in apply(self, func, convert_dtype, args, **kwargs)
4355         dtype: float64
4356         """
-> 4357         return SeriesApply(self, func, convert_dtype, args, kwargs).apply()
 4358 
 4359     def _reduce(

~\anaconda3\lib\site-packages\pandas\core\apply.py in apply(self)
 1041             return self.apply_str()
 1042 
 -> 1043         return self.apply_standard()
 1044 
 1045     def agg(self):

 ~\anaconda3\lib\site-packages\pandas\core\apply.py in apply_standard(self)
 1096                 # List[Union[Callable[..., Any], str]]]]]"; expected
 1097                 # "Callable[[Any], Any]"
 -> 1098                 mapped = lib.map_infer(
 1099                     values,
 1100                     f,  # type: ignore[arg-type]

 ~\anaconda3\lib\site-packages\pandas\_libs\lib.pyx in pandas._libs.lib.map_infer()

 ~\AppData\Local\Temp/ipykernel_25672/2570429248.py in <lambda>(x)
  ----> 1 df['issue_d'] = df['issue_d'].replace({'Jan-':'1-', 'Feb-':'2-', 'Mar-': '3-', 'Apr-': '4-', 'May-': '5-', 'Jun-': '6-', 'Jul-': '7-', 'Aug-':'8-', 'Sep-': '9-', 'Oct-': '10-', 'Nov-': '11-', 'Dec-': '12-'}, regex=True).apply(lambda x:dt.strptime('01-'+x,'%d-%m-%y').date())
  2 df['issue_d'] = pd.to_datetime(df['issue_d'],  format = '%Y-%m-%d')

  ~\anaconda3\lib\_strptime.py in _strptime_datetime(cls, data_string, format)
  566     """Return a class cls instance based on the input string and the
  567     format string."""
   --> 568     tt, fraction, gmtoff_fraction = _strptime(data_string, format)
  569     tzname, gmtoff = tt[-2:]
  570     args = tt[:6] + (fraction,)

  ~\anaconda3\lib\_strptime.py in _strptime(data_string, format)
  347     found = format_regex.match(data_string)
  348     if not found:
  --> 349         raise ValueError("time data %r does not match format %r" %
  350                          (data_string, format))
  351     if len(data_string) != found.end():

   ValueError: time data '01-15-Dec' does not match format '%d-%m-%y'

Update:

The info for my ['issue_d'] column looks like this:

issue_d              1048563 non-null  object

It contains a two-digit year followed by a month name, like:

15-Dec
16-Jan
and so on.

First, the month names (Jan, Feb, Mar, ...) should be changed to their numbers (01, 02, 03, ...), so that the column looks like:

15-12
16-01
and so on.

Then the day (1) should be added, so that my dates are arranged like:

01-01-15
01-02-15
01-03-15
and so on.

In the apply part, the first field is the day (1) that I tried to add, the second is the month, and the third is the year.

You can see in my first line that I tried to do this:

df['issue_d'] = df['issue_d'].replace({'Jan-':'1-', 'Feb-':'2-', 'Mar-': '3-', 'Apr-': '4-', 'May-': '5-', 'Jun-': '6-', 'Jul-': '7-', 'Aug-':'8-', 'Sep-': '9-', 'Oct-': '10-', 'Nov-': '11-', 'Dec-': '12-'}, regex=True).apply(lambda x:dt.strptime('01-'+x,'%d-%m-%y').date())

I replaced the abbreviated month names in the first line because pandas could not parse them as datetimes on its own. In the second line, I tried to put the dates in %Y-%m-%d format and convert the column to datetime so that I can do further work on my dataset. Unfortunately, that error appeared. I'd appreciate your help. Thank you.
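
A note on the traceback: the string that reaches strptime is '01-15-Dec', which suggests the month name was never replaced. The patterns in the question carry a trailing hyphen ('Dec-'), while the values have the month last ('15-Dec'), so the regex replace leaves them untouched. The format also expects the month before the year, whereas the data has the year first. A minimal sketch that reproduces this with the standard library only (the variable names are illustrative and not taken from the question):

```python
import re
from datetime import datetime

value = "15-Dec"

# The question's pattern "Dec-" expects a hyphen AFTER the month name,
# but in "15-Dec" the month is the last token, so nothing is replaced.
replaced = re.sub("Dec-", "12-", value)
print(replaced)  # 15-Dec

# Prefixing the day therefore yields "01-15-Dec", which cannot match "%d-%m-%y".
try:
    datetime.strptime("01-" + replaced, "%d-%m-%y")
    message = None
except ValueError as exc:
    message = str(exc)
print(message)  # time data '01-15-Dec' does not match format '%d-%m-%y'

# Replacing "Dec" without the hyphen and reading the fields as day-year-month parses fine.
fixed = re.sub("Dec", "12", value)
result = datetime.strptime("01-" + fixed, "%d-%y-%m").date()
print(result)  # 2015-12-01
```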


Solution

  • Okay, next try:

    If you're not using an English locale, then you could try:

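    import pandas as pd

    df = pd.DataFrame({"issue_d": ["15-Dec", "16-Jan", "21-Oct"]})
    
    # Map the English month abbreviations to month numbers, so parsing
    # does not depend on the locale.
    mapping = {"Jan": "1", "Feb": "2", "Mar": "3", "Apr": "4", "May": "5", "Jun": "6",
               "Jul": "7", "Aug": "8", "Sep": "9", "Oct": "10", "Nov": "11", "Dec": "12"}
    # Keep the "yy-" prefix, swap the trailing abbreviation for its number,
    # then parse the result as year-month.
    df["issue_d"] = pd.to_datetime(
        df["issue_d"].str[:-3] + df["issue_d"].str[-3:].replace(mapping),
        format="%y-%m"
    ).dt.strftime("%d-%m-%y")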
    

    Result:

        issue_d
    0  01-12-15
    1  01-01-16
    2  01-10-21
    

    You get the first of the month automatically, because the format contains no day and the day then defaults to 1.
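
The default day can be checked in isolation. This is only a small sketch, with hypothetical variable names, comparing the standard library parser and pd.to_datetime on a year-month string:

```python
from datetime import datetime

import pandas as pd

# Neither format contains a day directive, so both parsers fall back to day 1.
parsed_std = datetime.strptime("15-12", "%y-%m")
parsed_pd = pd.to_datetime("15-12", format="%y-%m")

print(parsed_std)  # 2015-12-01 00:00:00
print(parsed_pd)   # 2015-12-01 00:00:00
```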

    If you are using an English locale, then this gives the same result:

    df = pd.DataFrame({"issue_d": ["15-Dec", "16-Jan", "21-Oct"]})
    
    # %b matches the abbreviated month name of the current locale.
    df["issue_d"] = pd.to_datetime(df["issue_d"], format="%y-%b").dt.strftime("%d-%m-%y")
    

    Regarding the extension of your question: if the issue_d column needs to be datetime for further processing, then remove the .dt.strftime("%d-%m-%y") at the end (because it turns the datetimes into strings), do what you need to do, and convert the column to strings later. For example:

    ...
    df["issue_d"] = pd.to_datetime(
        df["issue_d"].str[:-3] + df["issue_d"].str[-3:].replace(mapping),
        format="%y-%m"
    )
    df["issue_y"] = df["issue_d"].dt.year
    df["issue_d"] = df["issue_d"].dt.strftime("%d-%m-%y") 
    

    results in

        issue_d  issue_y
    0  01-12-15     2015
    1  01-01-16     2016
    2  01-10-21     2021
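
One reason to keep a real datetime column around is ordering: strings in %d-%m-%y form sort by day, then month, then year, which is not chronological. The sketch below illustrates this under a few assumptions: the extra row "15-Feb" and the column names issue_dt and issue_str are made up for the example and do not come from the question.

```python
import pandas as pd

df = pd.DataFrame({"issue_d": ["15-Dec", "16-Jan", "21-Oct", "15-Feb"]})

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
mapping = {name: str(number) for number, name in enumerate(months, start=1)}

# Keep a datetime column for computations (hypothetical name: issue_dt) ...
df["issue_dt"] = pd.to_datetime(
    df["issue_d"].str[:-3] + df["issue_d"].str[-3:].replace(mapping),
    format="%y-%m",
)
# ... and derive the display string separately (hypothetical name: issue_str).
df["issue_str"] = df["issue_dt"].dt.strftime("%d-%m-%y")

by_datetime = df.sort_values("issue_dt")["issue_d"].tolist()
by_string = df.sort_values("issue_str")["issue_d"].tolist()

print(by_datetime)  # ['15-Feb', '15-Dec', '16-Jan', '21-Oct']
print(by_string)    # ['16-Jan', '15-Feb', '21-Oct', '15-Dec']
```

Sorting on the datetime column gives chronological order, while sorting on the formatted strings does not, so it seems safer to format only at the very end.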