python python-3.x dataframe python-datetime

Getting error "ValueError: time data '' does not match format '%Y-%m-%d %H:%M:%S'"

Here is a sample of the df:

pId tPS                 tLL                 dZ
129 2019-12-02 15:04:09 2019-12-02 15:06:31 5f723
129 2019-12-02 15:04:15 2019-12-02 15:06:37 5f723
129 2019-12-02 15:05:15 2019-12-02 15:07:37 5f723
129 2019-12-02 15:05:18 2019-12-02 15:07:40 5f723
129 2019-12-02 15:05:24 2019-12-02 15:07:46 5f723

The pID is the ID of a person and I am trying to check the entry, exit and duration time for each ID.

Here is the code:

from datetime import datetime
stats=df.sort_values(by=['pId', 'tPS', 'tLL'])[['pId', 'tPS', 'tLL', 'dZ']]
pid = ''
enter_t = ''
exit_t = ''

enter_exit_times=[]

for ind, row in stats.iterrows():

    if pid =='':
        enter_t = row['tPS']
        print(enter_t)

    if row['pId']!= pid or ((datetime.strftime(row['tLL'], "%Y-%m-%d %H:%M:%S") 
                         - datetime.strftime(exit_t, "%Y-%m-%d %H:%M:%S")).total_seconds()>2*60*60):

    duration = (datetime.strptime(exit_t, "%Y-%m-%d %H:%M:%S") -
                datetime.strptime(enter_t, "%Y-%m-%d %H:%M:%S"))

    enter_exit_times.append([pid, enter_t, exit_t, duration.total_seconds()])

    pid = row['pId']

    enter_t = row['tPS']

enter_exit_times.append([pid, enter_t, exit_t])
enter_exit_times_df = pd.DataFrame(enter_exit_times)

So here

pid is the id
enter_t is the entering time
exit_t is the exit time
tPS is the in time
tLL is the out time.

I am then creating a list for which I am writing a loop below. Initially, I run it through a for loop where I iterate through the rows of the data frame. So there are two if loop, one with pid where an empty value means it needs to take the row[tPS] and if not then it has to run through the not loop. Then I am calculating the duration and then appending the values to the entry-exit times.

I am getting this error:

2019-12-02 15:04:09
---------------------------------------------------------------------------
ValueError                           Traceback (most recent callast)
<ipython-input-411-fd8f6f998cc8> in <module>
12     if row['pId']!= pid or ((datetime.strftime(row['tLL'], "%Y-%m-%d %H:%M:%S") 
13                              - datetime.strftime(exit_t, "%Y-%m-%d %H:%M:%S")).total_seconds()>2*60*60):
---> 14         duration = (datetime.strptime(exit_t, "%Y-%m-%d %H:%M:%S") -
15                     datetime.strptime(enter_t, "%Y-%m-%d %H:%M:%S"))
16         enter_exit_times.append([pid, enter_t, exit_t, duration.total_seconds()])

~/opt/anaconda3/lib/python3.7/_strptime.py in _strptime_datetime(cls, data_string, format)
575     """Return a class cls instance based on the input string and the
576     format string."""
--> 577     tt, fraction, gmtoff_fraction = _strptime(data_string, format)
578     tzname, gmtoff = tt[-2:]
579     args = tt[:6] + (fraction,)

~/opt/anaconda3/lib/python3.7/_strptime.py in _strptime(data_string, format)
357     if not found:
358         raise ValueError("time data %r does not match format %r" %
--> 359                          (data_string, format))
360     if len(data_string) != found.end():
361         raise ValueError("unconverted data remains: %s" %

**ValueError: time data '' does not match format '%Y-%m-%d %H:%M:%S'**

Solution

The cause of the error is that exit_t is not set anywhere in the loop. It is an empty string. You set it before the loop to exit_t = '' but then it's never set again. That's why strptime throws the error here:

>>> datetime.strptime(' ', "%Y-%m-%d %H:%M:%S")
Traceback (most recent call last):
...
  File "/usr/local/Cellar/python/3.7.6/Frameworks/Python.framework/Versions/3.7/lib/python3.7/_strptime.py", line 359, in _strptime
    (data_string, format))
ValueError: time data ' ' does not match format '%Y-%m-%d %H:%M:%S'

The solution is to simply set it properly to "tLL" (if I understand you correctly).

But I would like to go further and say that I think you are making the code much much more complicated that how it should be. My understanding is that you just want to compute the time duration between "tPS" (the in time) and "tLL" (the out time). Since you are already iterating over each row, you just need to assign the values appropriately

pid = row['pId']

enter_t_str = row['tPS']  # strings
exit_t_str = row['tLL']   # strings

then convert the datetime strings to datetime objects using strptime

enter_t_dt = datetime.strptime(enter_t_str, "%Y-%m-%d %H:%M:%S")
exit_t_dt = datetime.strptime(exit_t_str, "%Y-%m-%d %H:%M:%S")

then calculate the duration

duration = exit_t_dt - enter_t_dt

then finally append it to your list

enter_exit_times.append([pid, enter_t_str, exit_t_str, duration.total_seconds()])

There is no need to keep track of the "pId".

Here's the full code:

stats = df.sort_values(by=['pId', 'tPS', 'tLL'])[['pId', 'tPS', 'tLL', 'dZ']]

pid = ''
enter_t = ''
exit_t = ''
enter_exit_times = []

for ind, row in stats.iterrows():
    pid = row['pId']

    enter_t_str = row['tPS']
    exit_t_str = row['tLL']

    enter_t_dt = datetime.strptime(enter_t_str, "%Y-%m-%d %H:%M:%S")
    exit_t_dt = datetime.strptime(exit_t_str, "%Y-%m-%d %H:%M:%S")
    duration = exit_t_dt - enter_t_dt

    enter_exit_times.append([pid, enter_t_str, exit_t_str, duration.total_seconds()])

enter_exit_times_df = pd.DataFrame(enter_exit_times)
print(enter_exit_times_df)

And the output DataFrame:

     0                    1                    2      3
0  129  2019-12-02 15:04:09  2019-12-02 15:06:31  142.0
1  129  2019-12-02 15:04:15  2019-12-02 15:06:37  142.0
2  129  2019-12-02 15:05:15  2019-12-02 15:07:37  142.0
3  129  2019-12-02 15:05:18  2019-12-02 15:07:40  142.0
4  129  2019-12-02 15:05:24  2019-12-02 15:07:46  142.0

If you want to only get the enter/exit times for a particular time period of a day, you could create the datetime objects for the start and end times, and do regular comparison:

>>> dt_beg = datetime(2019,12,2,8,0,0)   #8AM
>>> dt_beg
datetime.datetime(2019, 12, 2, 8, 0)
>>> dt_end = datetime(2019,12,2,10,0,0)  #10AM
>>> dt_end
datetime.datetime(2019, 12, 2, 10, 0)
>>> dt = datetime(2019,12,2,9,34,0)      #9:34AM
>>> dt_beg < dt < dt_end
True
>>> dt = datetime(2019,12,2,14,34,0)     #2:34PM
>>> dt_beg < dt < dt_end
False

So you could add a filter for what to append to enter_exit_times:

if (enter_t_dt > start_dt and exit_t_dt < end_dt):
    enter_exit_times.append(...)