Here is a sample of the df:
pId tPS tLL dZ
129 2019-12-02 15:04:09 2019-12-02 15:06:31 5f723
129 2019-12-02 15:04:15 2019-12-02 15:06:37 5f723
129 2019-12-02 15:05:15 2019-12-02 15:07:37 5f723
129 2019-12-02 15:05:18 2019-12-02 15:07:40 5f723
129 2019-12-02 15:05:24 2019-12-02 15:07:46 5f723
The pID is the ID of a person and I am trying to check the entry, exit and duration time for each ID.
Here is the code:
from datetime import datetime
stats=df.sort_values(by=['pId', 'tPS', 'tLL'])[['pId', 'tPS', 'tLL', 'dZ']]
pid = ''
enter_t = ''
exit_t = ''
enter_exit_times=[]
for ind, row in stats.iterrows():
if pid =='':
enter_t = row['tPS']
print(enter_t)
if row['pId']!= pid or ((datetime.strftime(row['tLL'], "%Y-%m-%d %H:%M:%S")
- datetime.strftime(exit_t, "%Y-%m-%d %H:%M:%S")).total_seconds()>2*60*60):
duration = (datetime.strptime(exit_t, "%Y-%m-%d %H:%M:%S") -
datetime.strptime(enter_t, "%Y-%m-%d %H:%M:%S"))
enter_exit_times.append([pid, enter_t, exit_t, duration.total_seconds()])
pid = row['pId']
enter_t = row['tPS']
enter_exit_times.append([pid, enter_t, exit_t])
enter_exit_times_df = pd.DataFrame(enter_exit_times)
So here
pid
is the identer_t
is the entering timeexit_t
is the exit timetPS
is the in timetLL
is the out time.I am then creating a list for which I am writing a loop below. Initially, I run it through a for
loop where I iterate through the rows of the data frame. So there are two if
loop, one with pid
where an empty value means it needs to take the row[tPS]
and if not then it has to run through the not loop. Then I am calculating the duration and then appending the values to the entry-exit times.
I am getting this error:
2019-12-02 15:04:09
---------------------------------------------------------------------------
ValueError Traceback (most recent callast)
<ipython-input-411-fd8f6f998cc8> in <module>
12 if row['pId']!= pid or ((datetime.strftime(row['tLL'], "%Y-%m-%d %H:%M:%S")
13 - datetime.strftime(exit_t, "%Y-%m-%d %H:%M:%S")).total_seconds()>2*60*60):
---> 14 duration = (datetime.strptime(exit_t, "%Y-%m-%d %H:%M:%S") -
15 datetime.strptime(enter_t, "%Y-%m-%d %H:%M:%S"))
16 enter_exit_times.append([pid, enter_t, exit_t, duration.total_seconds()])
~/opt/anaconda3/lib/python3.7/_strptime.py in _strptime_datetime(cls, data_string, format)
575 """Return a class cls instance based on the input string and the
576 format string."""
--> 577 tt, fraction, gmtoff_fraction = _strptime(data_string, format)
578 tzname, gmtoff = tt[-2:]
579 args = tt[:6] + (fraction,)
~/opt/anaconda3/lib/python3.7/_strptime.py in _strptime(data_string, format)
357 if not found:
358 raise ValueError("time data %r does not match format %r" %
--> 359 (data_string, format))
360 if len(data_string) != found.end():
361 raise ValueError("unconverted data remains: %s" %
**ValueError: time data '' does not match format '%Y-%m-%d %H:%M:%S'**
The cause of the error is that exit_t
is not set anywhere in the loop. It is an empty string. You set it before the loop to exit_t = ''
but then it's never set again. That's why strptime
throws the error here:
>>> datetime.strptime(' ', "%Y-%m-%d %H:%M:%S")
Traceback (most recent call last):
...
File "/usr/local/Cellar/python/3.7.6/Frameworks/Python.framework/Versions/3.7/lib/python3.7/_strptime.py", line 359, in _strptime
(data_string, format))
ValueError: time data ' ' does not match format '%Y-%m-%d %H:%M:%S'
The solution is to simply set it properly to "tLL"
(if I understand you correctly).
But I would like to go further and say that I think you are making the code much much more complicated that how it should be. My understanding is that you just want to compute the time duration between "tPS"
(the in time) and "tLL"
(the out time). Since you are already iterating over each row, you just need to assign the values appropriately
pid = row['pId']
enter_t_str = row['tPS'] # strings
exit_t_str = row['tLL'] # strings
then convert the datetime strings to datetime objects using strptime
enter_t_dt = datetime.strptime(enter_t_str, "%Y-%m-%d %H:%M:%S")
exit_t_dt = datetime.strptime(exit_t_str, "%Y-%m-%d %H:%M:%S")
then calculate the duration
duration = exit_t_dt - enter_t_dt
then finally append it to your list
enter_exit_times.append([pid, enter_t_str, exit_t_str, duration.total_seconds()])
There is no need to keep track of the "pId"
.
Here's the full code:
stats = df.sort_values(by=['pId', 'tPS', 'tLL'])[['pId', 'tPS', 'tLL', 'dZ']]
pid = ''
enter_t = ''
exit_t = ''
enter_exit_times = []
for ind, row in stats.iterrows():
pid = row['pId']
enter_t_str = row['tPS']
exit_t_str = row['tLL']
enter_t_dt = datetime.strptime(enter_t_str, "%Y-%m-%d %H:%M:%S")
exit_t_dt = datetime.strptime(exit_t_str, "%Y-%m-%d %H:%M:%S")
duration = exit_t_dt - enter_t_dt
enter_exit_times.append([pid, enter_t_str, exit_t_str, duration.total_seconds()])
enter_exit_times_df = pd.DataFrame(enter_exit_times)
print(enter_exit_times_df)
And the output DataFrame:
0 1 2 3
0 129 2019-12-02 15:04:09 2019-12-02 15:06:31 142.0
1 129 2019-12-02 15:04:15 2019-12-02 15:06:37 142.0
2 129 2019-12-02 15:05:15 2019-12-02 15:07:37 142.0
3 129 2019-12-02 15:05:18 2019-12-02 15:07:40 142.0
4 129 2019-12-02 15:05:24 2019-12-02 15:07:46 142.0
If you want to only get the enter/exit times for a particular time period of a day, you could create the datetime
objects for the start and end times, and do regular comparison:
>>> dt_beg = datetime(2019,12,2,8,0,0) #8AM
>>> dt_beg
datetime.datetime(2019, 12, 2, 8, 0)
>>> dt_end = datetime(2019,12,2,10,0,0) #10AM
>>> dt_end
datetime.datetime(2019, 12, 2, 10, 0)
>>> dt = datetime(2019,12,2,9,34,0) #9:34AM
>>> dt_beg < dt < dt_end
True
>>> dt = datetime(2019,12,2,14,34,0) #2:34PM
>>> dt_beg < dt < dt_end
False
So you could add a filter for what to append to enter_exit_times
:
if (enter_t_dt > start_dt and exit_t_dt < end_dt):
enter_exit_times.append(...)