I have workers dictionary with names as keys and lists of two dates as values, that form some time interval:
workers = {
'Alan': [dt.date(2017, 1, 15), dt.date(2020, 5, 4)],
'Ben': [dt.date(2018, 3, 28), dt.date(2021, 5, 7)]
}
And also I have df dataframe with some events, which one is related to some worker:
event_id person event_date
1LFDVDX Alan 2018-10-28
4DLDQVC Ben 2022-02-01
5PEXVGH Ben 2019-09-05
9OPCLXD John 2020-06-15
So I'm trying to filter df with query() using dictionary as multiple condition. Particularly in described case I want to get from df all events for Alan with event_time
between 2017-01-15 and 2020-05-04 and also all events for Ben with event_time
between 2018-03-28 and 2021-05-07. So expected result should be as below:
event_id worker event_date
1LFDVDX Alan 2018-10-28
5PEXVGH Ben 2019-09-05
I was trying next one:
df.query('person in @workers and event_date >= @workers[person][0] and event_date <= @workers[person][1]')
But got TypeError: unhashable type: 'Series'
How can I solve this? Thank you in advance.
The problem is in this expression:
person in @workers
Here person
is a Series
that is a not hashable object, which is being searched in the keys of the dictionary. To solve this particular problem you use isin
, but that only answers the question partially.
I suggest you do this instead:
from collections import defaultdict
# create a defaultdict to be used in map, is useful for given a default values to NaN
lookup = defaultdict(list, workers)
# create new Series applying the lookup dictionary
person_map = df["person"].map(lookup)
# convert the series to a DataFrame with the same index as df
ran = pd.DataFrame(data=person_map.to_list(), columns=["start", "end"], index=person_map.index)
# use a very clean query
res = df.query("person.isin(@workers) and @ran.start < event_date < @ran.end")
print(res)
Output
event_id person event_date
0 1LFDVDX Alan 2018-10-28
2 5PEXVGH Ben 2019-09-05