I create a mask to use in a pandas dataframe:
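import operator
import numpy as np

mask = np.logical_and(
    csv_df['time'].map(operator.attrgetter('hour')).isin(hours_set),
    # Timestamp.weekday_name was removed in pandas 1.0; day_name() replaces it
    csv_df['time'].map(lambda x: x.day_name()[:3]).isin(days_set))
csv_df = csv_df.loc[mask, :]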
It turns out that computing the two isin Series is rather slow. The code above computes both Series in full and then ANDs them. Is there an (idiomatic) way to short-circuit per element? The first Series is mostly False, so for most rows the second Series' element would never need to be calculated.
One idea is:
mask = (csv_df['time'].dt.hour.isin(hours_set) &
        csv_df['time'].dt.strftime('%a').isin(days_set))
Another idea, if most values do not match, is to filter on one condition first and then apply the second condition only to the rows that remain:
csv_df1 = csv_df.loc[csv_df['time'].dt.strftime('%a').isin(days_set)]
csv_df2 = csv_df1.loc[csv_df1['time'].dt.hour.isin(hours_set)]
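A minimal sketch of the staged idea, applied so that the cheap hour test runs first and the day-name test runs only on the rows that survive. The sample data, `hours_set`, and `days_set` values here are made up for illustration, and `dt.day_name().str[:3]` is swapped in for `strftime('%a')` because it does not depend on the system locale. Whether this is actually faster depends on how selective the first condition is, so it is worth timing on the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: hourly timestamps for two weeks, starting on a Monday.
start = pd.Timestamp('2021-01-04')
csv_df = pd.DataFrame({'time': start + pd.to_timedelta(np.arange(14 * 24), unit='h')})

# Hypothetical filter values.
hours_set = {9, 10}
days_set = {'Mon', 'Tue'}

# Stage 1: cheap integer comparison on every row.
hour_mask = csv_df['time'].dt.hour.isin(hours_set).to_numpy()

# Stage 2: the more expensive day-name test, only on rows that passed stage 1.
survivors = csv_df.loc[hour_mask, 'time']
day_mask = survivors.dt.day_name().str[:3].isin(days_set).to_numpy()

# Combine: rows that failed stage 1 stay False, the rest take the stage 2 result.
mask = hour_mask.copy()
mask[hour_mask] = day_mask

filtered = csv_df.loc[mask]
print(len(filtered))
```

This keeps a full-length boolean `mask`, so it can still be combined with other conditions or reused, unlike the chained `.loc` version, which only yields the filtered frame.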