I want to build a custom function that is supported by broadcasting.
In particular, I have two arrays, one of dates and another of times, and I want to merge the two, as in datetime.datetime.combine.
I would like to have something like this (these are the values I have, but the problem is more general):
x = array([datetime.date(2019, 1, 21), datetime.date(2019, 1, 21),
datetime.date(2019, 1, 21)])
y = array([datetime.time(0, 0), datetime.time(0, 15), datetime.time(0, 30)])
And I would like to do something like this:
datetime.combine(out[:,0], out[:,1])
To get the same result as:
np.asarray([datetime.combine(i,j) for i,j in zip(x,y)])
More generally:
Suppose I have a function f(a,b), and I have two numpy arrays x,y. Is there a way to apply broadcasting rules and obtain f(x,y)?
A custom ufunc is fine if you want to dig into C code. But your illustrative case works with datetime objects, and np.frompyfunc can be quite useful for that. With object dtype arrays, numpy has to iterate at a (near) Python level, running Python code on each of the objects. If you call a ufunc on an object array, it delegates the task to a corresponding method of each object (and fails if such a method does not exist).
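As a small illustration of that delegation, here is a sketch using made-up arrays (the names days and step are only for this example). np.add works because date supports + with timedelta, while np.sqrt fails because date has no sqrt method; the exact exception type may vary between numpy versions, so both are caught.

```python
import numpy as np
from datetime import date, timedelta

# Illustrative object-dtype arrays (not the ones from the question).
days = np.array([date(2019, 1, 21), date(2020, 1, 21)], dtype=object)
step = np.array([timedelta(days=1), timedelta(days=2)], dtype=object)

# On object arrays np.add falls back to each element's own addition.
shifted = np.add(days, step)

# np.sqrt looks for a sqrt method on each element; date has none.
try:
    np.sqrt(days)
    sqrt_failed = False
except (TypeError, AttributeError):
    sqrt_failed = True

print(shifted, sqrt_failed)
```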
Let's construct your date arrays:
In [20]: from datetime import datetime
In [35]: alist = [datetime(2019,1,21,0,0), datetime(2019,1,21,0,10),datetime(2020,1,21,0,0)]
In [36]: x = np.array([a.date() for a in alist])
In [37]: y = np.array([a.time() for a in alist])
In [38]: x
Out[38]:
array([datetime.date(2019, 1, 21), datetime.date(2019, 1, 21),
datetime.date(2020, 1, 21)], dtype=object)
In [39]: y
Out[39]:
array([datetime.time(0, 0), datetime.time(0, 10), datetime.time(0, 0)],
dtype=object)
And do the combine with a list comprehension:
In [41]: np.array([datetime.combine(i,j) for i, j in zip(x,y)])
Out[41]:
array([datetime.datetime(2019, 1, 21, 0, 0),
datetime.datetime(2019, 1, 21, 0, 10),
datetime.datetime(2020, 1, 21, 0, 0)], dtype=object)
and with frompyfunc:
In [43]: np.frompyfunc(datetime.combine, 2,1)(x,y)
Out[43]:
array([datetime.datetime(2019, 1, 21, 0, 0),
datetime.datetime(2019, 1, 21, 0, 10),
datetime.datetime(2020, 1, 21, 0, 0)], dtype=object)
With frompyfunc we can apply broadcasting:
In [44]: np.frompyfunc(datetime.combine, 2,1)(x,y[:,None])
Out[44]:
array([[datetime.datetime(2019, 1, 21, 0, 0),
datetime.datetime(2019, 1, 21, 0, 0),
datetime.datetime(2020, 1, 21, 0, 0)],
[datetime.datetime(2019, 1, 21, 0, 10),
datetime.datetime(2019, 1, 21, 0, 10),
datetime.datetime(2020, 1, 21, 0, 10)],
[datetime.datetime(2019, 1, 21, 0, 0),
datetime.datetime(2019, 1, 21, 0, 0),
datetime.datetime(2020, 1, 21, 0, 0)]], dtype=object)
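For the general f(a,b) case in the question, the same pattern should carry over. A minimal sketch, assuming a hypothetical scalar function f and small illustrative inputs (none of these names come from the question):

```python
import numpy as np

def f(a, b):
    # Hypothetical scalar function standing in for any f(a, b).
    return f"{a}@{b}"

# 2 inputs, 1 output; the result is an object-dtype array.
f_ufunc = np.frompyfunc(f, 2, 1)

x = np.array(["p", "q", "r"], dtype=object)  # shape (3,)
y = np.array([1, 2], dtype=object)           # shape (2,)

# (3,) broadcast against (2, 1) gives a (2, 3) result.
pairwise = f_ufunc(x, y[:, None])
print(pairwise.shape)
```

Since the output dtype is always object, an explicit astype call is needed afterwards if a numeric or other native dtype is wanted.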
x could have been constructed with frompyfunc:
In [46]: np.frompyfunc(lambda a: a.date(),1,1)(alist)
Out[46]:
array([datetime.date(2019, 1, 21), datetime.date(2019, 1, 21),
datetime.date(2020, 1, 21)], dtype=object)
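frompyfunc also accepts more than one output, so both arrays could be built in a single pass. A sketch, assuming a shortened alist and a hypothetical helper named split:

```python
import numpy as np
from datetime import datetime

alist = [datetime(2019, 1, 21, 0, 0), datetime(2019, 1, 21, 0, 10)]

# 1 input, 2 outputs: the function returns a 2-tuple and
# the ufunc returns a tuple of two object arrays.
split = np.frompyfunc(lambda a: (a.date(), a.time()), 1, 2)
dates, times = split(alist)
print(dates, times)
```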
The frompyfunc version of combine is a bit faster:
In [47]: timeit np.frompyfunc(datetime.combine, 2,1)(x,y)
5.39 µs ± 181 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [48]: timeit np.array([datetime.combine(i,j) for i, j in zip(x,y)])
11.8 µs ± 66.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
though a good chunk of the [48] time comes from the array interface:
In [51]: timeit [datetime.combine(i,j) for i, j in zip(x,y)]
3.91 µs ± 41.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
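combine from list versions of x and y appears even faster, but this timing is misleading: the zip built in the setup line is a one-shot iterator that is exhausted after the first loop, so the remaining loops iterate over nothing.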
In [52]: %%timeit xy=zip(x.tolist(),y.tolist())
...: [datetime.combine(i,j) for i,j in xy]
190 ns ± 0.579 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
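A sketch of a fairer measurement, assuming Python 3, where zip returns a one-shot iterator: rebuild the zip inside the timed function so that every loop does the full work. The helper name combine_lists and the loop count are illustrative choices, and the timing itself will depend on the machine.

```python
import timeit
import numpy as np
from datetime import datetime

alist = [datetime(2019, 1, 21, 0, 0), datetime(2019, 1, 21, 0, 10),
         datetime(2020, 1, 21, 0, 0)]
x = np.array([a.date() for a in alist])
y = np.array([a.time() for a in alist])
xl, yl = x.tolist(), y.tolist()

# A zip object can only be consumed once.
xy = zip(xl, yl)
first_pass = list(xy)
second_pass = list(xy)  # empty: the iterator is already exhausted

def combine_lists():
    # zip is rebuilt on every call, so each timed loop combines all pairs.
    return [datetime.combine(i, j) for i, j in zip(xl, yl)]

loops = 10_000
elapsed = timeit.timeit(combine_lists, number=loops)
print(f"{elapsed / loops * 1e6:.2f} us per loop")
```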