By "similar vector" I mean a vector that differs from the given one at exactly one position, by either -1 or +1. However, if the element at that position is zero, only a difference of +1 is possible. Examples:
similar_vectors(np.array([0,0,0]))
array([[ 1., 0., 0.],
       [ 0., 1., 0.],
       [ 0., 0., 1.]])
similar_vectors(np.array([1,0,2,3,0,0,1]))
array([[ 0., 0., 2., 3., 0., 0., 1.],
       [ 2., 0., 2., 3., 0., 0., 1.],
       [ 1., 1., 2., 3., 0., 0., 1.],
       [ 1., 0., 1., 3., 0., 0., 1.],
       [ 1., 0., 3., 3., 0., 0., 1.],
       [ 1., 0., 2., 2., 0., 0., 1.],
       [ 1., 0., 2., 4., 0., 0., 1.],
       [ 1., 0., 2., 3., 1., 0., 1.],
       [ 1., 0., 2., 3., 0., 1., 1.],
       [ 1., 0., 2., 3., 0., 0., 0.],
       [ 1., 0., 2., 3., 0., 0., 2.]])
I would like the fastest possible implementation of the similar_vectors(vector) function described above. It runs millions of times in my simulation, on vectors a few dozen elements long, so speed is crucial. I am interested both in pure NumPy solutions and in wrappers to other languages. The code may be parallel.
My current implementation is the following:
def singleOne(length, positionOne):
    # generates a vector of len `length` filled with zeros,
    # apart from a single one at positionOne
    arr = np.zeros(length)
    arr[positionOne] = 1
    return arr

def similar_vectors(state):
    connected = []
    for i in range(len(state)):
        if state[i] != 0:
            connected.append(state - singleOne(state.shape[0], i))
        connected.append(state + singleOne(state.shape[0], i))
    return np.array(connected)
This is terribly slow due to the for loop, which I can't easily get rid of. For reference, I'm attaching the profile from 1000 executions of similar_vectors(vector):
37003 function calls in 0.070 seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.070 0.070 {built-in method builtins.exec}
1 0.003 0.003 0.070 0.070 <string>:2(<module>)
1000 0.035 0.000 0.064 0.000 <ipython-input-19-69431411f902>:6(similar_vectors)
11000 0.007 0.000 0.021 0.000 <ipython-input-19-69431411f902>:1(singleOne)
11000 0.014 0.000 0.014 0.000 {built-in method numpy.core.multiarray.zeros}
2000 0.009 0.000 0.009 0.000 {built-in method numpy.core.multiarray.array}
11000 0.002 0.000 0.002 0.000 {method 'append' of 'list' objects}
1000 0.000 0.000 0.000 0.000 {built-in method builtins.len}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
Approach #1
Here's a masking-based one -
def similar_vectors_masking(a):
    n = len(a)
    m = n*2 + 1
    ar = np.repeat(a[None], len(a)*2, 0)
    ar.ravel()[::m] -= 1
    ar.ravel()[n::m] += 1
    mask = np.ones(len(ar), dtype=bool)
    mask[::2] = a != 0
    out = ar[mask]
    return out
Sample runs -
In [142]: similar_vectors_masking(np.array([0,0,0]))
Out[142]:
array([[1, 0, 0],
       [0, 1, 0],
       [0, 0, 1]])
In [143]: similar_vectors_masking(np.array([1,0,2,3,0,0,1]))
Out[143]:
array([[0, 0, 2, 3, 0, 0, 1],
       [2, 0, 2, 3, 0, 0, 1],
       [1, 1, 2, 3, 0, 0, 1],
       [1, 0, 1, 3, 0, 0, 1],
       [1, 0, 3, 3, 0, 0, 1],
       [1, 0, 2, 2, 0, 0, 1],
       [1, 0, 2, 4, 0, 0, 1],
       [1, 0, 2, 3, 1, 0, 1],
       [1, 0, 2, 3, 0, 1, 1],
       [1, 0, 2, 3, 0, 0, 0],
       [1, 0, 2, 3, 0, 0, 2]])
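The flattened-stride writes in similar_vectors_masking work because, with n columns and 2*n rows, a flat step of m = 2*n + 1 advances one full row plus one column per step. A minimal demo of just that trick on a blank array (this demo is my own illustration, not part of the benchmarks):

```python
import numpy as np

n = 3
m = 2 * n + 1                 # one column further for every two rows
ar = np.zeros((2 * n, n), dtype=int)

# Stepping through the flattened array by m hits (row 2*i, col i) ...
ar.ravel()[::m] -= 1          # the "minus one" rows
# ... and offsetting the start by n hits (row 2*i + 1, col i).
ar.ravel()[n::m] += 1         # the "plus one" rows
```

Each pair of consecutive rows now carries the -1/+1 perturbation for one column, which is exactly the layout the boolean mask then filters.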
Approach #2
For an array filled mostly with zeros, we might be better off replicating to the exact output size with np.repeat, then using a bit of masking again to apply the increments and decrements, like so -
def similar_vectors_repeat(a):
    mask = a != 0
    ra = np.arange(len(a))
    r = mask + 1
    n = r.sum()
    ar = np.repeat(a[None], n, axis=0)
    add_idx = r.cumsum() - 1
    ar[add_idx, ra] += 1
    ar[(add_idx - 1)[mask], ra[mask]] -= 1
    return ar
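As a sanity check (my own addition, not part of the benchmarks), the repeat-based version can be compared row for row against the straightforward loop from the question; both are redefined here so the snippet runs on its own:

```python
import numpy as np

def similar_vectors(state):
    # reference loop implementation from the question
    connected = []
    for i in range(len(state)):
        one = np.zeros(state.shape[0])
        one[i] = 1
        if state[i] != 0:
            connected.append(state - one)
        connected.append(state + one)
    return np.array(connected)

def similar_vectors_repeat(a):
    mask = a != 0
    ra = np.arange(len(a))
    r = mask + 1                            # 2 output rows per nonzero, 1 per zero
    ar = np.repeat(a[None], r.sum(), axis=0)
    add_idx = r.cumsum() - 1                # row index of each "+1" perturbation
    ar[add_idx, ra] += 1
    ar[(add_idx - 1)[mask], ra[mask]] -= 1  # "-1" row sits just above, nonzeros only
    return ar

a = np.array([1, 0, 2, 3, 0, 0, 1])
assert np.array_equal(similar_vectors(a), similar_vectors_repeat(a))
```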
Timings with all posted approaches on a large array -
In [414]: # Setup input array with ~80% zeros
...: np.random.seed(0)
...: a = np.random.randint(1,5,(5000))
...: a[np.random.choice(range(len(a)),int(len(a)*0.8),replace=0)] = 0
In [415]: %timeit similar_vectors(a) # Original soln
...: %timeit similar_vectors_flatnonzero_concat(a) # @yatu's soln
...: %timeit similar_vectors_v2(a) # @Brenlla's soln
...: %timeit similar_vectors_masking(a)
...: %timeit similar_vectors_repeat(a)
1 loop, best of 3: 195 ms per loop
1 loop, best of 3: 234 ms per loop
1 loop, best of 3: 231 ms per loop
1 loop, best of 3: 238 ms per loop
10 loops, best of 3: 82.8 ms per loop
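Since the question also allows wrappers to other languages, a preallocating scalar loop is a further option; the sketch below is my own (it was not among the timed solutions) and is written with only operations that numba's @njit, if installed, should be able to compile unchanged:

```python
import numpy as np

# Hypothetical sketch: preallocate the exact (n + nonzeros, n) output and
# fill it row by row. Decorating with numba.njit is an untested assumption.
def similar_vectors_loop(a):
    n = a.shape[0]
    nnz = int(np.count_nonzero(a))
    out = np.empty((n + nnz, n), dtype=a.dtype)
    row = 0
    for i in range(n):
        if a[i] != 0:
            out[row] = a          # "-1" row only for nonzero elements
            out[row, i] -= 1
            row += 1
        out[row] = a              # "+1" row for every element
        out[row, i] += 1
        row += 1
    return out
```

This preserves the row ordering of the original function while avoiding the per-iteration allocations that dominate its profile.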