
Efficient way to extract non-None arrays from a NumPy ndarray

My sincere apologies in advance if this question seems long and basic. I have a large array a in which only some of the (x, y, z) triplets are filled in:


import numpy as np
import time

c, q = int(3e5), int(5e5)    
a = np.full( (c,q,3), None )

# fillout with some non None arrays: 3D (x,y,z) positions
a[0,0, :] = np.array([-4,0.1,0])
a[0,1, :] = np.array([9.2,3.1,0])
a[0,5, :] = np.array([3,-4.3,0])
a[0,6, :] = np.array([-1,12.8,0])

a[2,1, :] = np.array([4.5,-9,0])
a[2,3, :] = np.array([-0.1,6.1,0])
a[2,8, :] = np.array([-7,1,0])

a[3,0, :] = np.array([-1,0.7,0])
a[3,6, :] = np.array([-15,26,0])

a[5,0, :] = np.array([0.1,-1.1,0])

a[7,5, :] = np.array([0,0,0])

a[8,2, :] = np.array([5,6,0])

a[9,10, :] = np.array([-1.1,1,0])

a[10,3, :] = np.array([-32,15,0])

a[11,7, :] = np.array([0,9.3,0])

a[12,2, :] = np.array([0.9,6.2,0])

a[14,9, :] = np.array([8.6,5.6,0])

a[15,5, :] = np.array([0.5,8.5,0])


I'd like to extract the non-None elements from a. My current code is very slow because it relies on nested for loops:

bt = time.time()
for ci in range(c):
    if any(ci == value for value in [2, 5]):
        print(f">> Generating {ci}+ ranks ...")
        poseNplus = []
        aNplus = a[ci:]
        for ci_i in range(aNplus.shape[0]):
            aNplus_Q = aNplus[ci_i]
            for qi in range(aNplus_Q.shape[0]):
                if all(aNplus_Q[qi] != None):
                    poseNplus.append( aNplus_Q[qi] )
        print(len(poseNplus), poseNplus)
et = time.time()
print(f"Took {(et-bt):.3f} s")

which takes a long time:

Took 580.888 s

Following @Marc Felix's answer, I can extract ALL non-None triplets as follows. First initialize a with NaN instead of None, a = np.full( (c,q,3), np.nan ), then:

bt = time.time()
nan_values = np.any(np.isnan(a), axis=-1)   # True where a triplet is unfilled
result = a[~nan_values]                     # shape (n, 3)
et = time.time()
print(f"Took {(et-bt):.3f} s")
print(result.shape)
print(result)

which returns:

Took 0.318 s
(18, 3)
[[ -4.    0.1   0. ]
 [  9.2   3.1   0. ]
 [  3.   -4.3   0. ]
 [ -1.   12.8   0. ]
 [  4.5  -9.    0. ] <<<--- rank2 - END: from here till end
 [ -0.1   6.1   0. ]
 [ -7.    1.    0. ]
 [ -1.    0.7   0. ]
 [-15.   26.    0. ]
 [  0.1  -1.1   0. ] <<<--- rank5 - END: from here till end
 [  0.    0.    0. ]
 [  5.    6.    0. ]
 [ -1.1   1.    0. ]
 [-32.   15.    0. ]
 [  0.    9.3   0. ]
 [  0.9   6.2   0. ]
 [  8.6   5.6   0. ]
 [  0.5   8.5   0. ]]

But my desired result groups the triplets by starting rank, like this:

>> Generating 2+ ranks ...
[[  4.5  -9.    0. ]
 [ -0.1   6.1   0. ]
 [ -7.    1.    0. ]
 [ -1.    0.7   0. ]
 [-15.   26.    0. ]
 [  0.1  -1.1   0. ]
 [  0.    0.    0. ]
 [  5.    6.    0. ]
 [ -1.1   1.    0. ]
 [-32.   15.    0. ]
 [  0.    9.3   0. ]
 [  0.9   6.2   0. ]
 [  8.6   5.6   0. ]
 [  0.5   8.5   0. ]]
>> Generating 5+ ranks ...
[[  0.1  -1.1   0. ]
 [  0.    0.    0. ]
 [  5.    6.    0. ]
 [ -1.1   1.    0. ]
 [-32.   15.    0. ]
 [  0.    9.3   0. ]
 [  0.9   6.2   0. ]
 [  8.6   5.6   0. ]
 [  0.5   8.5   0. ]]


Is there a more time-efficient way to do this?

I am aware of this post, but its approach flattens the triplets into a single 1-D array:

b = a[a != None]

[-4.0 0.1 0.0 9.2 3.1 0.0 3.0 -4.3 0.0 -1.0 12.8 0.0 4.5 -9.0 0.0 -0.1 6.1
 0.0 -7 1 0 -1.0 0.7 0.0 -15 26 0 0.1 -1.1 0.0 0 0 0 5 6 0 -1.1 1.0 0.0
 -32 15 0 0.0 9.3 0.0 0.9 6.2 0.0 8.6 5.6 0.0 0.5 8.5 0.0]
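The flattening above happens because the mask a != None has the same shape as a, and indexing with a full-shape boolean mask always yields a 1-D result. A minimal sketch of one way around it, assuming a small stand-in array rather than the full-size one from the question: reduce the mask along the last axis so that it selects whole triplets.

```python
import numpy as np

# Small stand-in for the array in the question (sizes are illustrative only).
a = np.full((4, 3, 3), None)
a[0, 1, :] = [9.2, 3.1, 0]
a[2, 0, :] = [4.5, -9, 0]

elementwise = a != None               # same shape as a: (4, 3, 3)
flat = a[elementwise]                 # full-shape mask -> 1-D result

row_mask = elementwise.all(axis=-1)   # shape (4, 3): True for filled triplets
triplets = a[row_mask].astype(float)  # shape (n, 3): triplets stay intact

print(flat.shape, triplets.shape)
```

The comparison still runs element by element over Python objects, so on a very large array the NaN-based layout is likely to be faster; this sketch only shows how to keep the (n, 3) shape.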


  • Building on @Marc Felix's answer, with a initialized via np.full( (c,q,3), np.nan ) as in the question's update, apply the same mask to the slice that starts at each rank:

    for rank in (2, 5):
        tail = a[rank:]                               # this rank and all later ones
        nan_values = np.any(np.isnan(tail), axis=-1)  # True where a triplet is unfilled
        result = tail[~nan_values]                    # shape (n, 3)
        print(f">> Generating {rank}+ ranks ...\n", result, '\n ------------------------------------------------------------')

    This gives the expected result.
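If many starting ranks are needed, recomputing the NaN mask for every slice repeats work. Because boolean indexing returns rows in row-major order, the result for "rank r and above" is simply a suffix of the full result. The following is a sketch under that assumption; the helper name rank_suffixes and the small test array are hypothetical and not part of the original answer.

```python
import numpy as np

def rank_suffixes(a, ranks):
    """Map each rank r to the (n, 3) array of filled triplets in a[r:]."""
    valid = ~np.isnan(a).any(axis=-1)   # (c, q) mask, computed once
    all_pts = a[valid]                  # (n, 3), ordered by rank then column
    # starts[r] = number of filled triplets in ranks 0 .. r-1
    starts = np.concatenate(([0], np.cumsum(valid.sum(axis=1))))
    return {r: all_pts[starts[r]:] for r in ranks}

# Small stand-in for the NaN-initialized array from the question.
a = np.full((6, 4, 3), np.nan)
a[0, 0] = [-4, 0.1, 0]
a[2, 1] = [4.5, -9, 0]
a[3, 0] = [-1, 0.7, 0]
a[5, 0] = [0.1, -1.1, 0]

out = rank_suffixes(a, ranks=(2, 5))
for r, pts in out.items():
    print(f">> Generating {r}+ ranks ...")
    print(pts)
```

Each value in the returned dict is a slice (a view) of the same filtered array, so adding more ranks costs almost nothing beyond the single pass over a.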