How can performance be improved when iterating over NumPy array?

I'm analyzing large amounts point cloud data created by a laser scanner. In the third step I remove points based on their z-value but my function is really slow.

Import The data is imported from a .csv file using pandas. The imported dataframe 'df' contains the data for X,Y,Z. Example: df has a shape [300,1001]. Then X is the first third of df. X = df.iloc[:99,1:], Y is df.iloc[100:199,1:] and so on. The first column (index) is irrelevant. One row in X,Y,Z corresponds to the data of a single scan.
Convert to NumPy The dataframe 'df' contains many empty fields ''. Therefore I change the data structure to a NumPy array 'A' of shape (N,3) in which every row represents a single point. All points containing empty values are deleted.
Remove points based on max. height of a scan. I'm only interested in the points slightly below the maximum of each scan. I use my function 'in_max_height' to create a mask of all points within the allowed range.

Here's my code:

def in_max_height(A,hMax):

    # get unique x values
    unique_x = np.unique(A[:,0])

    # create an empty mask array with the same shape as A
    mask = np.zeros_like(A[:,2], dtype=bool)

    # iterate over unique x and find the max. z-value
    for x in unique_x:
        zMax = np.max(A[A[:,0] == x, 2])
        mask[A[:,0] == x] = ~(A[A[:,0] == x, 2] < zMax - hMax)

    return mask

 A = A[in_max_height(A,hMax=1)] # apply max. layer height

Analyze Create various plots...

I tried to remove the low points after step 1 but I couldn't figure out how to ignore the index column of the dataframe.

Right now with an average point cloud consisting of about 375,000 points my function takes about 11 s to finish. I would like to learn how to fundamentally tackle these big data problems.

Solution

I admit that my code is not optimal but it's work faster than 11s on my laptop:

import random
import numpy as np
import time

def get_random_point():
    i = 1950
    return (random.randint(0, i), random.randint(0, i),  random.randint(0, i/10))

# Construct test array with 375000 points and 1950 unique values
test_array = np.array([get_random_point() for x in range(375000)],dtype=np.int64)
print(test_array.shape)
(375000, 3)

start = time.time()
# Sort on first and last column decreasing order
tsorted =  test_array[np.lexsort((test_array[:,2], test_array[:,0]))][::-1]

res = []
u = tsorted[0][0]
z_max = tsorted[0][2]
hmax = 1
for x in tsorted:
    if x[0] != u or not res:
        
        u = x[0]
        z_max = x[2]
        res.append(x)
    else:
        if x[2] + hmax >= z_max:
            res.append(x)
res = np.array(res)
print(time.time() - start)
# in secs
0.47696924209594727