How to improve cython code to make it faster than numpy select function?


I am trying to write code that beats np.select, but my Cython version is about twice as slow. I tried both a large and a small dataset, and np.select wins in both cases (np.select: 11.4 ms, Cython: 24.8 ms).

%timeit compute_np(300)
# 11.4 ms ± 1.02 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit compute_cy(300)
# 24.8 ms ± 1.2 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

I tried the methods in the Cython documentation but failed to close the speed gap. Here is my Cython code in detail.

Used packages

import numpy as np
import pandas as pd
import cython
import random
import timeit
import time
%load_ext Cython

Used dataset

dur_m = np.random.randint(1, 1001, size=100000)
pol_year = np.random.randint(1, 1001, size=100000)
calc_flag = 1
type = np.random.choice(['IF','NB', 'NB2', 'NB3'], size = 100000)

rand = np.arange(0.01, 0.05, 0.0001)
output1 = np.random.choice(rand, size=100000)
output2 = np.random.choice(rand, size=100000)
output3 = np.random.choice(rand, size=100000)

Numpy test

def compute_np(t):

    condition = [
        (t > dur_m) & (t < pol_year) & (calc_flag == 1),
        (t < dur_m) & (calc_flag == 1),
        (t < pol_year)
    ]

    result = [
        output1,
        output2,
        output3
    ]

    default = np.array([0] * 100000)

    return np.select(condition, result, default)
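To be clear, the behaviour I need to match is np.select's first-match-wins rule: each output element takes the value from the first condition that is True, and the default when none match. A tiny illustration:

```python
import numpy as np

x = np.array([1, 5, 9])
conditions = [x > 8, x > 4]      # checked in order, first True wins
choices = [x * 100, x * 10]
out = np.select(conditions, choices, default=-1)
# x=1: no condition holds     -> -1
# x=5: x > 4 holds            -> 50
# x=9: x > 8 holds (first)    -> 900
```

Any faster replacement has to reproduce this ordering.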

Cython code

%%cython --annotate
import cython
cimport cython
import numpy as np
cimport numpy as np
@cython.boundscheck(False)
@cython.wraparound(False)
def select_cy2(np.ndarray[np.uint8_t, ndim = 2, cast=True] conditions, double [:, ::1] choice, double [:] default_value):
    cdef int num_condition = conditions.shape[0]
    cdef int length = conditions.shape[1]
    cdef np.ndarray[np.float64_t, ndim=1] result = np.zeros(length, dtype=np.float64)
    cdef int i, j
    for j in range(length):
        for i in range(num_condition):
            if conditions[i,j]:
                result[j] = choice[i,j]
                break
            else:
                result[j] = default_value[i]
    return result

Cython test

def compute_cy(t):

    condition = [
        (t > dur_m) & (t < pol_year) & (np.array([calc_flag] * 100000) == 1),
        (t < dur_m) & (np.array([calc_flag] * 100000) == 1),
        (t < pol_year)
    ]

    result = [
        output1,
        output2,
        output3
    ]

    default = np.array([0.0] * 100000)

    return select_cy2(np.array(condition), np.array(result), default)

Can anyone suggest a way to improve the speed?


Solution

  • I haven't timed this, so it could be wrong, but you're repeatedly copying default_value into result on every non-matching condition. Maybe:

        for j in range(length):
            result[j] = default_value[j]
            for i in range(num_condition):
                if conditions[i,j]:
                    result[j] = choice[i,j]
                    break
    

    or

        for j in range(length):
            for i in range(num_condition):
                if conditions[i,j]:
                    result[j] = choice[i,j]
                    break
            else:
                result[j] = default_value[j]
    

    You can also allocate result with np.empty rather than np.zeros, since every element of it is overwritten. (Note that since default_value has the same length as the output, the default for element j is default_value[j]; the original code indexed it with the condition counter i, which only looked right because all the defaults were zero.)
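    As a sanity check, the loop can be mirrored in plain Python and compared against np.select. This is a sketch; it assumes, as in the question, that the default is a per-element array, so it is indexed with j:

```python
import numpy as np

def select_py(conditions, choices, default_value):
    """Reference first-match-wins loop: result[j] takes choices[i, j]
    for the first i where conditions[i, j] is True, else default_value[j]."""
    num_condition, length = conditions.shape
    result = np.empty(length, dtype=np.float64)  # every slot gets written
    for j in range(length):
        for i in range(num_condition):
            if conditions[i, j]:
                result[j] = choices[i, j]
                break
        else:  # no condition matched for element j
            result[j] = default_value[j]
    return result

# Compare against np.select on small random data
rng = np.random.default_rng(0)
x = rng.integers(0, 10, size=50)
conds = np.array([x > 7, x > 3])
choices = np.array([x * 100.0, x * 10.0])
default = np.zeros(50)
assert np.array_equal(select_py(conds, choices, default),
                      np.select(list(conds), list(choices), default))
```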

    Typically the NumPy internals are well written (in C, Fortran, or Cython), so there aren't big wins to be had by rewriting individual NumPy functions one-for-one. It starts to be worthwhile when you can eliminate a whole series of NumPy calls, and with them the intermediate arrays they allocate.
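    To illustrate that last point with the question's setup: the real win would come from fusing the condition tests into the loop itself, so boolean arrays like (t > dur_m) & (t < pol_year) are never materialised. A pure-Python sketch of the shape such a kernel could take (slow as plain Python, but each line compiles to tight C once the variables are typed; the array names follow the question):

```python
import numpy as np

def compute_fused(t, dur_m, pol_year, calc_flag, output1, output2, output3):
    """Classify each element directly instead of building three boolean
    arrays and handing them to a generic select."""
    n = dur_m.shape[0]
    result = np.empty(n, dtype=np.float64)
    for j in range(n):
        if t > dur_m[j] and t < pol_year[j] and calc_flag == 1:
            result[j] = output1[j]
        elif t < dur_m[j] and calc_flag == 1:
            result[j] = output2[j]
        elif t < pol_year[j]:
            result[j] = output3[j]
        else:
            result[j] = 0.0
    return result
```

    The Cython version would add typed arguments (e.g. long[:] for the integer arrays, double[:] for the outputs) and the boundscheck/wraparound directives, but the control flow stays exactly this.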