I have a performance issue when coding with python. let's say I have 2 very large arrays (Nx2) of strings say with N = 12,000,000, and two variables label_a and label_b which are also strings. Here is the following code:
import numpy as np
import time
indices = np.array([np.random.choice(np.arange(5000).astype(str),size=10000000),np.random.choice(np.arange(5000).astype(str),size=10000000)]).T
costs = np.random.uniform(size=10000000)
label_a = '2'
label_b = '9'
t0 = time.time()
costs = costs[(indices[:,0]!=label_a)*(indices[:,0]!=label_b)*(indices[:,1]!=label_a)*(indices[:,1]!=label_b)]
indices = indices[(indices[:,0]!=label_a)*(indices[:,0]!=label_b)*(indices[:,1]!=label_a)*(indices[:,1]!=label_b)]
t1 = time.time()
toseq = t1-t0
print(toseq)
the above code segment takes 3 seconds every time it's ran. I would like to achieve the same thing while reducing the computing cost: I am using a boolean mask to only retrieve rows in the costs and indices arrays where the values are not label_a and label_b
As indicated in the comments, computing the values of the indices you're after only once, and combining them only once would save time.
(I've also changed the way of timing, just for brevity - the results are the same)
import numpy as np
from timeit import timeit
r = 5000
n = 10000000
indices = np.array([
np.random.choice(np.arange(r).astype(str), size=n),
np.random.choice(np.arange(r).astype(str), size=n)
]).T
costs = np.random.uniform(size=n)
label_a = '2'
label_b = '9'
n_indices = np.array([
np.random.choice(np.arange(r), size=n),
np.random.choice(np.arange(r), size=n)
]).T
def run():
global indices
global costs
_ = costs[(indices[:, 0] != label_a)*(indices[:, 0] != label_b) *
(indices[:, 1] != label_a)*(indices[:, 1] != label_b)]
_ = indices[(indices[:, 0] != label_a)*(indices[:, 0] != label_b) *
(indices[:, 1] != label_a)*(indices[:, 1] != label_b)]
def run_faster():
global indices
global costs
# only compute these only once
not_a0 = indices[:, 0] != label_a
not_b0 = indices[:, 0] != label_b
not_a1 = indices[:, 1] != label_a
not_b1 = indices[:, 1] != label_b
_ = costs[not_a0 * not_b0 * not_a1 * not_b1]
_ = indices[not_a0 * not_b0 * not_a1 * not_b1]
def run_even_faster():
global indices
global costs
# also combine them only once
cond = ((indices[:, 0] != label_a) * (indices[:, 0] != label_b) *
(indices[:, 1] != label_a) * (indices[:, 1] != label_b))
_ = costs[cond]
_ = indices[cond]
def run_sep_mask():
global indices
global costs
global cond
# just the masking part of run_even_faster
cond = ((indices[:, 0] != label_a) * (indices[:, 0] != label_b) *
(indices[:, 1] != label_a) * (indices[:, 1] != label_b))
def run_sep_index():
global indices
global costs
global cond
# just the indexing part of run_even_faster
_ = costs[cond]
_ = indices[cond]
def run_even_faster_numerical():
global indices
global costs
# use int values and n_indices instead of indices
a = int(label_a)
b = int(label_b)
cond = ((n_indices[:, 0] != a) * (n_indices[:, 0] != b) *
(n_indices[:, 1] != a) * (n_indices[:, 1] != b))
_ = costs[cond]
_ = indices[cond]
def run_all(funcs):
for f in funcs:
print('{:.4f} : {}()'.format(timeit(f, number=1), f.__name__))
run_all([run, run_faster, run_even_faster, run_sep_mask, run_sep_index, run_even_faster_numerical])
Note that I also added an example where the operation is not based on strings, but on numbers instead. If you can avoid the values being strings, but get numbers instead, you'd get a performance boost as well.
This boost gets substantial if you start comparing longer labels - in the end it might even be worth converting the strings to numbers before the filtering, if the strings get long enough.
These are my results:
0.9711 : run()
0.7065 : run_faster()
0.6983 : run_even_faster()
0.2657 : run_sep_mask()
0.4174 : run_sep_index()
0.4536 : run_even_faster_numerical()
The two sep
entries show that the indexing is about twice the amount of time it takes to build the mask for run_even_faster
, so you can only expect so much improvement from tuning it even more.
However, they also show that building the mask based on integers is less than 0.04 seconds on top of doing the actual indexing, compared to the about 0.26 seconds for building the mask based on strings. So, that's the room you have for improvement.