I have a function that takes up ~95% of my execution time; I call it multiple times and it's making my program incredibly slow. I can't use the built-in multiprocessing library because of my reliance on PyCharm: https://youtrack.jetbrains.com/issue/PY-53235.
Here's my core problem:
import numpy as np
import time
import ray  # confirmed this library works in my env

def get_matched_time(match: np.ndarray, template: np.ndarray) -> list:
    """
    Return an iterable containing the closest value
    in the template array to each value of the input array.

    Args:
        match : for each value of match,
            find the closest value in template.
        template : values to match to.

    Returns:
        matched_time : iterable of size match.size containing
            values found in template closest to each value in match.
    """
    matched_time, matched_index = [], []
    for t in match:  # for each and every match value
        temp = [abs(t - valor) for valor in template]  # scan template for the closest time
        matched_index.append(temp.index(min(temp)))  # add the index
    return [template[idx] for idx in matched_index]
(...)
The match array (or any iterable) can vary between 10 and 12,000 values. The template array is usually upwards of 10,000 values as well, but always larger than match.
if __name__ == "__main__":
    start = time.time()
    returned_time = get_matched_time(match, template)
    end = time.time()
    print(f"Execution time: {end - start} s")

>>> Execution time: 12.573657751083374 s
Just in case it's unclear:
match = [1.1, 2.1, 5.1] # will always be smaller than template
template = [1.0, 2.0, 3.0, 4.0, 5.0]
My desired output is [1.0, 2.0, 5.0]
in any iterable form; order doesn't really matter because I can just sort by value afterwards.
I'm hoping to be able to assign each loop of for t in match to a process. I have been trying this with Ray's Parallel Iterators, but I don't understand the documentation well enough to implement it. If anybody can recommend a more efficient method, or a way to incorporate multiprocessing via Ray or another library, I would much appreciate it.
You can significantly improve the performance of your search by using vectorized numpy calls in place of your iterations. In particular, we can replace this code:
    for t in match:  # for each and every match value
        temp = [abs(t - valor) for valor in template]
        matched_index.append(temp.index(min(temp)))
with abundantly faster numpy operations. Rather than searching, we use vectorized math to find all the absolute differences between match and template. Then we find the minima of these absolute differences and the indices at which they occur. Finally, we return the matches:
def get_matched_time(match: np.ndarray, template: np.ndarray) -> np.ndarray:
    # broadcast to an (m, n) matrix of absolute differences
    match = match.reshape(-1, 1)
    a = np.abs(match - template)
    # index of the closest template value for each match value
    mins = np.argmin(a, axis=1)
    return template[mins]
In this code, I return an array since the inputs are arrays, which makes the code interoperable with numpy functions that might be applied to the outputs.
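One caveat: the broadcasting approach materializes a len(match) × len(template) matrix of differences, which at 12,000 × 10,000+ float64 values is on the order of a gigabyte. If memory becomes a concern at your sizes, here is a sketch of an alternative that keeps memory linear by binary-searching a sorted copy of template with np.searchsorted (the function name get_matched_time_sorted is mine, made up for illustration):

```python
import numpy as np

def get_matched_time_sorted(match: np.ndarray, template: np.ndarray) -> np.ndarray:
    """Nearest-value lookup via binary search on a sorted template."""
    sorted_t = np.sort(template)                  # sort once, O(n log n)
    # index of the first sorted_t value >= each match value
    idx = np.searchsorted(sorted_t, match)
    idx = np.clip(idx, 1, len(sorted_t) - 1)      # keep both neighbours in bounds
    left, right = sorted_t[idx - 1], sorted_t[idx]
    # pick whichever neighbour is closer to each match value
    return np.where(np.abs(match - left) <= np.abs(match - right), left, right)

match = np.array([1.1, 2.1, 5.1])
template = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(get_matched_time_sorted(match, template))  # [1. 2. 5.]
```

This runs in O((m + n) log n) time and O(m + n) memory instead of O(m * n) for both, at the cost of a one-off sort; whether it beats the broadcast version in practice depends on your array sizes, so it's worth timing both on your data.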