python, pandas, swifter

Swifter: what is the difference between a vectorized and a non-vectorized function?


I need to learn about pandas speed optimization. One library that looks very effective for my problem is swifter, but I don't understand its documentation, especially the part about vectorized functions.

My assumption is that swifter only accepts vector input, not a DataFrame. Is that wrong?

In the documentation, this is a vectorized function:

def bikes_proportion(x, max_x):
    return x * 1.0 / max_x

and this is a non-vectorized function:

def convert_to_human(datetime):
    return datetime.weekday_name + ', the ' + str(datetime.day) + 'th day of ' + datetime.strftime("%B") + ', ' + str(datetime.year)

What is the difference between a vectorized and a non-vectorized function? And if you have used swifter before: can it work with a DataFrame, or does it only work with vectors?


Solution

  • I'll try to explain with a simple use case.

    Vectorized code refers to operations that are performed on multiple components of a vector at the same time (in one statement):

    import numpy as np
    
    a = np.array([1,2,3,4,5])
    b = np.array([1,1,1,1,1])
    c = a+b
    

    In the code below, the operands are scalars rather than vectors: the addition is performed on one component of a and one component of b at a time, so the code is not vectorized:

    a = [1,2,3,4,5]
    b = [1,1,1,1,1]
    c = []
    for a_, b_ in zip(a, b):
        c.append(a_ + b_)
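
The same distinction applies to the two functions from the question. The sketch below uses plain pandas (no swifter) and hypothetical sample data that is not from the question. It also assumes `day_name()` in place of `weekday_name`, since the latter was removed in recent pandas versions, and renames the parameter to `ts` for readability.

```python
import pandas as pd

def bikes_proportion(x, max_x):
    # Vectorized: x can be a whole Series, pandas does the arithmetic element-wise.
    return x * 1.0 / max_x

def convert_to_human(ts):
    # Non-vectorized: expects ONE Timestamp; string concatenation and
    # strftime are scalar operations.
    return (ts.day_name() + ', the ' + str(ts.day) + 'th day of '
            + ts.strftime("%B") + ', ' + str(ts.year))

# Hypothetical sample data for illustration only.
df = pd.DataFrame({
    'bikes': [2, 5, 10],
    'date': pd.to_datetime(['2019-01-01', '2019-06-15', '2019-12-31']),
})

# Vectorized: the whole column goes in with a single call.
df['proportion'] = bikes_proportion(df['bikes'], df['bikes'].max())

# Non-vectorized: the function has to be called once per element.
# Passing the whole column would fail, since a Series has no day_name() method.
df['human_date'] = df['date'].apply(convert_to_human)
```

So `bikes_proportion` is "vectorized" because its body only uses operations that also work on a whole column, while `convert_to_human` only makes sense for a single value.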
    

    Swifter can also be applied to a DataFrame, not only to a single vector. See https://github.com/jmcarpenter2/swifter

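    import pandas as pd
    import swifter  # importing swifter registers the .swifter accessor
    df = pd.DataFrame({'x': [1, 2, 3, 4], 'y': [5, 6, 7, 8]})
    df['agg'] = df.swifter.apply(lambda x: x.sum() - x.min(), axis=1)  # axis=1: one value per row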
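
As far as I understand it, this is why the distinction matters for swifter: it tries to run your function in a vectorized way first and only falls back to a slower element-wise strategy if that does not work. The sketch below is a deliberately simplified illustration of that idea in plain pandas. `apply_fastest` is a hypothetical helper, not part of swifter's API, and swifter's real logic is more involved.

```python
import pandas as pd

def apply_fastest(series, func):
    """Hypothetical helper: try one vectorized call, else apply element-wise."""
    try:
        result = func(series)
        # Accept the vectorized result only if it is a Series aligned with the input.
        if isinstance(result, pd.Series) and result.index.equals(series.index):
            return result, 'vectorized'
    except (AttributeError, TypeError, ValueError):
        # The function cannot handle a whole Series, so fall through.
        pass
    return series.apply(func), 'element-wise'

numbers = pd.Series([1, 2, 3, 4])
dates = pd.Series(pd.to_datetime(['2019-01-01', '2019-06-15']))

# Multiplication works on a whole Series, so the vectorized path is taken.
doubled, how_doubled = apply_fastest(numbers, lambda x: x * 2)

# day_name() exists on a single Timestamp but not on a Series,
# so the helper falls back to calling the function per element.
names, how_names = apply_fastest(dates, lambda ts: ts.day_name())
```

Here `how_doubled` ends up as `'vectorized'` and `how_names` as `'element-wise'`, which mirrors the difference between the two functions in the question.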