Tags: python, numpy, performance, built-in

Negative Real Solution of nth Root - Python


I am scaling data using the built-in power operator in Python. However, when I take the nth root of a negative number (for odd n), I get the principal root: the complex solution of x^n = -N whose argument is theta = pi/n. Ideally I would get the negative real solution x = -(N^(1/n)) instead.
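For reference, the complex value Python returns is the principal nth root, with modulus N**(1/n) and argument pi/n, while the real root for odd n is just the negated root of the magnitude. A minimal standard-library sketch using the numbers from the question (the variable names are illustrative):

```python
import cmath
import math

N = 2967000000  # magnitude of the negative input, as in the question
n = 9           # odd root degree

# Python's ** on a negative base with a fractional exponent returns the
# principal root: modulus N**(1/n) and argument theta = pi/n.
principal = (-N) ** (1 / n)
from_polar = cmath.rect(N ** (1 / n), math.pi / n)

# For odd n, the real solution of x**n = -N is the negated root of N.
real_root = -(N ** (1 / n))

print(principal)   # complex, approximately 10.6039 + 3.8595j
print(from_polar)  # the same value, built from modulus and argument
print(real_root)   # approximately -11.2844
```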

Below are the built-in and NumPy versions.

Built-in:

# should be -11.28443261177375
>>> (-2967000000)**(1/9)
(10.603898055039648+3.8595032592277083j)

NumPy:

# should be -11.28443261177375
>>> import numpy as np
>>> np.pow(-2967000000, 1/9)  # np.pow is an alias of np.power in NumPy >= 2.0
nan
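NumPy returns nan because the integer input is converted to float64, and a real-valued power of a negative float with a non-integer exponent is undefined (NumPy also emits a RuntimeWarning). If the input is complex, `np.power` returns the same principal root as the built-in operator. A small sketch:

```python
import warnings

import numpy as np

with warnings.catch_warnings():
    # Suppress the "invalid value encountered in power" RuntimeWarning.
    warnings.simplefilter("ignore", RuntimeWarning)
    r_real = np.power(-2967000000.0, 1 / 9)  # float64 domain: nan

# Complex input: the principal root, matching the built-in ** result.
r_complex = np.power(np.complex128(-2967000000), 1 / 9)

print(r_real)     # nan
print(r_complex)  # approximately (10.6039+3.8595j)
```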

Is there an efficient way, using NumPy or a built-in method, to make Python return the negative real solution? I would like to avoid writing my own function if possible. I am currently using np.sign(x)*abs(x)**(1/n). Is there any improvement I can make?
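For comparison, here is a vectorized sketch of the current approach next to two alternatives built only from existing NumPy functions: `np.copysign`, which puts the sign of `x` onto the non-negative root in one call, and `np.cbrt`, which computes real cube roots but only covers n = 3. The sample array is illustrative, and no speed difference between the first two is claimed here.

```python
import numpy as np

x = np.array([-2967000000.0, -8.0, 0.0, 8.0, 2967000000.0])
n = 9  # odd root degree

# Approach from the question: sign of x times the root of |x|.
r_sign = np.sign(x) * np.abs(x) ** (1 / n)

# np.copysign(magnitude, sign) copies the sign of x onto the root of |x|.
r_copysign = np.copysign(np.abs(x) ** (1 / n), x)

# Dedicated real cube root; only applicable when n == 3.
r_cbrt = np.cbrt(x)

print(r_copysign)  # first element approximately -11.2844
print(r_cbrt)      # -8 maps to -2, 8 maps to 2
```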


Solution

  • I ran some tests comparing different methods to see which is fastest.

    import numpy as np
    import pandas as pd
    import math
    import timeit
    
    df = pd.DataFrame({"nums": np.arange(-2967000000, 100, 10000, dtype=float)})
    
    
    def method1(df=df, n=1/9):
        return df["nums"].apply(lambda x: np.sign(x)*np.abs(x)**n)
    
    
    def method2(df=df, n=1/9):
        return df["nums"].apply(lambda x: np.sign(x)*np.power(np.abs(x), n))
    
    
    def method3(df=df, n=1/9):
        # math.copysign(magnitude, sign): the root goes first, x supplies the sign
        return df["nums"].apply(lambda x: math.copysign(abs(x)**n, x))
    
    
    def method4(df=df, n=1/9):
        x = df["nums"]
        return pd.Series(np.sign(x)*np.abs(x)**n)
    
    
    def method5(df=df, n=1/9):
        x = df["nums"]
        return pd.Series(np.sign(x)*np.power(np.abs(x), n))
    
    
    methods = [method1, method2, method3, method4, method5]
    
    N = 7
    for method in methods:
        print(method.__name__, timeit.timeit(method, number=N)/N)
    

    Output (average seconds per call):

    method1 0.6219473714285674
    method2 0.8631320142857086
    method3 0.13383357142856767
    method4 0.011573042857110392
    method5 0.011473671428575472
    

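    Applying the NumPy functions directly to the whole pandas Series (methods 4 and 5, about 0.011 s per call) is the fastest, because NumPy then runs one vectorized loop in compiled code instead of calling Python once per element. Next is the math module with apply (method 3, about 0.13 s, more than ten times slower): apply still calls the lambda once per element, but math functions on plain Python floats are cheap. Slowest are the NumPy functions inside apply (methods 1 and 2, 0.62 to 0.86 s, roughly 50 to 75 times slower than the vectorized versions): every call to a NumPy function on a single scalar carries dispatch overhead that outweighs the arithmetic itself. In short, avoid calling NumPy functions element by element in a Python-level loop; apply them to whole arrays instead.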
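The explanation above can be checked in isolation without pandas. The sketch below (an illustration, not the benchmark from the answer) first confirms that the per-element `math.copysign` formula, with the root as the first argument, agrees with the vectorized NumPy version, then times the same scalar computation with NumPy functions versus `math`. Absolute timings vary by machine, so none are shown.

```python
import math
import timeit

import numpy as np

n = 1 / 9
# Every 1000th value of the benchmark data, to keep the check quick.
x = np.arange(-2967000000, 100, 10000, dtype=float)[::1000]

# Vectorized: one call per NumPy function over the whole array.
vectorized = np.sign(x) * np.abs(x) ** n

# Per-element with math: copysign(magnitude, sign), root first.
per_element = np.array([math.copysign(abs(v) ** n, v) for v in x.tolist()])

# Per-scalar cost of NumPy functions versus math on a plain Python float.
v = -2967000000.0
t_numpy = timeit.timeit(lambda: np.sign(v) * np.abs(v) ** n, number=10_000)
t_math = timeit.timeit(lambda: math.copysign(abs(v) ** n, v), number=10_000)

print(np.allclose(vectorized, per_element))  # True
print(f"NumPy on a scalar: {t_numpy:.4f} s, math on a scalar: {t_math:.4f} s")
```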