I am scaling data using the built-in power operation in Python. However, I noticed that when taking the nth root of a negative number (for odd n), I receive the solution of x^n = -N corresponding to theta = pi/n, which is a complex solution. Ideally, I would receive the negative real solution x = -(N^(1/n)) instead.
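To make the geometry concrete, here is a small sketch that lists all nine complex roots of x^9 = -2967000000 using only the standard `cmath` and `math` modules. The roots share one modulus and have arguments theta_k = (pi + 2*pi*k)/n; k = 0 gives the principal root that `**` returns, while k = (n - 1)/2 gives theta = pi, the negative real root. The variable names are chosen for illustration only.

```python
import cmath
import math

N = 2967000000
n = 9

r = N ** (1 / n)  # common modulus of all n roots

# Arguments of the n roots of x**n = -N: theta_k = (pi + 2*pi*k) / n
roots = [cmath.rect(r, (math.pi + 2 * math.pi * k) / n) for k in range(n)]

principal = roots[0]             # theta = pi/n, the root that (-N) ** (1/n) returns
real_root = roots[(n - 1) // 2]  # theta = pi, the negative real root

print("principal root:", principal)
print("real root:     ", real_root)  # imaginary part is only floating-point noise
```

Python's `**` always picks the k = 0 branch for a negative base and a fractional exponent, which is why the real root has to be built from the magnitude and the sign separately.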
Below are the results from the built-in operator and from NumPy.
Built-in:
# should be -11.28443261177375
>>> (-2967000000)**(1/9)
(10.603898055039648+3.8595032592277083j)
NumPy:
# should be -11.28443261177375
>>> np.pow(-2967000000, 1/9)
nan
Is there an efficient way, using NumPy or a built-in method, to force Python to return the negative real solution? I would like to avoid writing a function for this if possible. I am currently using np.sign(x)*abs(x)**(1/n). Is there any improvement that I can make?
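For reference, a minimal sketch of the workaround above applied to a whole NumPy array rather than a single scalar. `np.copysign` is shown as one possible alternative to multiplying by `np.sign`, and `np.cbrt` covers only the special case n = 3. The sample values are made up for illustration.

```python
import numpy as np

x = np.array([-2967000000.0, -8.0, 0.0, 27.0])  # illustrative values
n = 9

# Take the root of the magnitude, then restore the sign of the input.
via_sign = np.sign(x) * np.abs(x) ** (1 / n)
via_copysign = np.copysign(np.abs(x) ** (1 / n), x)

# For cube roots only, NumPy has a ufunc that accepts negative inputs directly.
cube_roots = np.cbrt(x)

print(via_sign)
print(via_copysign)
print(cube_roots)
```

Both variants stay in real arithmetic throughout, so no `nan` or complex values appear for negative inputs.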
I ran some tests trying different methods to see which would be the fastest.
import numpy as np
import pandas as pd
import math
import timeit
df = pd.DataFrame({"nums": np.arange(-2967000000, 100, 10000, dtype=float)})
def method1(df=df, n=1/9):
    # NumPy functions called once per element via apply
    return df["nums"].apply(lambda x: np.sign(x)*np.abs(x)**n)

def method2(df=df, n=1/9):
    return df["nums"].apply(lambda x: np.sign(x)*np.power(np.abs(x), n))

def method3(df=df, n=1/9):
    # math.copysign(magnitude, sign_source): root of |x|, with the sign of x
    return df["nums"].apply(lambda x: math.copysign(abs(x)**n, x))

def method4(df=df, n=1/9):
    # Vectorized: NumPy operates on the whole Series at once
    x = df["nums"]
    return pd.Series(np.sign(x)*np.abs(x)**n)

def method5(df=df, n=1/9):
    x = df["nums"]
    return pd.Series(np.sign(x)*np.power(np.abs(x), n))

methods = [method1, method2, method3, method4, method5]
N = 7  # number of timing repetitions per method
for method in methods:
    print(method.__name__, timeit.timeit(method, number=N)/N)
Output:
method1 0.6219473714285674
method2 0.8631320142857086
method3 0.13383357142856767
method4 0.011573042857110392
method5 0.011473671428575472
So, using the NumPy functions directly on the pandas Series (avoiding apply) is the fastest. This makes sense because NumPy is fast when it operates on whole arrays. The next fastest is the math library with apply, which is consistent with that: the math functions are cheap on individual Python floats. The slowest methods were the NumPy functions used with apply, since apply essentially loops over the Series element by element and every NumPy call on a single scalar carries overhead. Avoid calling NumPy functions inside a Python-level loop.
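The per-call overhead described above can be seen without pandas at all. The sketch below times the same expression three ways: NumPy on one scalar at a time, the `math` module on one scalar at a time, and NumPy once on a whole array. The array size of 10,000 and the variable names are arbitrary choices for illustration, and the absolute timings will differ between machines.

```python
import math
import timeit

import numpy as np

x = -2967000000.0
arr = np.full(10_000, x)  # 10,000 copies of the same value
n = 1 / 9

# Per-element cost: 10,000 separate calls on a single float.
t_numpy_scalar = timeit.timeit(lambda: np.sign(x) * np.abs(x) ** n, number=10_000)
t_math_scalar = timeit.timeit(lambda: math.copysign(abs(x) ** n, x), number=10_000)

# Whole-array cost: one vectorized call covering the same 10,000 values.
t_numpy_vector = timeit.timeit(lambda: np.sign(arr) * np.abs(arr) ** n, number=1)

print(f"numpy, one scalar per call: {t_numpy_scalar:.4f} s")
print(f"math, one scalar per call:  {t_math_scalar:.4f} s")
print(f"numpy, one array call:      {t_numpy_vector:.4f} s")
```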