Faster exponentiation of complex arrays in Python using ArrayFire

According to the ArrayFire pow documentation, af.pow() currently only supports powers (and roots) of real arrays. No error is thrown for complex input, but I found that calling af.pow() on a complex array can cause a huge memory leak, especially when the argument is itself the result of another function call (for example, af.pow(af.ifft(array), 2)).

To get around this, I have written the function complexPow below. This seems to run for complex arrays without the memory leak, and a quick comparison showed that my complexPow function returns the same values as numpy.sqrt() and the ** operator, for example.

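import arrayfire as af

def complexPow(inData, power):
    # Polar form: z**p = r**p * (cos(p*theta) + 1j*sin(p*theta)).
    # Note that inData is overwritten in place.
    for i in af.ParallelRange(inData.shape[0]):
        # atan2 keeps theta in the correct quadrant when the real part is negative
        theta = af.atan2(af.imag(inData[i]), af.real(inData[i]))
        rSquared = af.pow(af.real(inData[i]), 2.0) + \
                    af.pow(af.imag(inData[i]), 2.0)
        r = af.pow(rSquared, .5)
        inData[i] = af.pow(r, power) * (af.cos(theta*power) + \
                1j*af.sin(theta*power))
    return inData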

Is there a faster way of doing parallel element-wise exponentiation than this? I haven't found one, but I'm worried that I'm missing a trick here.

Solution

This is a little faster without the parallel for loop:

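import arrayfire as af

def complexPow(inData, power):
    # The ArrayFire functions below are already element-wise, so no explicit loop is needed.
    # atan2 keeps theta in the correct quadrant when the real part is negative
    theta = af.atan2(af.imag(inData), af.real(inData))
    r = af.pow(af.pow(af.real(inData), 2.0) +
                af.pow(af.imag(inData), 2.0), .5)
    # Polar form: z**p = r**p * (cos(p*theta) + 1j*sin(p*theta))
    inData = af.pow(r, power) * (af.cos(theta*power) +
                1j*af.sin(theta*power))
    return inData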
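
As a sanity check on the polar-form identity z**p = r**p * (cos(p*theta) + 1j*sin(p*theta)) that both versions rely on, here is a minimal scalar sketch using only the Python standard library (no ArrayFire involved). The helper names polar_pow and naive_pow are hypothetical and only for illustration. The sketch also shows why the angle should come from a two-argument arctangent: atan(imag/real) loses the quadrant whenever the real part is negative.

```python
import math


def polar_pow(z, power):
    """Raise a complex scalar to a real power via its polar form.

    Hypothetical reference helper, not part of ArrayFire.
    """
    r = math.hypot(z.real, z.imag)        # magnitude |z|
    theta = math.atan2(z.imag, z.real)    # angle in the correct quadrant
    return (r ** power) * complex(math.cos(theta * power),
                                  math.sin(theta * power))


def naive_pow(z, power):
    """Same formula, but with atan(imag/real), which ignores the quadrant."""
    r = math.hypot(z.real, z.imag)
    theta = math.atan(z.imag / z.real)    # off by pi when z.real < 0
    return (r ** power) * complex(math.cos(theta * power),
                                  math.sin(theta * power))


z = complex(-3.0, 4.0)                    # a point in the second quadrant
print(polar_pow(z, 0.5), z ** 0.5)        # these two agree
print(naive_pow(z, 0.5))                  # this one does not
```

The quadrant error cancels for even integer powers such as 2, because the angle is only off by pi, so a comparison that uses squares alone would not reveal it.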

Tested for 4000 iterations over a dtype=complex array with dimensions (1, 2**18) on an NVIDIA Quadro K4200, using Spyder 3, Python 2.7, and Windows 7:

Using af.ParallelRange: 7.64 sec (1.91 msec per iteration).

Method above: 5.94 sec (1.49 msec per iteration).

Speed increase: 28% (about 22% less time per iteration).
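
For reference, a comparison like this could be timed with a small harness along the following lines, and the percentage follows directly from the reported totals. This is only a sketch, not the script used for the numbers above: time_iterations and speed_increase_percent are hypothetical names, and time.perf_counter needs Python 3. ArrayFire may queue device work asynchronously, so the timed callable should also wait for the device to finish (for example by calling af.sync(), assumed here to be available in your ArrayFire version).

```python
import time


def time_iterations(fn, iterations):
    """Return (total seconds, milliseconds per iteration) for calling fn() repeatedly."""
    start = time.perf_counter()
    for _ in range(iterations):
        fn()
    total = time.perf_counter() - start
    return total, 1000.0 * total / iterations


def speed_increase_percent(slow_seconds, fast_seconds):
    """Throughput gain of the faster run relative to the slower one, in percent."""
    return 100.0 * (slow_seconds / fast_seconds - 1.0)


# Totals reported above for 4000 iterations.
print(speed_increase_percent(7.64, 5.94))   # throughput gain of the loop-free version
print(1000.0 * 7.64 / 4000)                 # msec per iteration with af.ParallelRange
```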