python numpy apache-spark complex-numbers

Is there a way to represent complex numbers to store in Spark DF?


I have an ndarray of values with dtype numpy.complex128. When I try to create a Spark DF using these values, I get the error:

UserWarning: createDataFrame attempted Arrow optimization because 'spark.sql.execution.arrow.enabled' is set to true; however, failed by the reason below:
Unsupported numpy type 15
Attempting non-optimization as 'spark.sql.execution.arrow.fallback.enabled' is set to true.
TypeError: not supported type: <class 'complex'>

Has anyone run into a situation like this? How could I represent these complex numbers, keeping in mind that I will ultimately need to retrieve them down the line?


Solution

  • Complex numbers are just pairs of floats. If you have a numpy array of shape (n1, n2, ..., nZ) and type complex128, you can view it as an array of shape (n1, n2, ..., 2 * nZ) and type float64:

    >>> import numpy as np
    >>> a = np.linspace(0.+1.j, 1.+0j, 12).reshape(3, 4)
    >>> a.shape
    (3, 4)
    >>> a.dtype
    dtype('complex128')
    
    >>> b = a.view(np.float64)
    >>> b.shape
    (3, 8)
    >>> b.dtype
    dtype('float64')
    

    The real and imaginary parts alternate along the last axis: real parts sit at the even indices and imaginary parts at the odd indices. You can verify that viewing the data as a compatible dtype does not change it:

    >>> (b[:, ::2] == a.real).all()
    True
    >>> (b[:, 1::2] == a.imag).all()
    True
    

    The operation is very cheap: no data is copied, and a new array object with a different shape and strides is created over the same buffer. When you deserialize, you can just as easily view an array of shape (n1, n2, ..., 2 * nZ) and type float64 as one of shape (n1, n2, ..., nZ) and type complex128:

    >>> a2 = b.view(np.complex128)
    >>> a2.shape
    (3, 4)
    >>> a2.dtype
    dtype('complex128')