
Getting an exponential moving average over the complete data length in Julia


I want to replicate the ewma (Exponentially Weighted Moving Average) function from the Python pandas library with an equivalent Julia function. In Julia, MarketTechnicals.ema(data, m) calculates EMA values. However, it only starts producing values at the m-th element, so for a series of length n the result has length n-m+1. That is understandable given the span m, but pandas' ewm computes a value from the very first element, regardless of the smoothing parameter, as shown below:

    # python
    import pandas as pd
    m = 3
    data = {"val": [1, 2, 3, 4, 5, 6]}
    df = pd.DataFrame(data)
    df['val'].ewm(com=m).mean()
    # output for m = 3
    0    1.000000
    1    1.571429
    2    2.189189
    3    2.851429
    4    3.555698
    5    4.299079
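For reference, here ewm is called with com = 3 (com is also its first positional argument), so the smoothing factor is α = 1/(1+3) = 0.25. With the default adjust=True, the pandas documentation defines each output as a weighted mean of every observation seen so far, which is why there is one output per input. Taking x_0 = 1 as the first value, the next two outputs work out as:

```latex
y_t = \frac{\sum_{i=0}^{t} (1-\alpha)^i \, x_{t-i}}{\sum_{i=0}^{t} (1-\alpha)^i},
\qquad \alpha = \frac{1}{1 + \mathrm{com}} = 0.25

y_1 = \frac{2 + 0.75 \cdot 1}{1 + 0.75} = \frac{2.75}{1.75} \approx 1.571429

y_2 = \frac{3 + 0.75 \cdot 2 + 0.75^2 \cdot 1}{1 + 0.75 + 0.75^2} = \frac{5.0625}{2.3125} \approx 2.189189
```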

The Python code returns n values when df['val'] has length n, whereas the following Julia snippet returns only n-m+1 values:

    # julia
    using DataFrames, MarketTechnicals
    m = 3
    df = DataFrame(val = [1, 2, 3, 4, 5, 6])
    ema(df.val, m)
    # output for m = 3
    4×1 Matrix{Float64}:
     2.0
     3.0
     4.0
     5.0
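The four values above are consistent with ema seeding the average with the simple mean of the first m values (the mean of 1, 2, 3 is 2.0) and then applying the recursion with α = 2/(m+1) = 0.5 (for example 0.5·4 + 0.5·2 = 3.0). That is a span-style factor, whereas the pandas call above uses com = 3, i.e. α = 0.25, so the two snippets also smooth differently. A minimal sketch of that assumed behaviour follows; sma_seeded_ema is a hypothetical helper inferred from the output, not the MarketTechnicals source:

```julia
# Hypothetical helper, not part of MarketTechnicals: an EMA seeded with the
# simple mean of the first m values, as inferred from the output above.
function sma_seeded_ema(x::AbstractVector{<:Real}, m::Integer)
    n = length(x)
    n >= m || throw(ArgumentError("need at least m values"))
    alpha = 2 / (m + 1)                  # span-style smoothing factor
    out = Vector{Float64}(undef, n - m + 1)
    out[1] = sum(x[1:m]) / m             # seed: mean of the first m values
    for k in 2:length(out)
        # out[k] corresponds to the observation x[m + k - 1]
        out[k] = alpha * x[m + k - 1] + (1 - alpha) * out[k - 1]
    end
    return out
end

println(sma_seeded_ema([1, 2, 3, 4, 5, 6], 3))   # [2.0, 3.0, 4.0, 5.0]
```

The seed is what costs the first m-1 positions: no value can be produced until m observations are available to average.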

I came across a repo that makes pandas functionality available in Julia, but I'm not clear on its syntax and couldn't find any examples online.

Can someone help me with a Julia equivalent of the pandas ewma function?


Solution

  • Based on the explanation of pd.ewma (now Series.ewm(...).mean()) in the pandas source code, I wrote the following Julia function. It supports specifying the smoothing through span, com or halflife, as well as the adjust and ignore_na options.

    # Julia code to recreate pandas' ewm(...).mean()
    abstract type AlphaCalc end

    # Each struct stores one way of specifying the smoothing factor alpha
    struct Com{T<:Real} <: AlphaCalc
      com::T
    end

    struct Span{T<:Real} <: AlphaCalc
      span::T
    end

    struct Halflife{T<:Real} <: AlphaCalc
      halflife::T
    end

    # Convert com, span or halflife to alpha
    calcAlpha(val::Com)      = 1/(1 + val.com)
    calcAlpha(val::Span)     = 2/(1 + val.span)
    calcAlpha(val::Halflife) = 1 - exp(log(0.5)/val.halflife)

    function ewma(
      data::AbstractVector{<:Real},
      alphaMethod::AlphaCalc;
      adjust::Bool=true,
      ignore_na::Bool=false
    )
      alpha = calcAlpha(alphaMethod)
      n_samples = length(data)
      ewma_result = Vector{Float64}(undef, n_samples)

      if ignore_na
        # Weights depend only on the order of the non-NaN observations
        weights = Float64[]
        for i in 1:n_samples
          if isnan(data[i])
            # Carry the last value forward (NaN until something is observed)
            ewma_result[i] = i == 1 ? NaN : ewma_result[i-1]
          elseif isempty(weights)
            # First non-NaN observation
            push!(weights, 1.0)
            ewma_result[i] = data[i]
          else
            # Every new observation decays the older weights by (1 - alpha)
            weights .*= (1 - alpha)
            push!(weights, 1.0)
            if adjust
              # Weighted sum of the observations divided by the sum of weights
              observed = filter(!isnan, data[1:i])
              ewma_result[i] = sum(weights .* observed) / sum(weights)
            else
              # Recursive formula
              ewma_result[i] = (1 - alpha) * ewma_result[i-1] + alpha * data[i]
            end
          end
        end
      else
        # Weights depend on absolute positions, so NaN gaps still decay old values
        last_obs = 0  # index of the most recent non-NaN observation
        for i in 1:n_samples
          if isnan(data[i])
            ewma_result[i] = last_obs == 0 ? NaN : ewma_result[i-1]
            continue
          end
          if last_obs == 0
            # First non-NaN observation
            ewma_result[i] = data[i]
          elseif adjust
            # Observation j gets weight (1 - alpha)^(i - j)
            idx = [j for j in 1:i if !isnan(data[j])]
            weights = (1 - alpha) .^ (i .- idx)
            ewma_result[i] = sum(weights .* data[idx]) / sum(weights)
          elseif i - last_obs == 1
            # Recursive formula (no gap since the last observation)
            ewma_result[i] = (1 - alpha) * ewma_result[i-1] + alpha * data[i]
          else
            # After a NaN gap the old value has decayed once per elapsed step
            decay = (1 - alpha)^(i - last_obs)
            ewma_result[i] = (decay * ewma_result[i-1] + alpha * data[i]) / (decay + alpha)
          end
          last_obs = i
        end
      end
      return ewma_result
    end
    

    I tested it in Julia as follows:

    data = [1., 2., 3., 4., 5., 6., 8.0]
    ewma_result = ewma(data, Com(5.), adjust=false, ignore_na=false);
    println("EWMA result: ", ewma_result)
    ewma_result = ewma(data, Span(5.), adjust=false, ignore_na=true);
    println("EWMA result: ", ewma_result)
    ewma_result = ewma(data, Halflife(5.), adjust=true, ignore_na=true);
    println("EWMA result: ", ewma_result)
    
    #outputs 
    EWMA result: [1.0, 1.1666666666666667, 1.4722222222222223, 1.8935185185185186, 2.4112654320987654, 3.0093878600823047, 3.841156550068587]
    EWMA result: [1.0, 1.3333333333333335, 1.888888888888889, 2.5925925925925926, 3.3950617283950617, 4.263374485596708, 5.508916323731139]
    EWMA result: [1.0, 1.5346019613807635, 2.0921248294619423, 2.672350105842331, 3.2749760411274242, 3.8996216516650244, 4.754261118819774]
    

    and compared the results with the equivalent pandas calls:

    >>> import pandas as pd
    >>> data = [1., 2., 3., 4., 5., 6., 8.0]
    >>> data = pd.Series(data)
    >>> data.ewm(com=5.0, adjust=False, ignore_na=False).mean()
    0    1.000000
    1    1.166667
    2    1.472222
    3    1.893519
    4    2.411265
    5    3.009388
    6    3.841157
    dtype: float64
    >>> data.ewm(span=5.0, adjust=False, ignore_na=True).mean()
    0    1.000000
    1    1.333333
    2    1.888889
    3    2.592593
    4    3.395062
    5    4.263374
    6    5.508916
    dtype: float64
    >>> data.ewm(halflife=5.0, adjust=True, ignore_na=True).mean()
    0    1.000000
    1    1.534602
    2    2.092125
    3    2.672350
    4    3.274976
    5    3.899622
    6    4.754261
    dtype: float64
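The adjust=true branches above recompute a sum over all previous observations at every step, so their cost grows quadratically with the length of the data. A possible O(n) alternative keeps the current average together with the total weight behind it and updates both for each new value; for adjust=true this is algebraically the same weighted mean, and for adjust=false it reduces to the recursive formula. This is a sketch under assumptions, not the pandas implementation: ewma_online is a hypothetical name, it takes alpha directly instead of an AlphaCalc value, and its NaN handling follows the ignore_na weighting rules described in the pandas ewm documentation.

```julia
# Hypothetical O(n) variant of the ewma function above (name is illustrative).
# It takes the smoothing factor alpha directly instead of an AlphaCalc value.
function ewma_online(data::AbstractVector{<:Real}, alpha::Real;
                     adjust::Bool=true, ignore_na::Bool=false)
    out = fill(NaN, length(data))
    weighted = NaN                          # current average (NaN until the first observation)
    old_wt = 1.0                            # total weight behind `weighted`
    new_wt = adjust ? 1.0 : Float64(alpha)  # weight given to each new observation
    for (i, x) in enumerate(data)
        is_obs = !isnan(x)
        if isnan(weighted)
            # Nothing observed yet: start from the first non-NaN value
            if is_obs
                weighted = Float64(x)
            end
        elseif is_obs || !ignore_na
            # Decay the old weight; with ignore_na=false this also happens on NaN steps
            old_wt *= 1 - alpha
            if is_obs
                weighted = (old_wt * weighted + new_wt * x) / (old_wt + new_wt)
                # adjust=true accumulates weights; adjust=false renormalises to 1
                old_wt = adjust ? old_wt + new_wt : 1.0
            end
        end
        out[i] = weighted                   # NaN steps repeat the previous value
    end
    return out
end
```

Passing alpha = calcAlpha(Com(5.)), calcAlpha(Span(5.)) or calcAlpha(Halflife(5.)) with the same adjust and ignore_na flags should reproduce the three results above up to floating-point rounding.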