python statistics statsmodels

Is there a way to weight data points when doing LOESS/LOWESS in Python?


I would like to run a LOWESS function where different data points have different weights, but I don't see how I can pass weights to the lowess function. Here's some example code of using lowess without weights.

import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

# Create the data
x = np.random.uniform(low=-2*np.pi, high=2*np.pi, size=500)
y = np.sin(x) + np.random.normal(size=len(x))

# Apply LOWESS (Locally Weighted Scatterplot Smoothing)
lowess = sm.nonparametric.lowess
z = lowess(y, x)
w = lowess(y, x, frac=1./3)

# Plotting
plt.figure(figsize=(12, 6))
plt.scatter(x, y, label='Data', alpha=0.5)
plt.plot(z[:, 0], z[:, 1], label='LOWESS', color='red')
plt.legend()  # show the labels defined above
plt.show()

My points vary in significance, so I would like to be able to create weights like weights = np.random.randint(1,5,size=500) and have the lowess process use them. I believe this is possible in R, but I'm not sure whether it can be done in Python. Is there a way?


Solution

  • First install the scikit-misc package (imported as skmisc), which can perform weighted LOESS:

    python3 -m pip install scikit-misc --user
    

    Then, for a synthetic dataset in which each point has its own noise level:

    import numpy as np
    from skmisc.loess import loess
    import matplotlib.pyplot as plt
    
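    np.random.seed(12345)
    x = np.sort(np.random.uniform(low=-2*np.pi, high=2*np.pi, size=500))
    y = np.sin(x)  # noise-free signal
    s = np.abs(0.2 * np.random.normal(size=x.size) + 0.01)  # per-point noise standard deviation
    n = s * np.random.normal(size=x.size)  # heteroscedastic noise
    yn = y + n  # noisy observations
    w = 1 / s ** 2  # inverse-variance weights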
    

    We create the LOESS object and feed it with data and weights:

    regressor = loess(x, yn, weights=w, span=0.3)  # fit the noisy observations, not the clean signal
    regressor.fit()
    

    We evaluate the fitted curve at the sample points:

    prediction = regressor.predict(x)
    

    And display the result:

    fig, axe = plt.subplots()
    axe.scatter(x, yn)
    axe.plot(x, prediction.values, color="orange")
    axe.grid()
    


    Note that the API of this package is a bit different from the sklearn API. There is another example of usage here.
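If it helps to see how per-point weights can enter a local regression, here is a minimal NumPy-only sketch. It is not how skmisc is implemented internally; it just assumes the usual idea of multiplying the tricube kernel weight by a user-supplied weight before each local linear fit. The helper name weighted_local_linear and its parameters are made up for illustration, and there are no robustness iterations.

```python
import numpy as np

def weighted_local_linear(x, y, weights, span=0.3):
    """Hypothetical helper: local linear fit at each x[i], with the tricube
    kernel scaled by the user-supplied per-point weights."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    weights = np.asarray(weights, dtype=float)
    n = x.size
    k = min(n, max(2, int(np.ceil(span * n))))  # neighbours used per local fit
    fitted = np.empty(n)
    for i in range(n):
        dist = np.abs(x - x[i])
        idx = np.argpartition(dist, k - 1)[:k]  # indices of the k nearest points
        h = dist[idx].max()  # local bandwidth
        u = dist[idx] / h if h > 0 else np.zeros(k)
        kernel = (1.0 - u ** 3) ** 3  # tricube kernel
        # Combined weight; the square root turns weighted least squares
        # into an ordinary least-squares problem for lstsq.
        sw = np.sqrt(kernel * weights[idx])
        design = np.column_stack([np.ones(k), x[idx] - x[i]])
        coef, *_ = np.linalg.lstsq(design * sw[:, None], y[idx] * sw, rcond=None)
        fitted[i] = coef[0]  # the intercept is the fitted value at x[i]
    return fitted

# Same kind of synthetic data as above: noisy sine with inverse-variance weights
rng = np.random.default_rng(12345)
x = np.sort(rng.uniform(-2 * np.pi, 2 * np.pi, size=500))
s = np.abs(0.2 * rng.normal(size=x.size) + 0.01)
yn = np.sin(x) + s * rng.normal(size=x.size)
w = 1 / s ** 2
smooth = weighted_local_linear(x, yn, w, span=0.3)
```

A point with weight 0 is ignored entirely, and a point with a large weight pulls the local fit towards itself, which is the behaviour asked for in the question. For real work the skmisc version is likely the better choice, since this loop is slow and unoptimized.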