python, r, statistics, linear-regression, data-fitting

linear fit with predefined error in response variable


I have the following dataset (replication):

ordinal_var    fraction    error_on_fraction
1              1.2         0.1
2              0.87        0.23
4              1.12        0.11
5              0.75        0.06
5              0.66        0.15
6              0.98        0.08
7              1.34        0.05
7              2.86        0.12

Now I want to do a linear regression analysis (preferably in R, but Python is also fine) where I pass the error in y for each point within the formula. In R this would be something like the following (pseudocode, just to illustrate the question):

lm(fraction +-error_on_fraction ~ ordinal_var, data = dataset)

Of course I tried to find out how to do this myself first, but I can't find an answer. For a previous analysis with errors on both x and y I just used the scipy.odr library, but I can't find how to do it with an error only in the y (response) variable.

Any help would be much appreciated!


Solution

  • We can use a simple weighted least squares model.

    Sample data

    Let's read in your sample data.

    df <- read.table(text =
        "ordinal_var    fraction    error_on_fraction
    1              1.2         0.1
    2              0.87        0.23
    4              1.12        0.11
    5              0.75        0.06
    5              0.66        0.15
    6              0.98        0.08
    7              1.34        0.05
    7              2.86        0.12", header = TRUE)
    

    Weighted least squares model

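    We fit a weighted linear model of the form fraction ~ ordered(ordinal_var), which treats the predictor as an ordered factor (hence the polynomial contrast terms .L, .Q, .C, ... in the output). The weights here are 1 / error_on_fraction. Note that lm() interprets weights as inversely proportional to the variance, so if error_on_fraction is a standard deviation, 1 / error_on_fraction^2 is the conventional choice; the output shown here was produced with 1 / error_on_fraction.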

    fit <- lm(
        fraction ~ ordered(ordinal_var),
        weights = 1 / error_on_fraction,
        data = df)
    summary(fit)
    #    
    #Call:
    #lm(formula = fraction ~ ordered(ordinal_var), data = df, weights = 1/error_on_fraction)
    #
    #Weighted Residuals:
    #         1          2          3          4          5          6          7
    # 2.220e-16 -1.851e-16 -1.753e-17  1.050e-01 -1.660e-01  1.810e-17 -1.999e+00
    #         8
    # 3.097e+00
    #
    #Coefficients:
    #                       Estimate Std. Error t value Pr(>|t|)
    #(Intercept)              1.1136     0.3365   3.309   0.0804 .
    #ordered(ordinal_var).L   0.3430     0.7847   0.437   0.7047
    #ordered(ordinal_var).Q   0.6228     0.7057   0.883   0.4706
    #ordered(ordinal_var).C   0.2794     0.8920   0.313   0.7838
    #ordered(ordinal_var)^4   0.2127     0.9278   0.229   0.8400
    #ordered(ordinal_var)^5  -0.2469     0.7916  -0.312   0.7846
    #---
    #Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    #
    #Residual standard error: 2.61 on 2 degrees of freedom
    #Multiple R-squared:  0.5427,   Adjusted R-squared:  -0.6004
    #F-statistic: 0.4748 on 5 and 2 DF,  p-value: 0.783
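
Since the question also accepts Python, here is a minimal sketch of the same idea for a plain straight line, using only the standard library. It is not a translation of the R call above: it assumes ordinal_var is treated as a numeric predictor (not an ordered factor), that error_on_fraction is a standard deviation, and that the weights are inverse variances, 1 / sigma^2. The function name weighted_line_fit is made up for this illustration.

```python
from math import sqrt

# Sample data from the question.
x = [1, 2, 4, 5, 5, 6, 7, 7]
y = [1.2, 0.87, 1.12, 0.75, 0.66, 0.98, 1.34, 2.86]
sigma = [0.1, 0.23, 0.11, 0.06, 0.15, 0.08, 0.05, 0.12]


def weighted_line_fit(x, y, sigma):
    """Fit y = intercept + slope * x by weighted least squares.

    Assumption: sigma holds the standard deviation of each y value,
    so each point gets the inverse-variance weight 1 / sigma**2.
    Returns (intercept, slope, se_intercept, se_slope).
    """
    w = [1.0 / s ** 2 for s in sigma]

    # Weighted sums that appear in the normal equations.
    s_w = sum(w)
    s_x = sum(wi * xi for wi, xi in zip(w, x))
    s_y = sum(wi * yi for wi, yi in zip(w, y))
    s_xx = sum(wi * xi * xi for wi, xi in zip(w, x))
    s_xy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))

    delta = s_w * s_xx - s_x ** 2
    intercept = (s_xx * s_y - s_x * s_xy) / delta
    slope = (s_w * s_xy - s_x * s_y) / delta

    # Standard errors under the assumption that sigma are absolute
    # (not merely relative) uncertainties.
    se_intercept = sqrt(s_xx / delta)
    se_slope = sqrt(s_w / delta)
    return intercept, slope, se_intercept, se_slope


intercept, slope, se_intercept, se_slope = weighted_line_fit(x, y, sigma)
print(f"intercept = {intercept:.4f} +/- {se_intercept:.4f}")
print(f"slope     = {slope:.4f} +/- {se_slope:.4f}")
```

The standard errors here take the supplied sigma values at face value, whereas summary(lm(...)) rescales them by the estimated residual variance, so the two will generally differ. If NumPy or SciPy is available, numpy.polyfit(x, y, 1, w=1/sigma) (note: 1/sigma, not 1/sigma**2, per its documentation) or scipy.optimize.curve_fit with its sigma argument should give the same point estimates.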