
R - Regression Analysis for a Logarithmic Fit


I am performing regression analysis to find the best-fitting model for the diamonds data set from ggplot2 (diamonds.csv). I model price (the response variable) against carat using linear, quadratic, and cubic regression, but none of these lines fits the data well. In Excel, the logarithmic trendline gives the best fit, but I couldn't figure out how to fit the same logarithmic line in R. Can anyone help?

Model 1 compares price vs carat:

    library(ggplot2)  # provides the diamonds data set
    model <- lm(price ~ carat, data = diamonds)

Model 2 adds a quadratic term:

    model2 <- lm(price ~ carat + I(carat^2), data = diamonds)

Model 3 adds a cubic term:

    model3 <- lm(price ~ carat + I(carat^2) + I(carat^3), data = diamonds)
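A quick way to see how much each added term helps is to put the R² values of the three models side by side. This is a minimal sketch that refits the models above; it assumes ggplot2 is installed (it provides `diamonds`), and the helper name `r2` is hypothetical, chosen only for this illustration.

```r
library(ggplot2)  # provides the diamonds data set

# Refit the three models from the question
model  <- lm(price ~ carat, data = diamonds)
model2 <- lm(price ~ carat + I(carat^2), data = diamonds)
model3 <- lm(price ~ carat + I(carat^2) + I(carat^3), data = diamonds)

# Hypothetical helper: pull R-squared and adjusted R-squared out of a fitted lm
r2 <- function(fit) {
  s <- summary(fit)
  c(r.squared = s$r.squared, adj.r.squared = s$adj.r.squared)
}

# One row per model, one column per statistic
fit_table <- rbind(linear = r2(model), quadratic = r2(model2), cubic = r2(model3))
print(round(fit_table, 4))

# The models are nested, so they can also be compared with F tests
print(anova(model, model2, model3))
```

Because each model contains the previous one, R² can only stay the same or go up as terms are added; adjusted R² or the `anova()` F tests are the fairer comparison.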

How can I code the logarithmic fit in R to get the same result as Excel?

y = 0.4299 ln(x) - 2.5495, R² = 0.8468
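Excel's logarithmic trendline is a least-squares fit of the form y = a·ln(x) + b. In ggplot2 the same kind of curve can be drawn by passing a log formula to `geom_smooth()`. This is a minimal sketch, assuming carat is on the x axis and price on the y axis; swap them in `aes()` if the Excel chart was the other way round. The `alpha` value is an arbitrary choice to reduce overplotting.

```r
library(ggplot2)  # provides diamonds, ggplot(), geom_point(), geom_smooth()

# Scatter plot of price against carat with a logarithmic trendline,
# i.e. the least-squares fit of price = a * log(carat) + b
p <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(alpha = 0.1) +
  geom_smooth(method = "lm", formula = y ~ log(x), se = FALSE)

print(p)
```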

Thanks!


Solution

  • The result you report from Excel, y = 0.4299 ln(x) - 2.5495, does not contain any quadratic or cubic terms, so it is not the same kind of model as your polynomial fits. What are you trying to do? price is heavily right-skewed, and, as with a variable such as income, it is common practice to take its logarithm. Regressing log(price) on carat also gives the R² you are referring to (0.8468), but very different coefficients for the intercept and the carat parameter.

    library(ggplot2)  # provides the diamonds data set
    m1 <- lm(log(price) ~ carat, data = diamonds)
    summary(m1)
    Call:
    lm(formula = log(price) ~ carat, data = diamonds)
    
    Residuals:
        Min      1Q  Median      3Q     Max 
    -6.2844 -0.2449  0.0335  0.2578  1.5642 
    
    Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
    (Intercept) 6.215021   0.003348    1856   <2e-16 ***
    carat       1.969757   0.003608     546   <2e-16 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    
    Residual standard error: 0.3972 on 53938 degrees of freedom
    Multiple R-squared:  0.8468,    Adjusted R-squared:  0.8468 
    F-statistic: 2.981e+05 on 1 and 53938 DF,  p-value: < 2.2e-16
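To fit a model of the same form as Excel's logarithmic trendline, put `log()` on the predictor inside the `lm()` formula. The sketch below fits two variants: price = a·ln(carat) + b, and carat = a·ln(price) + b. The coefficients in the question (0.4299 and -2.5495) look consistent with the second variant, i.e. with price on the x axis of the Excel chart; that is an inference from the reported numbers, not something the question states. In a simple regression, R² is the squared correlation between the two variables, so `carat ~ log(price)` and `log(price) ~ carat` share the same R² even though their coefficients differ. The variable names here are illustrative.

```r
library(ggplot2)  # provides the diamonds data set

# Variant A: log trendline with carat on the x axis,
# price = a * ln(carat) + b
fit_price_logcarat <- lm(price ~ log(carat), data = diamonds)

# Variant B: log trendline with price on the x axis,
# carat = a * ln(price) + b
fit_carat_logprice <- lm(carat ~ log(price), data = diamonds)

# The model from the answer above, for comparison: log(price) = b0 + b1 * carat
m1 <- lm(log(price) ~ carat, data = diamonds)

# Coefficients in Excel's "a ln(x) + b" order: slope first, then intercept
print(coef(fit_price_logcarat)[c("log(carat)", "(Intercept)")])
print(coef(fit_carat_logprice)[c("log(price)", "(Intercept)")])

# R-squared of each fit; B and m1 match because both are simple regressions
# of the same pair of variables (carat and log(price))
r2 <- sapply(list(A = fit_price_logcarat, B = fit_carat_logprice, m1 = m1),
             function(fit) summary(fit)$r.squared)
print(round(r2, 4))
```

R's `log()` is the natural logarithm, matching Excel's LN; use `log10()` if you need base 10.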