I have a data set with three variables, like this:
Is there a tool that tells me which variable matters most for the removal: pH or dosage? I was thinking of a PCA (principal component analysis), but I'm a little lost.
Here are some things to try.
From the plot it seems clear that Dosage (column 2) is more closely related to Removal (column 3) than pH (column 1).
Dosage also has a correlation of 0.61 (61%) with Removal, whereas pH has a correlation of only -0.14 (-14%).
Neither variable is statistically significant at the 5% level in the lm summary output, likely because there are only 9 observations.
Stepwise regression based on AIC chooses the Removal ~ Dosage model.
(continued after graph)
matplot(scale(DF), type = "o")
cor(DF)
##                 pH    Dosage    Removal
## pH       1.0000000 0.0000000 -0.1418573  <-- -14%
## Dosage   0.0000000 1.0000000  0.6091517  <-- 61%
## Removal -0.1418573 0.6091517  1.0000000
summary(lm(Removal ~., DF))
## Call:
## lm(formula = Removal ~ ., data = DF)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -15.5556  -7.0556  -4.8889   0.7778  25.7778
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   69.056     39.047   1.769    0.127
## pH            -2.833      6.362  -0.445    0.672  <-- not significant
## Dosage        12.167      6.362   1.912    0.104  <-- not significant
##
## Residual standard error: 15.58 on 6 degrees of freedom
## Multiple R-squared: 0.3912, Adjusted R-squared: 0.1883
## F-statistic: 1.928 on 2 and 6 DF, p-value: 0.2257
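Because pH and Dosage are uncorrelated in this data (their correlation in the cor output is 0), the R-squared of the two-variable model should split cleanly into one share per predictor. The following is a minimal sketch of that check, not part of the original analysis; the names `r2_each` and `r2_full` are made up for illustration, and the data frame is rebuilt inline so the block runs on its own.

```r
# Same values as the reproducible input data given in the Note
DF <- data.frame(
  pH      = c(5, 5, 5, 6, 6, 6, 7, 7, 7),
  Dosage  = c(0L, 1L, 2L, 0L, 1L, 2L, 0L, 1L, 2L),
  Removal = c(50, 60, 70, 50, 90, 95, 50, 55, 58)
)

# Squared correlation of each predictor with Removal
# (r2_each is an illustrative name)
r2_each <- cor(DF)[c("pH", "Dosage"), "Removal"]^2
r2_each

# R-squared of the model that uses both predictors
r2_full <- summary(lm(Removal ~ ., DF))$r.squared

# With uncorrelated predictors the two shares add up to the full R-squared
all.equal(sum(r2_each), r2_full)
```

Read this way, most of the variation that the linear model explains is attributed to Dosage. The clean split relies on the balanced design; with correlated predictors the shares would overlap and this shortcut would not apply.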
fm <- step(lm(Removal ~., DF))
## ...snip...
fm
## Call:
## lm(formula = Removal ~ Dosage, data = DF)
##
## Coefficients:
## (Intercept)       Dosage
##       52.06        12.17
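To see why the stepwise search ends at Removal ~ Dosage, it can help to look at the single-term deletions it compares. This is a sketch under the assumption that `drop1()` from the stats package, with its defaults, reports the same AIC criterion that `step()` uses; the names `full` and `d1` are illustrative only.

```r
# Same values as the reproducible input data given in the Note
DF <- data.frame(
  pH      = c(5, 5, 5, 6, 6, 6, 7, 7, 7),
  Dosage  = c(0L, 1L, 2L, 0L, 1L, 2L, 0L, 1L, 2L),
  Removal = c(50, 60, 70, 50, 90, 95, 50, 55, 58)
)

# Model with both predictors (illustrative name)
full <- lm(Removal ~ ., DF)

# AIC of the full model (row "<none>") and of each model with one term removed
d1 <- drop1(full)
d1

# Row with the smallest AIC: the term whose removal improves the model most
rownames(d1)[which.min(d1$AIC)]
```

In this table, removing pH gives a lower AIC than keeping both terms, while removing Dosage gives a higher one, which matches the model that step() returns.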
Note: The input data in reproducible form is:
DF <- structure(list(pH = c(5, 5, 5, 6, 6, 6, 7, 7, 7), Dosage = c(0L,
1L, 2L, 0L, 1L, 2L, 0L, 1L, 2L), Removal = c(50, 60, 70, 50,
90, 95, 50, 55, 58)), .Names = c("pH", "Dosage", "Removal"), row.names = c(NA,
-9L), class = "data.frame")