How to compare three sets of data to work out how two of them influence the third?

I have three sets of data like this:

Is there a tool that tells me which variable matters most for the removal: pH or dosage? I was thinking of PCA (principal component analysis), but I'm a little lost.

Solution

Here are some things to try.

From the plot it seems clear that Dosage (column 2) is more closely related to Removal (column 3) than pH (column 1).

Also, Dosage has a correlation of 0.61 (61%) with Removal, whereas pH has a correlation of only -0.14 (-14%).

Neither variable is statistically significant in the lm summary output (p = 0.104 for Dosage, p = 0.672 for pH), likely because there are only 9 observations.

Stepwise regression based on AIC chooses the Removal ~ Dosage model.

(continued after graph)

matplot(scale(DF), type = "o")

cor(DF)
##                 pH    Dosage    Removal
## pH       1.0000000 0.0000000 -0.1418573  <-- -14%
## Dosage   0.0000000 1.0000000  0.6091517  <-- 61%
## Removal -0.1418573 0.6091517  1.0000000

summary(lm(Removal ~., DF))

## Call:
## lm(formula = Removal ~ ., data = DF)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -15.5556  -7.0556  -4.8889   0.7778  25.7778 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   69.056     39.047   1.769    0.127  
## pH            -2.833      6.362  -0.445    0.672  <-- not significant
## Dosage        12.167      6.362   1.912    0.104  <-- not significant
## 
## Residual standard error: 15.58 on 6 degrees of freedom
## Multiple R-squared:  0.3912,    Adjusted R-squared:  0.1883 
## F-statistic: 1.928 on 2 and 6 DF,  p-value: 0.2257

fm <- step(lm(Removal ~., DF))
## ...snip...

fm
## Call:
## lm(formula = Removal ~ Dosage, data = DF)
## 
## Coefficients:
## (Intercept)       Dosage  
##       52.06        12.17
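
The choice made by step() can be cross-checked by fitting the candidate models by hand and comparing their AIC values directly. This is only a sketch: the list name `fits` and the model labels are made up for illustration, and the data frame is rebuilt inline so the block runs on its own. AIC() and the value printed by step() differ by a constant for a fixed data set, so the ranking should agree even though the numbers do not.

```r
# Same data as in the note below, rebuilt here so the block is self-contained
DF <- data.frame(
  pH      = c(5, 5, 5, 6, 6, 6, 7, 7, 7),
  Dosage  = c(0, 1, 2, 0, 1, 2, 0, 1, 2),
  Removal = c(50, 60, 70, 50, 90, 95, 50, 55, 58)
)

# Candidate models: intercept only, each predictor alone, both together
fits <- list(
  null   = lm(Removal ~ 1, DF),
  pH     = lm(Removal ~ pH, DF),
  Dosage = lm(Removal ~ Dosage, DF),
  both   = lm(Removal ~ pH + Dosage, DF)
)

# Lower AIC is better; sort so the preferred model comes first
aic <- sapply(fits, AIC)
sort(aic)

# Label of the model with the smallest AIC
names(which.min(aic))
```

With only 9 rows the AIC differences are small, so this is weak evidence rather than a firm ranking.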

Note: The input data in reproducible form is:

DF <- structure(list(pH = c(5, 5, 5, 6, 6, 6, 7, 7, 7), Dosage = c(0L, 
1L, 2L, 0L, 1L, 2L, 0L, 1L, 2L), Removal = c(50, 60, 70, 50, 
90, 95, 50, 55, 58)), .Names = c("pH", "Dosage", "Removal"), row.names = c(NA, 
-9L), class = "data.frame")
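
One more way to read the correlations, offered as a sketch rather than a general recipe: in this particular data set pH and Dosage are uncorrelated with each other (their correlation is exactly 0 because every pH is paired with every dosage), so the squared correlations with Removal add up to the multiple R-squared of the two-variable model. That gives a rough share of explained variance per variable. The names `r` and `shares` are made up for illustration, and the decomposition would not hold if the predictors were correlated.

```r
# Same data as above, rebuilt so the block is self-contained
DF <- data.frame(
  pH      = c(5, 5, 5, 6, 6, 6, 7, 7, 7),
  Dosage  = c(0, 1, 2, 0, 1, 2, 0, 1, 2),
  Removal = c(50, 60, 70, 50, 90, 95, 50, 55, 58)
)

# Correlation of each predictor with Removal
r <- cor(DF)[c("pH", "Dosage"), "Removal"]

# Squared correlation = share of the variance in Removal that each
# predictor accounts for on its own (valid here only because the
# predictors are uncorrelated with each other)
shares <- r^2
round(shares, 4)
##     pH Dosage 
## 0.0201 0.3711

# The shares sum to the multiple R-squared of the full linear model
sum(shares)
summary(lm(Removal ~ ., DF))$r.squared
```

On this reading Dosage accounts for roughly 37% of the variance in Removal and pH for about 2%, under a purely linear model.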