
What exactly does model_matrix() do?


I'm learning about modelling in R and, despite reading the documentation, I am very confused about what model_matrix() in the modelr package does.

model_matrix(data, formula)

data = your x and y variables in a dataframe/tibble
formula = the relationship between x and y, e.g. y ~ x or y ~ I(x^2) + x

So what does model_matrix() do exactly? Does it automatically find values for you? It doesn't seem so, because it automatically inserts a column for the y intercept, so it's not as if it calculates the y intercept for you.

It seems to take the x values and create a matrix from them. But in the y ~ I(x^2) + x example, I do not understand where it gets the x values. The original relationship/equation is modified by using I() to add the square of every x value. But then why is the first value of I(x^2) 1, then 4, then 9? I do not understand where these numbers come from.


library(modelr)
library(tibble)

df <- tribble(
  ~y, ~x,
   1,  1,
   2,  2,
   3,  3
)
model_matrix(df, y ~ x^2 + x)
#> # A tibble: 3 x 2
#>   `(Intercept)`     x
#>           <dbl> <dbl>
#> 1             1     1
#> 2             1     2
#> 3             1     3
model_matrix(df, y ~ I(x^2) + x)
#> # A tibble: 3 x 3
#>   `(Intercept)` `I(x^2)`     x
#>           <dbl>    <dbl> <dbl>
#> 1             1        1     1
#> 2             1        4     2
#> 3             1        9     3

Similarly, I am confused by what ns() (from the splines package) or poly() does in relation to model_matrix(), but maybe this question would be answered by finding out what model_matrix() does.

library(splines)
model_matrix(df, y ~ ns(x, 2))
#> # A tibble: 3 x 3
#>   `(Intercept)` `ns(x, 2)1` `ns(x, 2)2`
#>           <dbl>       <dbl>       <dbl>
#> 1             1       0           0    
#> 2             1       0.566      -0.211
#> 3             1       0.344       0.771

Solution

  • I can explain the basic model.matrix() function from base R; modelr's model_matrix() is a thin wrapper around it that returns a tibble.

    df <- data.frame(x = rnorm(100),
                     y = rnorm(100),
                     z = factor(c(rep(1, 50), rep(2, 50))))
    
    head(model.matrix(y ~ x + z + x:z, data = df))
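
    Because rnorm() produces different numbers on every run, here is a sketch on a small made-up data frame (df_small is a hypothetical name and its values are arbitrary) so every column can be checked by hand. It shows what each kind of term in y ~ x + z + x:z becomes, assuming R's default treatment contrasts for factors.

    ```r
    # Hypothetical toy data (values chosen only for illustration)
    df_small <- data.frame(x = c(1, 2, 3, 4),
                           y = c(2, 3, 5, 4),
                           z = factor(c(1, 1, 2, 2)))

    X <- model.matrix(y ~ x + z + x:z, data = df_small)
    print(X)
    # Columns of X:
    #   (Intercept) : always 1, a placeholder for the intercept, not its value
    #   x           : copied unchanged from df_small$x
    #   z2          : 0/1 indicator for level "2" of z (level "1" is the baseline)
    #   x:z2        : the product x * z2, i.e. the interaction
    # y does not appear: only the right-hand side of the formula is used.
    ```
    
    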
    

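    model.matrix() creates a design matrix from a formula and a data frame. A design matrix is your model written out as numbers: one row per observation and one column per term on the right-hand side of the formula, with each variable transformed as the formula says. For instance, if you write a square as I(x^2), it computes the square of your variable, which is why the I(x^2) column in the question holds 1, 4, 9 for x = 1, 2, 3. Nothing is estimated at this stage; the column of 1s is only a placeholder for the intercept, whose value is found later by a fitting function such as lm().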
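
    A minimal sketch of the I(x^2) point, using the question's x = 1, 2, 3 values but base R's model.matrix() instead of modelr (df_q, M_sq and M_plain are hypothetical names). It also shows why y ~ x^2 + x in the question produced no squared column: outside I(), ^ is formula syntax for crossing terms, and x crossed with itself is just x.

    ```r
    # Same values as the question's tibble
    df_q <- data.frame(x = c(1, 2, 3), y = c(1, 2, 3))

    # Inside I(), ^ is ordinary arithmetic, so this column is x squared: 1, 4, 9
    M_sq <- model.matrix(y ~ I(x^2) + x, data = df_q)

    # Without I(), ^ means "crossing"; x crossed with itself is x,
    # so no squared column is created
    M_plain <- model.matrix(y ~ x^2 + x, data = df_q)

    colnames(M_sq)     # columns: "(Intercept)" "I(x^2)" "x"
    colnames(M_plain)  # columns: "(Intercept)" "x"
    M_sq[, "I(x^2)"]   # the squared x values 1, 4, 9
    ```
    
    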

    M <- model.matrix(y ~ I(x^2) + z + x:z, data = df)
    # the 2nd column, I(x^2), equals x squared in every row
    all(M[, 2] == df$x^2)
    #> [1] TRUE
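
    To connect this to the question's "does it find values for you?": the sketch below, on made-up data (dat, beta_manual and fit are hypothetical names), first builds the design matrix and then computes least-squares coefficients from it with qr.solve(). lm() builds the same design matrix internally before fitting, so the two sets of coefficients should agree up to floating-point tolerance.

    ```r
    # Hypothetical, roughly quadratic data (values made up for illustration)
    dat <- data.frame(x = c(1, 2, 3, 4, 5),
                      y = c(1.2, 3.9, 9.1, 15.8, 25.3))

    # Step 1: build the design matrix (no estimation happens here)
    X <- model.matrix(y ~ I(x^2) + x, data = dat)

    # Step 2: least-squares coefficients from X and y via a QR decomposition
    beta_manual <- qr.solve(X, dat$y)

    # lm() constructs the same design matrix and solves the same problem
    fit <- lm(y ~ I(x^2) + x, data = dat)

    all.equal(unname(beta_manual), unname(coef(fit)))  # TRUE
    ```
    
    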
    

    See here for an explanation of design matrices, and run ?model.matrix for the documentation of the model.matrix() function.
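
    The same mechanism covers ns() and poly() from the question: each is an ordinary function evaluated inside the formula that returns a matrix of basis columns, and model.matrix() copies those columns in after the intercept. A sketch with made-up x and y values (P and N are hypothetical names). poly(..., raw = TRUE) is used so the columns are plainly x and x^2; the default poly() returns orthogonal polynomials and ns() returns a natural-spline basis, which is why their columns do not look like simple powers of x.

    ```r
    library(splines)  # ships with R; provides ns()

    x <- c(1, 2, 3, 4, 5)
    y <- c(1.2, 3.9, 9.1, 15.8, 25.3)  # hypothetical values

    # poly(x, 2, raw = TRUE) returns the columns x and x^2;
    # model.matrix() places them after the intercept column
    P <- model.matrix(y ~ poly(x, 2, raw = TRUE))

    # ns(x, 2) returns a 2-column natural-spline basis evaluated at each x;
    # model.matrix() again only copies those columns in
    N <- model.matrix(y ~ ns(x, 2))

    ncol(N)  # 3: the intercept plus two spline basis columns
    ```
    
    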

    Good luck.