I am currently reviewing this question on SO
and see that the OP
stated that by adding more for loops you can expand the polynomials. How exactly would you do so? I am trying to expand to polyorder 5.
Polynomial feature expansion in R
Here is the code below:
polyexp = function(df){
df.polyexp = df
colnames = colnames(df)
for (i in 1:ncol(df)){
for (j in i:ncol(df)){
colnames = c(colnames, paste0(names(df)[i],'.',names(df)[j]))
df.polyexp = cbind(df.polyexp, df[,i]*df[,j])
}
}
names(df.polyexp) = colnames
return(df.polyexp)
}
Ultimately, I'd like to order the matrix so that it expands in order of degree. I tried using the poly function but I'm not sure if you can order the result so that it returns a matrix that starts with degree 1 then moves to degree 2, then 3, 4, and 5.
To "sort by degree" is a little ambiguous. x^2
and x*y
both have degree 2. I'll assume you want to sort by total degree, and then within each of those, by degree of the 1st column; within that, by degree of the second column, etc. (I believe the default is to ignore total degree and sort by degree of the last column, within that the second last, and so on, but this is not documented so I won't count on it.)
Here's how to use polym
to do this. The columns are named things like "2.0"
or "1.1"
. You could sort these alphabetically and it would be fine up to degree 9, but if you convert those names using as.numeric_version
, there's no limit. So convert the column names to version names, get the sort order, and use that plus degree to re-order the columns of the result. For example,
df <- data.frame(x = 1:6, y = 0:5, z = -(1:6))
expanded <- polym(as.matrix(df), degree = 5)
o <- order(attr(expanded, "degree"),
as.numeric_version(colnames(expanded)))
sorted <- expanded[,o]
# That lost the attributes, so put them back
attr(sorted, "degree") <- attr(expanded, "degree")[o]
attr(sorted, "coefs") <- attr(expanded, "coefs")
class(sorted) <- class(expanded)
# If you call predict(), it comes out in the default order,
# so will need sorting too:
predict(sorted, newdata = as.matrix(df[1,]))[, o]
#> 0.0.1 0.1.0 1.0.0 0.0.2 0.1.1 0.2.0
#> 0.59761430 -0.59761430 -0.59761430 0.54554473 -0.35714286 0.54554473
#> 1.0.1 1.1.0 2.0.0 0.0.3 0.1.2 0.2.1
#> -0.35714286 0.35714286 0.54554473 0.37267800 -0.32602533 0.32602533
#> 0.3.0 1.0.2 1.1.1 1.2.0 2.0.1 2.1.0
#> -0.37267800 -0.32602533 0.21343368 -0.32602533 0.32602533 -0.32602533
#> 3.0.0 0.0.4 0.1.3 0.2.2 0.3.1 0.4.0
#> -0.37267800 0.18898224 -0.22271770 0.29761905 -0.22271770 0.18898224
#> 1.0.3 1.1.2 1.2.1 1.3.0 2.0.2 2.1.1
#> -0.22271770 0.19483740 -0.19483740 0.22271770 0.29761905 -0.19483740
#> 2.2.0 3.0.1 3.1.0 4.0.0 0.0.5 0.1.4
#> 0.29761905 -0.22271770 0.22271770 0.18898224 0.06299408 -0.11293849
#> 0.2.3 0.3.2 0.4.1 0.5.0 1.0.4 1.1.3
#> 0.20331252 -0.20331252 0.11293849 -0.06299408 -0.11293849 0.13309928
#> 1.2.2 1.3.1 1.4.0 2.0.3 2.1.2 2.2.1
#> -0.17786140 0.13309928 -0.11293849 0.20331252 -0.17786140 0.17786140
#> 2.3.0 3.0.2 3.1.1 3.2.0 4.0.1 4.1.0
#> -0.20331252 -0.20331252 0.13309928 -0.20331252 0.11293849 -0.11293849
#> 5.0.0
#> -0.06299408
Created on 2020-03-21 by the reprex package (v0.3.0)