Fitting a multidimensional plane in R


I have multidimensional data (100+ variables), a subset of which I expect to conform, more or less, to a plane. What would be the best way to fit a plane to that subset in R?

I'd like to use the plane to calculate the distance from some other points to it, and to plot some dimensions of it.


Solution

  • Principal components can solve this for you. Assuming that your data really does match a plane, the first two principal components should describe that plane well.

    You do not provide any sample data, so I will illustrate with some artificial data. My data is ten dimensional, but all points lie close to a plane (with some error in the other eight directions).

    ## Sample data
    set.seed(2018)
    NPts = 1000
    x = runif(NPts)
    y = runif(NPts)
    
    cx = rnorm(1)
    cy = rnorm(1)
    V1 = cx*x + cy*y + rnorm(NPts, 0, 0.1)
    
    MyData = data.frame(V1)
    for(i in 2:10) {
        cx = rnorm(1)
        cy = rnorm(1)
        name = paste0("V", i)
        MyData[,name] = cx*x + cy*y + rnorm(NPts, 0, 0.1)
    }
    

    Since all variables are linear combinations of x and y (plus a small error), the data is essentially two-dimensional and lies near the plane spanned by x and y. Here I am treating x and y as latent variables. They do not appear in the data but drive the behavior of all the other variables.

    ## Principal Components Analysis
    PCA = prcomp(MyData)
    plot(PCA)
    

    [Figure: PCA Plot, a bar chart of the variance of each principal component]

    Yep, the data looks basically two-dimensional. All that remains is to get the first two principal components. They are stored in the rotation element of the structure returned by prcomp.

    PCA$rotation[,1:2]
                PC1          PC2
    V1   0.42752681 -0.204894748
    V2  -0.64546573 -0.056503044
    V3   0.04606707 -0.009614603
    V4   0.01956126 -0.539070667
    V5   0.15987617  0.600122935
    V6  -0.06255399  0.054053476
    V7   0.26497132  0.388920891
    V8   0.21645814 -0.366709584
    V9   0.49363625 -0.116954131
    V10  0.08874645  0.040656622
    

    The plane that we are looking for passes through the mean of the data (stored in PCA$center) and is spanned by these two vectors.
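
The question also asks for the distance from other points to the plane. Here is a minimal sketch of one way to compute it, assuming prcomp was run with its defaults (centering on, no scaling) and that new points have the same columns in the same order as the fitted data. The helper name plane_distance and the compact way the sample data is regenerated are my own, for illustration only. The idea is to subtract the data mean, project onto the two principal directions, and measure what is left over.

```r
## Regenerate sample data of the same kind: ten noisy linear combinations
## of two latent variables (compact variant, for illustration)
set.seed(2018)
NPts = 1000
x = runif(NPts)
y = runif(NPts)
Coef  = matrix(rnorm(20), nrow = 2)
Noise = matrix(rnorm(NPts * 10, 0, 0.1), NPts, 10)
MyData = as.data.frame(cbind(x, y) %*% Coef + Noise)
names(MyData) = paste0("V", 1:10)

PCA = prcomp(MyData)

## The plane passes through the column means and is spanned by PC1 and PC2
Center = PCA$center
Basis  = PCA$rotation[, 1:2]    # columns are orthonormal

## Hypothetical helper: Euclidean distance from each row of NewPts to the plane
plane_distance = function(NewPts, Center, Basis) {
    Centered = sweep(as.matrix(NewPts), 2, Center)   # subtract the mean
    Scores   = Centered %*% Basis                    # coordinates within the plane
    Residual = Centered - Scores %*% t(Basis)        # part orthogonal to the plane
    sqrt(rowSums(Residual^2))
}

## Distances of the fitted points themselves (small, driven by the noise)
InPlane = plane_distance(MyData, Center, Basis)

## A point moved 5 units off the plane along PC3, which is orthogonal to it
OffPt = Center + 5 * PCA$rotation[, 3]
OffDist = plane_distance(t(OffPt), Center, Basis)
```

For plotting, the coordinates of the fitted points within the plane are already available as PCA$x[, 1:2], and predict(PCA, newdata) gives the same coordinates for new points. If prcomp is called with scale. = TRUE, new points would also need to be divided by PCA$scale before projecting, and the distances would then be in scaled units.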