
R: dynamically adjust output pdf size to plot area inside a function


I have a function like the one in the MWE below to generate a PCA biplot, with an aspect ratio of 1:1 to not bias its interpretation; that means sometimes I get narrower or wider plots depending on the data.

I would like to detect the plot area and create a pdf with the proper width and height to fit the plot, because otherwise I end up with unwanted extra space in the output file.

Check the MWE below:

pcaplot <- function(pobj, df, groupvar, filename){
    library(ggbiplot)
    P <- ggbiplot(pobj,
         obs.scale = 1, 
         var.scale=1,
         ellipse=T,
         circle=F,
         varname.size=3,
         var.axes=T,
         groups=df[,groupvar],
         alpha=0)
    P$layers <- c(geom_point(aes(color=df[,groupvar]), cex=5), P$layers)
    pdf(file=paste(filename,".pdf",sep=""), height=14, width=14) #USE PROPER WIDTH AND HEIGHT DEPENDING ON PLOT AREA
    print(
        P
    )
    dev.off()
}

data(iris)
pca.obj <- prcomp(iris[,1:4], center=TRUE, scale.=TRUE)
pcaplot(pca.obj, iris, "Species", "test")

Thanks!


Solution

  • I have a solution for you for better scaling.

    As a base, you should use your PCA object. It is not a simple data.frame; there is more inside. For example, your pca.obj also contains the matrix pca.obj$x, which holds the points to plot:

    pca.obj$x[,1] will be for PC1
    pca.obj$x[,3] will be for PC3

    Now you can use them to calculate the range of the points along each component:

    > pca.obj$x[,1] %>% range() %>% diff()
    [1] 6.064723
    
    > pca.obj$x[,3] %>% range() %>% diff()
    [1] 1.856603
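
    The two spans above can also be computed in one step in base R, without loading the pipe. This is only a sketch; `choices`, `spans` and `aspect` are names picked here for illustration, not part of prcomp or ggbiplot.

```r
# Sketch: spans of the PCA scores along two chosen components, base R only.
data(iris)
pca.obj <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)

choices <- c(1, 3)  # PC1 on the x axis, PC3 on the y axis

# One span (max - min) per chosen component; the result is named PC1, PC3
spans <- apply(pca.obj$x[, choices], 2, function(v) diff(range(v)))
spans

# Width-to-height ratio of the point cloud
aspect <- unname(spans[1] / spans[2])
aspect
```

    The ratio is what matters for the pdf shape; the absolute size can then be chosen freely.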
    

    You can use those values as the basis for your scaling. (In my case I also multiply the pdf size by 3 to get better resolution for fonts etc. with the same ratio.) In my example I plot PC1 vs PC3 from your iris data:

    library(magrittr) # for pipe
    pcaplot <- function(pobj, df, pca_choices, groupvar, filename){
        library(ggbiplot)

        # range of the scores on each chosen component, rounded up, times 3
        width_scale  <- pobj$x[,pca_choices[1]] %>% range() %>% diff() %>% ceiling() * 3
        height_scale <- pobj$x[,pca_choices[2]] %>% range() %>% diff() %>% ceiling() * 3

        P <- ggbiplot(pobj,
                      choices = pca_choices,
                      obs.scale = 1,
                      var.scale = 1,
                      ellipse = TRUE,
                      circle = FALSE,
                      varname.size = 3,
                      var.axes = TRUE,
                      groups = df[,groupvar],
                      alpha = 0)
        P$layers <- c(geom_point(aes(color=df[,groupvar]), cex=5), P$layers)

        # pdf size follows the ranges computed above
        pdf(file=paste(filename,".pdf",sep=""), height=height_scale, width=width_scale)
        print(P)
        dev.off()
    }
    
    data(iris)
    pca.obj <- prcomp(iris[,1:4], center=TRUE, scale.=TRUE)
    pca_choices <- c(1, 3)
    pcaplot(pca.obj, iris, pca_choices, "Species", "test")
    

    The black border is only there to show the actual space around the plot.

    Base version: [image]

    New version: [image]
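
    A possible variation on the sizing in the function above: instead of rounding each range up and multiplying by 3, fix the longer side of the pdf and derive the other side from the ratio of the ranges. This is a minimal sketch under assumptions, not a tested replacement. The helper `pca_pdf_size` and its arguments `long_side` and `min_side` are hypothetical names, and the base `plot()` call only stands in for `print(P)` so the example runs without ggbiplot. The point ranges ignore the legend, axis labels, ellipses and arrows, so some extra margin may still be needed.

```r
# Hypothetical helper (not part of ggbiplot or base R): derive pdf
# dimensions in inches from the score ranges of the chosen components.
pca_pdf_size <- function(pobj, choices = 1:2, long_side = 14, min_side = 4) {
  # span (max - min) of the scores on each chosen component
  spans <- apply(pobj$x[, choices], 2, function(v) diff(range(v)))
  # inches per data unit, so that the longer span maps to long_side
  scale <- long_side / max(spans)
  # floor the shorter side so labels and legend keep some room
  dims <- pmax(spans * scale, min_side)
  list(width = unname(dims[1]), height = unname(dims[2]))
}

data(iris)
pca.obj <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)
size <- pca_pdf_size(pca.obj, choices = c(1, 3))

# Write to a temporary file; asp = 1 keeps the 1:1 aspect ratio
out <- file.path(tempdir(), "test.pdf")
pdf(file = out, width = size$width, height = size$height)
plot(pca.obj$x[, c(1, 3)], asp = 1)  # stand-in for print(P)
dev.off()
```

    Inside pcaplot, the same `size$width` and `size$height` would be passed to pdf() in place of width_scale and height_scale.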