Search code examples
rmeanviolin-plot

How can I find the mean and average expression in my violin plot using R?


I have plotted my violin plots, but I am trying to find the mean and average expression at each of my time points.

Here is the code that I have used:

VlnPlot(spinalcord_neurons, features = c("Cx3cl1"), group.by= "time", pt.size = .5,) + theme (axis.text.x=element_text(angle=40, hjust=1)) + theme(legend.position = "right")

And here is the graph that I got:

enter image description here

I am new to using R and not familiar with all of the functions. Thank you for your help!


Solution

  • We can use some example data from the SeuratData package. Generally for comparing expression between classes, we use the RNA assay (or if you have used SCTransform on your data, the SCT assay)

    Seurat makes it easy to generate results, but it can be tricky to make sure results are valid. Sorry for the long-winded answer.

    I assume by "mean and average expression" you want to get the mean expression of either your counts or normalized data (which may be in the data slot).

    # Load library
    library(Seurat)
    
    # Load example data
    library(SeuratData)
    data("ifnb")
    ifnb <- UpdateSeuratObject(ifnb)
    
    # We have ident "stim" suitable for comparisons
    colnames([email protected])
    levels(factor([email protected]$stim))
    
    [1] "orig.ident"         "nCount_RNA"        
    [3] "nFeature_RNA"       "stim"              
    [5] "seurat_annotations"
    
    [1] "CTRL" "STIM"
    
    # VlnPlot using RNA assay
    VlnPlot(ifnb, features = "CXCL10", group.by = "stim", assay = "RNA")
    

    # Mean expression, returns values for each assay in your object
    AverageExpression(ifnb, group.by = "stim", features = "CXCL10")
    
    $RNA
                 CTRL          STIM
    [1,] 1.162095e+63 4.917149e+190
    

    This gave us these crazy high numbers, but from the documentation for AverageExpression():

    If slot is set to 'data', this function assumes that the data has been log normalized and therefore feature values are exponentiated prior to averaging so that averaging is done in non-log space. Otherwise, if slot is set to either 'counts' or 'scale.data', no exponentiation is performed prior to averaging

    This example dataset is not an integrated object with any sort of workflow done to normalize the data, so the data slot of the RNA assay is currently just holding the counts, same as the counts slot. We can either return average expression of the counts, or ensure that the data slot is log-normalized. If it isn't already, you will want to ensure that your dataset is properly normalized, and integrated if need be.

    These vignettes from the Seurat website are extremely helpful and practically required reading if you want to get acquainted with single cell analysis.

    # Mean expression of the counts, with sane numbers
    AverageExpression(ifnb, group.by = "stim", features = "CXCL10", slot = "counts")
    
    
    # Mean expression, but we are acting on the `data` slot after normalization and
    # log transformation. The default behavior of AverageExpression() exponentiates
    # the log-transformed expression first before calculating means
    
    # Since we have normalized the data in addition to log-transforming,
    # the results are not exactly the same
    ifnb_log <- NormalizeData(ifnb, normalization.method = "LogNormalize")
    AverageExpression(ifnb_log, group.by = "stim", features = "CXCL10")
    
    $RNA
              CTRL     STIM
    [1,] 0.2965791 30.17246
    
    
    Performing log-normalization
    0%   10   20   30   40   50   60   70   80   90   100%
    [----|----|----|----|----|----|----|----|----|----|
    **************************************************|
    $RNA
             CTRL     STIM
    [1,] 1.141298 104.3836