Search code examples
rviolin-plotseurat

Adding p-values to a violin plot in seurat


I currently have a seurat object scData in which I have applied module scored by using AddModuleScore.

AddModuleScore(scData, features = inflam, name = "inflam_score", seed = 1, search = FALSE, assay = NULL, nbin = 21, ctrl = 100, pool = NULL)

I've gone ahead and plotted a violin plot to see a visual by doing: VlnPlot(scData, features = c("inflam_score1"), pt.size = 0) + NoLegend() + theme(axis.text.x = element_text(angle = 90, hjust = 1))

Is there a way to add a p-value to the output?

I tried:

VlnPlot(scData, features = c("inflam_score1"), pt.size = 0) + stat_compare_means()

But get this error:

Error in stat_compare_means() : could not find function "stat_compare_means"

I tried:

VlnPlot(scData, features = c("inflam_score1"), pt.size = 0) + stat_compare_means()

But get this error:

Error in stat_compare_means() : could not find function "stat_compare_means"

UPDATE:

I needed to load the ggpubr package to get the p-values.

But since I have more than to 2 pair, how would I go about adding a global p-value plus the p-values for more than 2 groups.[![enter image description here][1]][1]

I tried:

my_comparisons <- list( c("hIgG2_mIgG2", "NIS793_mIgG2"), c("hIgG2_aIL_1b", "NIS793_aIL_1b"))
VlnPlot(fib_395, features = c("trans_score1"), pt.size = 0) + stat_compare_means(comparisons = my_comparisons) + stat_compare_means() 

But I get this warning and no p-values show up:

Warning: Removed 6 rows containing missing values (geom_signif).

Solution

  • Updated:

    Your error is very likely related to the y axis limits. The VlnPlot() function allocates the 'correct' amount of space for the plot, but you want to add p values above the violins and these don't 'fit' in the available space. If you increase the space available by changing ylim() it should work as expected:

    Minimal reproducible example:

    library(tidyverse)
    library(Seurat)
    #> Attaching SeuratObject
    library(SeuratObject)
    library(ggsignif)
    library(ggpubr)
    
    #data(package = "Seurat")
    #data(package = "SeuratObject")
    
    data(pbmc_small)
    
    scData <- pbmc_small
    
    cd_features <- list(c(
      'CD79B',
      'CD79A',
      'CD19',
      'CD180',
      'CD200',
      'CD3D',
      'CD2',
      'CD3E',
      'CD7',
      'CD8A',
      'CD14',
      'CD1C',
      'CD68',
      'CD9',
      'CD247'
    ))
    
    AddModuleScore(object = scData, features = cd_features, name = "cd_features",
                   seed = 1, search = FALSE, assay = NULL,
                   nbin = 21, ctrl = 5, pool = NULL)
    #> An object of class Seurat 
    #> 230 features across 80 samples within 1 assay 
    #> Active assay: RNA (230 features, 20 variable features)
    #>  2 dimensional reductions calculated: pca, tsne
    
    my_comparisons <- list( c("0", "1"), c("1", "2"), c("0", "2"))
    
    VlnPlot(scData, features = "PC_1", pt.size = 0) +
      NoLegend() +
      theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
      stat_compare_means()
    

    Replicating the problem:

    VlnPlot(scData, features = "PC_1", pt.size = 0) +
      NoLegend() +
      theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
      stat_compare_means(comparisons = my_comparisons)
    #> [1] FALSE
    #> Warning: Removed 9 rows containing missing values (`geom_signif()`).
    

    Changing ylim() to fix the problem:

    VlnPlot(scData, features = "PC_1", pt.size = 0) +
      NoLegend() +
      theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
      stat_compare_means(comparisons = my_comparisons) +
      ylim(-2, 15)
    #> [1] FALSE
    #> Scale for y is already present.
    #> Adding another scale for y, which will replace the existing scale.
    

    Same outcome using the ggsignif package (which ggpubr uses 'under the hood'):

    VlnPlot(scData, features = "PC_1", pt.size = 0) +
      NoLegend() +
      theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
      geom_signif(comparisons = my_comparisons,
                  map_signif_level = function(p) sprintf("*p = %.2g", p),
                  step_increase = 0.15) +
      ylim(-2, 15)
    #> Scale for y is already present.
    #> Adding another scale for y, which will replace the existing scale.
    

    Created on 2023-01-13 with reprex v2.0.2