
Adding a new column to a dataframe for 3 condition cases


I have a dataframe like this:

geneID  baseMean    log2FoldChange  lfcSE   stat    pvalue  padj
ENSG00000000003.14  2700.791337 -0.345466785    0.202389477 -1.706940451    0.087833121 0.001
ENSG00000000419.12  1571.143316 -0.348258736    0.150807514 -2.309293001    0.020927328 0.120478416
ENSG00000000457.13  526.2282051 -0.051250213    0.180482116 -0.283962835    0.776438862 0.003
ENSG00000000460.16  1108.138705 -0.078538637    0.167859597 -0.467882913    0.639868323 0.827329552
ENSG00000001036.13  2662.132047 0.121419414 0.175209898 0.692994033 0.488313296 0.728842774
ENSG00000001084.10  1325.447272 0.89    0.154875429 -0.423289781    0.672083849 0.0004
ENSG00000001167.14  1829.828657 -0.221749678    0.153100403 -1.448393819    0.147506943 0.386446872
ENSG00000001460.17  641.7582879 -0.252419377    0.183602552 -1.374814095    0.169189087 0.417816879

I want to add a column named threshold such that:

df$log2FoldChange > 0 & df$padj < 0.05 is labeled up
df$log2FoldChange < 0 & df$padj < 0.05 is labeled down
anything else is labeled NS

So for the above table, output should look like this:

geneID  baseMean    log2FoldChange  lfcSE   stat    pvalue  padj    threshold
ENSG00000000003.14  2700.791337 -0.345466785    0.202389477 -1.706940451    0.087833121 0.001   down
ENSG00000000419.12  1571.143316 -0.348258736    0.150807514 -2.309293001    0.020927328 0.120478416 NS
ENSG00000000457.13  526.2282051 -0.051250213    0.180482116 -0.283962835    0.776438862 0.003   down
ENSG00000000460.16  1108.138705 -0.078538637    0.167859597 -0.467882913    0.639868323 0.827329552 NS
ENSG00000001036.13  2662.132047 0.121419414 0.175209898 0.692994033 0.488313296 0.728842774 NS
ENSG00000001084.10  1325.447272 0.89    0.154875429 -0.423289781    0.672083849 0.0004  up
ENSG00000001167.14  1829.828657 -0.221749678    0.153100403 -1.448393819    0.147506943 0.386446872 NS
ENSG00000001460.17  641.7582879 -0.252419377    0.183602552 -1.374814095    0.169189087 0.417816879 NS

I tried this but of course it is not doing what I want:

dat <- mutate(dat,threshold=if_else(dat$padj <= 0.05 & dat$log2FoldChange > 0,"up","NS"))
dat <- mutate(dat,threshold=if_else(dat$padj <= 0.05 & dat$log2FoldChange < 0,"down","NS"))

Solution

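  • Your second mutate() call overwrites the threshold column created by the first, so every row that was labeled "up" is reset to "NS". One option is to use case_when() from the dplyr package to assign "up", "down", or otherwise "NS" in a single step. Note that the code keeps the padj <= 0.05 cutoff from your attempt; use < if you want the strict cutoff described in the question, e.g.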

    library(dplyr)
    #> 
    #> Attaching package: 'dplyr'
    #> The following objects are masked from 'package:stats':
    #> 
    #>     filter, lag
    #> The following objects are masked from 'package:base':
    #> 
    #>     intersect, setdiff, setequal, union
    
    df <- read.table(text = "geneID  baseMean    log2FoldChange  lfcSE   stat    pvalue  padj
    ENSG00000000003.14  2700.791337 -0.345466785    0.202389477 -1.706940451    0.087833121 0.001
    ENSG00000000419.12  1571.143316 -0.348258736    0.150807514 -2.309293001    0.020927328 0.120478416
    ENSG00000000457.13  526.2282051 -0.051250213    0.180482116 -0.283962835    0.776438862 0.003
    ENSG00000000460.16  1108.138705 -0.078538637    0.167859597 -0.467882913    0.639868323 0.827329552
    ENSG00000001036.13  2662.132047 0.121419414 0.175209898 0.692994033 0.488313296 0.728842774
    ENSG00000001084.10  1325.447272 0.89    0.154875429 -0.423289781    0.672083849 0.0004
    ENSG00000001167.14  1829.828657 -0.221749678    0.153100403 -1.448393819    0.147506943 0.386446872
    ENSG00000001460.17  641.7582879 -0.252419377    0.183602552 -1.374814095    0.169189087 0.417816879",
    header = TRUE)
    
    dat <- mutate(df,threshold = case_when(padj <= 0.05 & log2FoldChange > 0 ~ "up",
                                           padj <= 0.05 & log2FoldChange < 0 ~ "down",
                                           TRUE ~ "NS"))
    dat
    #>               geneID  baseMean log2FoldChange     lfcSE       stat     pvalue
    #> 1 ENSG00000000003.14 2700.7913    -0.34546678 0.2023895 -1.7069405 0.08783312
    #> 2 ENSG00000000419.12 1571.1433    -0.34825874 0.1508075 -2.3092930 0.02092733
    #> 3 ENSG00000000457.13  526.2282    -0.05125021 0.1804821 -0.2839628 0.77643886
    #> 4 ENSG00000000460.16 1108.1387    -0.07853864 0.1678596 -0.4678829 0.63986832
    #> 5 ENSG00000001036.13 2662.1320     0.12141941 0.1752099  0.6929940 0.48831330
    #> 6 ENSG00000001084.10 1325.4473     0.89000000 0.1548754 -0.4232898 0.67208385
    #> 7 ENSG00000001167.14 1829.8287    -0.22174968 0.1531004 -1.4483938 0.14750694
    #> 8 ENSG00000001460.17  641.7583    -0.25241938 0.1836026 -1.3748141 0.16918909
    #>        padj threshold
    #> 1 0.0010000      down
    #> 2 0.1204784        NS
    #> 3 0.0030000      down
    #> 4 0.8273296        NS
    #> 5 0.7288428        NS
    #> 6 0.0004000        up
    #> 7 0.3864469        NS
    #> 8 0.4178169        NS
    

    Created on 2023-03-07 with reprex v2.0.2
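
If you would rather stay close to the original attempt, the two if_else() calls can also be nested inside a single mutate(), so that the "down" test only runs on rows that were not already labeled "up". The following is a minimal sketch of that idea, not a replacement for the case_when() version; df_small is a hypothetical three-row subset built from the question's table, kept short for illustration.

```r
library(dplyr)

# Hypothetical three-row subset of the question's table (illustration only)
df_small <- data.frame(
  geneID         = c("ENSG00000000003.14", "ENSG00000000419.12", "ENSG00000001084.10"),
  log2FoldChange = c(-0.345466785, -0.348258736, 0.89),
  padj           = c(0.001, 0.120478416, 0.0004)
)

# Nest the second if_else() in the "false" branch of the first,
# so the column is written once instead of being overwritten
dat_small <- mutate(
  df_small,
  threshold = if_else(padj <= 0.05 & log2FoldChange > 0, "up",
              if_else(padj <= 0.05 & log2FoldChange < 0, "down", "NS"))
)

dat_small$threshold
#> [1] "down" "NS"   "up"
```

One difference to be aware of: if padj contains NA, nested if_else() returns NA for that row, whereas case_when() with a final TRUE ~ "NS" labels it "NS". With more than two or three conditions, case_when() is usually easier to read than deeper nesting.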