Tags: r, loops, dplyr, multiple-columns, rscript

Getting the total cell counts for each sample in a huge single-cell dataframe


I have a huge metadata file with 79 columns and 78,687 rows. The metadata comes from our cancer experiment results. I am using dplyr to query the cell counts for each sample in it.

I have 16 samples:

[Sample list (Var1)]

I need to find the cell counts for each combination of conditions (Tumor or Normal, MSI_Status, Location) in each sample. So far I have been doing it one sample at a time, as follows:

dim(meta %>% filter(Condition == "Tumor" & MSI_Status == "MSS" & Location == "Left" & orig.ident == "B_cac10"))
# 689  24

I am sure there is a smarter way to do this. How can I loop over the samples to get all the answers in one go?

P.S.: I am a biologist, and my knowledge of loops and coding is very limited.

EDIT 1:

Reproducible example:

df <- data.frame(Condition = c("Normal","Normal","Normal","Tumor","Tumor","Tumor"),
                 MSI_Status = c("High", "High", "High", "Low", "Low", "Low"),
                 Location = c("Lungs", "Lungs", "Lungs", "Kidney", "Kidney", "Liver"), 
                 Clusters = c(1,2,4,2,2,6), 
                 orig.ident = c("B-cac10","B-cac11","T-cac15","B-cac15","B-cac19","T-cac22"))

My code:

df %>% filter(Condition == "Tumor" & MSI_Status == "Low" & Location == "Kidney" & orig.ident == "B-cac15")
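To answer the looping question literally, the single-sample filter above can be wrapped in a `for` loop over the sample IDs. This is a minimal base-R sketch (no extra packages), assuming the toy `df` from the reproducible example; `samples`, `counts` and `keep` are illustrative names, not from the original post.

```r
# Toy data from the reproducible example
df <- data.frame(
  Condition  = c("Normal", "Normal", "Normal", "Tumor", "Tumor", "Tumor"),
  MSI_Status = c("High", "High", "High", "Low", "Low", "Low"),
  Location   = c("Lungs", "Lungs", "Lungs", "Kidney", "Kidney", "Liver"),
  Clusters   = c(1, 2, 4, 2, 2, 6),
  orig.ident = c("B-cac10", "B-cac11", "T-cac15", "B-cac15", "B-cac19", "T-cac22")
)

# Every distinct sample ID; as.character() guards against factor columns (R < 4.0)
samples <- unique(as.character(df$orig.ident))

# Pre-allocate a named integer vector with one slot per sample
counts <- integer(length(samples))
names(counts) <- samples

for (s in samples) {
  # TRUE for rows (cells) that match all conditions AND belong to sample s
  keep <- df$Condition == "Tumor" &
          df$MSI_Status == "Low" &
          df$Location == "Kidney" &
          df$orig.ident == s
  # sum() of a logical vector counts the TRUE values
  counts[s] <- sum(keep)
}

print(counts)
```

One difference from a grouped count: this loop also reports samples with zero matching cells, because every sample gets its own slot in `counts`.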

Expected results:

The count for each orig.ident should be given for Condition == "Tumor", MSI_Status == "Low" & Location == "Kidney".

Thanks a lot for your help. Stay safe. Dave


Solution

  • You can use the dplyr function filter to subset the data based on your criteria. Then you can use the dplyr function count to count the rows for each unique value of orig.ident. As alluded to in the comments, you can instead set name = "Freq" directly in count(). I opted to use the rename function instead, to be as explicit as possible since you are new to R.

    Data

    df <- data.frame(Condition = c("Normal","Normal","Normal","Tumor","Tumor","Tumor"),
                     MSI_Status = c("High", "High", "High", "Low", "Low", "Low"),
                     Location = c("Lungs", "Lungs", "Lungs", "Kidney", "Kidney", "Liver"),
                     Clusters = c(1,2,4,2,2,6),
                     orig.ident = c("B-cac10","B-cac11","T-cac15","B-cac15","B-cac19","T-cac22"))
    

    Code

    library(dplyr)
    
    df %>% 
      filter(Condition == "Tumor" & 
             MSI_Status == "Low" & 
             Location == "Kidney") %>% 
      count(orig.ident) %>% 
      rename(Freq = n)
    
    #>   orig.ident Freq
    #> 1    B-cac15    1
    #> 2    B-cac19    1
    

    Created on 2020-09-05 by the reprex package (v0.3.0)
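If you need counts for every combination of conditions in every sample at once (not only Tumor / Low / Kidney), one option is to drop the filter() step and pass all the grouping columns to count(). This is a sketch assuming the same toy `df`; `all_counts` is an illustrative name, and `name = "Freq"` is the count() argument mentioned above.

```r
library(dplyr)

df <- data.frame(
  Condition  = c("Normal", "Normal", "Normal", "Tumor", "Tumor", "Tumor"),
  MSI_Status = c("High", "High", "High", "Low", "Low", "Low"),
  Location   = c("Lungs", "Lungs", "Lungs", "Kidney", "Kidney", "Liver"),
  Clusters   = c(1, 2, 4, 2, 2, 6),
  orig.ident = c("B-cac10", "B-cac11", "T-cac15", "B-cac15", "B-cac19", "T-cac22")
)

# One row per observed combination of condition columns and sample,
# with the number of cells (rows) in that combination stored in Freq
all_counts <- df %>%
  count(Condition, MSI_Status, Location, orig.ident, name = "Freq")

print(all_counts)
```

Combinations with zero cells are absent from this result; if you need them listed explicitly, tidyr::complete() is one way to add them back.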