I am using tidyr and dplyr, and I am creating a new column with mutate that counts how many 0s appear in another column. The new column is created, but it contains NA throughout, even where I can see the count should be at least one (e.g. I see a 0 in the Genotype column, but the "Count" (total) column still reads NA).
This code worked previously on a nearly identical dataset for the same type of question. Can someone explain what is going on? A copy of my code is below.
Gathered <- ScottCrkMeta250918 %>%
  gather(SNP, Genotype, 43:234)
Prefailed <- Gathered %>%
  group_by(NMFS_DNA_ID, BOX_ID, BOX_POSITION) %>%
  mutate(Count = sum(Genotype == 0))
I am trying to see how many SNPs failed; a failure is recorded as a 0 in the Genotype column. I want R to tally these zeroes (failures) and report the total in a separate column.
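Unfortunately you don't share data, so this is a bit of a guess: I suspect that Genotype contains NAs. Comparing NA with 0 gives NA rather than FALSE, and sum() returns NA as soon as any element is NA, which is why the whole column ends up as NA. In this case, try replacing your code with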
Prefailed <- Gathered %>%
  group_by(NMFS_DNA_ID, BOX_ID, BOX_POSITION) %>%
  mutate(Count = sum(Genotype == 0, na.rm = TRUE))
Here is a minimal reproducible example to demonstrate:
library(dplyr)

set.seed(2018)
df <- data.frame(
  Genotype = sample(c(NA, 0, 1), 10, replace = TRUE))
df %>%
  mutate(
    Count_without_NA_removed = sum(Genotype == 0),
    Count_with_NA_removed = sum(Genotype == 0, na.rm = TRUE))
#   Genotype Count_without_NA_removed Count_with_NA_removed
#1         0                       NA                     5
#2         0                       NA                     5
#3        NA                       NA                     5
#4        NA                       NA                     5
#5         0                       NA                     5
#6        NA                       NA                     5
#7         0                       NA                     5
#8        NA                       NA                     5
#9         1                       NA                     5
#10        0                       NA                     5
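Since your real code groups before counting, here is a small grouped sketch of the same idea. The data frame toy and the columns sample_id and n_missing are made up for illustration (they are not from your dataset), and I am assuming you may also want to know how many calls are missing per sample, which sum(is.na(...)) gives you.

```r
library(dplyr)

# Hypothetical data: two samples, three SNPs each; NA marks a missing call
toy <- data.frame(
  sample_id = c("A", "A", "A", "B", "B", "B"),
  Genotype  = c(0, NA, 1, 0, 0, NA)
)

toy_counts <- toy %>%
  group_by(sample_id) %>%
  mutate(
    Count     = sum(Genotype == 0, na.rm = TRUE),  # zeros per sample, NAs ignored
    n_missing = sum(is.na(Genotype))               # NAs per sample, reported separately
  ) %>%
  ungroup()

toy_counts
```

With this toy data, sample A gets Count = 1 and sample B gets Count = 2, and each has n_missing = 1. Keeping the NA count in its own column lets you tell a recorded failure (0) apart from a value that is simply absent.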