I have a data frame:
# create a data frame
V1 = c("gene_1", "gene_1", "", "")
V2 = c("gene_2", "gene_2", "", "")
V3 = c("gene_3", "gene_3", "gene_4", "")
V4 = c("gene_4", "gene_4", "", "")
V5 = c("gene_5", "gene_5", "gene_8", "")
V6 = c("gene_6", "gene_6", "gene_6", "gene_7")
df = as.data.frame(rbind(V1, V2, V3, V4, V5, V6))
The data frame df looks like this:
V1 V2 V3 V4 V1 gene_1 gene_1 V2 gene_2 gene_2 V3 gene_3 gene_3 gene_4 V4 gene_4 gene_4 V5 gene_5 gene_5 gene_8 V6 gene_6 gene_6 gene_6 gene_7
Now, I want to remove all the rows that have only labels of the same gene, resulting in:
V1 V2 V3 V4 V3 gene_3 gene_3 gene_4 V5 gene_5 gene_5 gene_8 V6 gene_6 gene_6 gene_6 gene_7
I found several similar questions on stack overflow, including here but none of these solutions work for my exact issue. I feel like this should be easy, but I can't seem to find how to go about this.
I found a solution, based on another post that I found here:
df[df == ''] <- NA
df %>% filter(if_any(V2:V4, ~ .x != V1))
Gives:
V1 V2 V3 V4 V3 gene_3 gene_3 gene_4 <NA> V5 gene_5 gene_5 gene_8 <NA> V6 gene_6 gene_6 gene_6 gene_7