Search code examples
rdataframedata-management

How to match three columns differing in the pattern?


I have similar data to the following (but larger one):

example <- rbind(data.frame(species = "A", trait1 = "yes", trait2 = NA),      
     data.frame(species = "A", trait1 = NA, trait2 = "yes"), 
     data.frame(species = "B", trait1 = NA, trait2 = "no"), 
     data.frame(species = "B", trait1 = "yes", trait2 = NA), 
     data.frame(species = "B", trait1 = "no", trait2 = NA),
     data.frame(species = "B", trait1 = "no", trait2 = NA), 
     data.frame(species = "C", trait1 = NA, trait2 = "no"), 
     data.frame(species = "C", trait1 = "no", trait2 = NA), 
     data.frame(species = "D", trait1 = "yes", trait2 = NA), 
     data.frame(species = "D", trait1 = NA, trait2 = "yes"), 
     data.frame(species = "E", trait1 = NA, trait2 = "no"), 
     data.frame(species = "E", trait1 = "no", trait2 = NA), 
     data.frame(species = "E", trait1 = "no", trait2 = NA))

Here, trait2 is a fixed value (1 value for 1 species), but trait1 is variable within species. For each trait value, data for species come from different rows. After data management, I want to keep the variability existed in trait1, and this seems to make the process a bit complex.

Finally, I want to transform this data frame in R to the following one:

ex.res <- rbind(data.frame(species = "A", trait1 = "yes", trait2 = "yes"), 
        data.frame(species = "B", trait1 = "yes", trait2 = "no"),
        data.frame(species = "B", trait1 = "no", trait2 = "no"),
        data.frame(species = "B", trait1 = "no", trait2 = "no"),
        data.frame(species = "C", trait1 = "no", trait2 = "no"),
        data.frame(species = "D", trait1 = "yes", trait2 = "yes"),
        data.frame(species = "E", trait1 = "no", trait2 = "no"),
        data.frame(species = "E", trait1 = "no", trait2 = "no"))

I have tried many things, including some basic data management tools in R, and also duplicated, unique, and match_df functions, but they were not successful in doing this completely.

Maybe a combined version of such functions could have been worked, but I am not able to do this. Is there any simple way to do it?


Solution

  • An approach by using base R,

    first_part <- example[!is.na(example$trait1),]
    second_part <- example[!is.na(example$trait2),]
    
    merge(first_part[,-3], second_part[,-2], by="species")
    
    
    
       species trait1 trait2
    1       A    yes    yes
    2       B    yes     no
    3       B     no     no
    4       B     no     no
    5       C     no     no
    6       D    yes    yes
    7       E     no     no
    8       E     no     no