Search code examples
rdplyrtidyversedata-cleaning

Take unique values from certain columns and add them to an existing column


Example data:

dat <- data.frame (meat  = c("lamb, beef", "chicken, pork, beef", "venison, beef, lamb", "NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA"),
                  fish = c("NA", "NA", "NA", "sardines, tuna, snapper", "snapper", "kingfish, sardines", "NA", "NA", "NA", "NA", "NA", "NA"),
                  veges = c("NA", "NA", "NA", "NA", "NA", "NA", "onions, potatoes, leeks", "celery, leeks", "carrots, onions, potatoes", "NA", "NA", "NA"),
                  food = c("NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA", "chicken, lamb, beef", "sardines, kingfish, tuna", "potatoes, leeks")
                  )

                  meat                    fish                     veges                     food
1           lamb, beef                      NA                        NA                       NA
2  chicken, pork, beef                      NA                        NA                       NA
3  venison, beef, lamb                      NA                        NA                       NA
4                   NA sardines, tuna, snapper                        NA                       NA
5                   NA                 snapper                        NA                       NA
6                   NA      kingfish, sardines                        NA                       NA
7                   NA                      NA   onions, potatoes, leeks                       NA
8                   NA                      NA             celery, leeks                       NA
9                   NA                      NA carrots, onions, potatoes                       NA
10                  NA                      NA                        NA      chicken, lamb, beef
11                  NA                      NA                        NA sardines, kingfish, tuna
12                  NA                      NA                        NA          potatoes, leeks


Lets say the above food data used to be collected by stating which items were present based on their category (meat, fish, veges) but now the data is collected inside a single food column. How do I get unique values inside the meat/fish/veges columns and add them to the food column (like the bottom three rows)?

There are multiple values separated by commas inside the meat/fish/veges columns, so will I first need to extract these?

I've spent a lot of time searching for an answer on here but I can't find a scenario that roughly matches what I'm wanting. I've played around with coalesce() and distinct(), but it doesn't look like they can be used in the way I'm wanting. It's difficult to articulate the problem in a way that will return relevant packages.

Edit: I apologise for screwing up the reprex - the NAs are actual NAs in the data. I also realised the reprex is not right. This is the example data and what I'm trying to get to happen. I want to move/copy the values from the 'cat_' columns to a 'new' column as below. Multiple text strings are separated by a comma:

dat <- data.frame(
  cat_1 = c("A,B", "C", "D,E,F", NA, NA, NA, NA, NA, NA),
  cat_2 = c(NA, NA, NA, "B,D", "G", "A", NA, NA, NA),
  cat_3 = c(NA, NA, NA, NA, NA, NA, "A,D", "B", "J,L"),
  new = c("A,B", "C", "D,E,F", "B,D", "G", "A", "A,D", "B", "J,L")
)

  cat_1 cat_2 cat_3   new
1   A,B  <NA>  <NA>   A,B
2     C  <NA>  <NA>     C
3 D,E,F  <NA>  <NA> D,E,F
4  <NA>   B,D  <NA>   B,D
5  <NA>     G  <NA>     G
6  <NA>     A  <NA>     A
7  <NA>  <NA>   A,D   A,D
8  <NA>  <NA>     B     B
9  <NA>  <NA>   J,L   J,L

Solution

  • First thing, you need to convert "NA" (which is a character) to a real NA (looks the same but isn't). In your data.frame() call, you're making them as text. Here's a quick way to correct that:

    dat[dat=="NA"] <- NA
    > dat
                      meat                    fish                     veges                     food
    1           lamb, beef                    <NA>                      <NA>                     <NA>
    2  chicken, pork, beef                    <NA>                      <NA>                     <NA>
    3  venison, beef, lamb                    <NA>                      <NA>                     <NA>
    4                 <NA> sardines, tuna, snapper                      <NA>                     <NA>
    5                 <NA>                 snapper                      <NA>                     <NA>
    6                 <NA>      kingfish, sardines                      <NA>                     <NA>
    7                 <NA>                    <NA>   onions, potatoes, leeks                     <NA>
    8                 <NA>                    <NA>             celery, leeks                     <NA>
    9                 <NA>                    <NA> carrots, onions, potatoes                     <NA>
    10                <NA>                    <NA>                      <NA>      chicken, lamb, beef
    11                <NA>                    <NA>                      <NA> sardines, kingfish, tuna
    12                <NA>                    <NA>                      <NA>          potatoes, leeks
    

    Now you can use the splice operator (!!!) with a vector setting the preferred order of columns that you want to coalesce():

    library(tidyverse)
    
    vec <- c("food", "veges", "fish", "meat")
    
    dat %>%
      mutate(food = coalesce(!!!select(.,all_of(vec))))
    

    Output:

    > dat %>%
    +   mutate(food = coalesce(!!!select(.,all_of(vec))))
    # A tibble: 12 × 4
    # Rowwise: 
       meat                fish                    veges                     food                     
       <chr>               <chr>                   <chr>                     <chr>                    
     1 lamb, beef          NA                      NA                        lamb, beef               
     2 chicken, pork, beef NA                      NA                        chicken, pork, beef      
     3 venison, beef, lamb NA                      NA                        venison, beef, lamb      
     4 NA                  sardines, tuna, snapper NA                        sardines, tuna, snapper  
     5 NA                  snapper                 NA                        snapper                  
     6 NA                  kingfish, sardines      NA                        kingfish, sardines       
     7 NA                  NA                      onions, potatoes, leeks   onions, potatoes, leeks  
     8 NA                  NA                      celery, leeks             celery, leeks            
     9 NA                  NA                      carrots, onions, potatoes carrots, onions, potatoes
    10 NA                  NA                      NA                        chicken, lamb, beef      
    11 NA                  NA                      NA                        sardines, kingfish, tuna 
    12 NA                  NA                      NA                        potatoes, leeks          
    

    EDIT ADDED:

    Okay, so you want to add the coalesced data as a new column. Same as above, but mutate a new column (new_col):

    > dat %>%
    +   mutate(new_col = coalesce(!!!select(.,all_of(vec))))  #  <-- see edit here
    # A tibble: 12 × 5
    # Rowwise: 
       meat                fish                    veges                     food                     new_col                  
       <chr>               <chr>                   <chr>                     <chr>                    <chr>                    
     1 lamb, beef          NA                      NA                        NA                       lamb, beef               
     2 chicken, pork, beef NA                      NA                        NA                       chicken, pork, beef      
     3 venison, beef, lamb NA                      NA                        NA                       venison, beef, lamb      
     4 NA                  sardines, tuna, snapper NA                        NA                       sardines, tuna, snapper  
     5 NA                  snapper                 NA                        NA                       snapper                  
     6 NA                  kingfish, sardines      NA                        NA                       kingfish, sardines       
     7 NA                  NA                      onions, potatoes, leeks   NA                       onions, potatoes, leeks  
     8 NA                  NA                      celery, leeks             NA                       celery, leeks            
     9 NA                  NA                      carrots, onions, potatoes NA                       carrots, onions, potatoes
    10 NA                  NA                      NA                        chicken, lamb, beef      chicken, lamb, beef      
    11 NA                  NA                      NA                        sardines, kingfish, tuna sardines, kingfish, tuna 
    12 NA                  NA                      NA                        potatoes, leeks          potatoes, leeks