Example data:
dat <- data.frame (meat = c("lamb, beef", "chicken, pork, beef", "venison, beef, lamb", "NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA"),
fish = c("NA", "NA", "NA", "sardines, tuna, snapper", "snapper", "kingfish, sardines", "NA", "NA", "NA", "NA", "NA", "NA"),
veges = c("NA", "NA", "NA", "NA", "NA", "NA", "onions, potatoes, leeks", "celery, leeks", "carrots, onions, potatoes", "NA", "NA", "NA"),
food = c("NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA", "chicken, lamb, beef", "sardines, kingfish, tuna", "potatoes, leeks")
)
meat fish veges food
1 lamb, beef NA NA NA
2 chicken, pork, beef NA NA NA
3 venison, beef, lamb NA NA NA
4 NA sardines, tuna, snapper NA NA
5 NA snapper NA NA
6 NA kingfish, sardines NA NA
7 NA NA onions, potatoes, leeks NA
8 NA NA celery, leeks NA
9 NA NA carrots, onions, potatoes NA
10 NA NA NA chicken, lamb, beef
11 NA NA NA sardines, kingfish, tuna
12 NA NA NA potatoes, leeks
Lets say the above food data used to be collected by stating which items were present based on their category (meat, fish, veges) but now the data is collected inside a single food column. How do I get unique values inside the meat/fish/veges columns and add them to the food column (like the bottom three rows)?
There are multiple values separated by commas inside the meat/fish/veges columns, so will I first need to extract these?
I've spent a lot of time searching for an answer on here but I can't find a scenario that roughly matches what I'm wanting. I've played around with coalesce() and distinct(), but it doesn't look like they can be used in the way I'm wanting. It's difficult to articulate the problem in a way that will return relevant packages.
Edit: I apologise for screwing up the reprex - the NAs are actual NAs in the data. I also realised the reprex is not right. This is the example data and what I'm trying to get to happen. I want to move/copy the values from the 'cat_' columns to a 'new' column as below. Multiple text strings are separated by a comma:
dat <- data.frame(
cat_1 = c("A,B", "C", "D,E,F", NA, NA, NA, NA, NA, NA),
cat_2 = c(NA, NA, NA, "B,D", "G", "A", NA, NA, NA),
cat_3 = c(NA, NA, NA, NA, NA, NA, "A,D", "B", "J,L"),
new = c("A,B", "C", "D,E,F", "B,D", "G", "A", "A,D", "B", "J,L")
)
cat_1 cat_2 cat_3 new
1 A,B <NA> <NA> A,B
2 C <NA> <NA> C
3 D,E,F <NA> <NA> D,E,F
4 <NA> B,D <NA> B,D
5 <NA> G <NA> G
6 <NA> A <NA> A
7 <NA> <NA> A,D A,D
8 <NA> <NA> B B
9 <NA> <NA> J,L J,L
First thing, you need to convert "NA"
(which is a character) to a real NA
(looks the same but isn't). In your data.frame()
call, you're making them as text. Here's a quick way to correct that:
dat[dat=="NA"] <- NA
> dat
meat fish veges food
1 lamb, beef <NA> <NA> <NA>
2 chicken, pork, beef <NA> <NA> <NA>
3 venison, beef, lamb <NA> <NA> <NA>
4 <NA> sardines, tuna, snapper <NA> <NA>
5 <NA> snapper <NA> <NA>
6 <NA> kingfish, sardines <NA> <NA>
7 <NA> <NA> onions, potatoes, leeks <NA>
8 <NA> <NA> celery, leeks <NA>
9 <NA> <NA> carrots, onions, potatoes <NA>
10 <NA> <NA> <NA> chicken, lamb, beef
11 <NA> <NA> <NA> sardines, kingfish, tuna
12 <NA> <NA> <NA> potatoes, leeks
Now you can use the splice operator (!!!
) with a vector setting the preferred order of columns that you want to coalesce()
:
library(tidyverse)
vec <- c("food", "veges", "fish", "meat")
dat %>%
mutate(food = coalesce(!!!select(.,all_of(vec))))
Output:
> dat %>%
+ mutate(food = coalesce(!!!select(.,all_of(vec))))
# A tibble: 12 × 4
# Rowwise:
meat fish veges food
<chr> <chr> <chr> <chr>
1 lamb, beef NA NA lamb, beef
2 chicken, pork, beef NA NA chicken, pork, beef
3 venison, beef, lamb NA NA venison, beef, lamb
4 NA sardines, tuna, snapper NA sardines, tuna, snapper
5 NA snapper NA snapper
6 NA kingfish, sardines NA kingfish, sardines
7 NA NA onions, potatoes, leeks onions, potatoes, leeks
8 NA NA celery, leeks celery, leeks
9 NA NA carrots, onions, potatoes carrots, onions, potatoes
10 NA NA NA chicken, lamb, beef
11 NA NA NA sardines, kingfish, tuna
12 NA NA NA potatoes, leeks
Okay, so you want to add the coalesced data as a new column. Same as above, but mutate a new column (new_col
):
> dat %>%
+ mutate(new_col = coalesce(!!!select(.,all_of(vec)))) # <-- see edit here
# A tibble: 12 × 5
# Rowwise:
meat fish veges food new_col
<chr> <chr> <chr> <chr> <chr>
1 lamb, beef NA NA NA lamb, beef
2 chicken, pork, beef NA NA NA chicken, pork, beef
3 venison, beef, lamb NA NA NA venison, beef, lamb
4 NA sardines, tuna, snapper NA NA sardines, tuna, snapper
5 NA snapper NA NA snapper
6 NA kingfish, sardines NA NA kingfish, sardines
7 NA NA onions, potatoes, leeks NA onions, potatoes, leeks
8 NA NA celery, leeks NA celery, leeks
9 NA NA carrots, onions, potatoes NA carrots, onions, potatoes
10 NA NA NA chicken, lamb, beef chicken, lamb, beef
11 NA NA NA sardines, kingfish, tuna sardines, kingfish, tuna
12 NA NA NA potatoes, leeks potatoes, leeks