I am new to reshaping data frames. I have a df that I would like to make wider so I can use it in analysis such as cluster and NMDS. I have found several questions and (answers) related to how to reshape data containing primarily quantitative data (using aggregation functions), but in my case, my variables are all categorical.
As my df has a thousand rows and dozens of columns, I've created a toy df to use as an example. It looks like this:
df <- data.frame(
id=c("a","c", "a","b","d","c","e","d","c","a","a","e","a","b","d"),
color=c("red", "blue", "gray", "yellow", "green","green","blue","purple" ,"black","green","yellow","blue","red","yellow","gray"),
fruit=c("apple", "orange", "avocado", "strawberry", "banana", "apple", "orange", "avocado", "strawberry", "banana","banana", "strawberry", "watermelon", "lemon", "lemon" ),
country = c("Italy", "Spain", "Brazil", "Brazil", "Australia", "Italy", "Japan", "India", "USA", "Mexico", "USA", "Mexico", "Spain", "France", "France"),
animal=c("alligator", "camel", "alligator", "bat", "dolphin", "camel", "elephant", "dolphin", "camel", "alligator", "alligator", "elephant", "alligator", "bat", "dolphin"))
I would like the column "id" to be the first in my reshaped data frame, "animal" to be the second, and then the levels of "color", "fruit", and "country". The point here is that I wanted them separated.
The code below shows some attempts I've made:
df <- dplyr::mutate_if(df,is.character,as.factor)
attach(df)
dcast(df, id ~ color,value.var = "id") #The output is exactly what I wanted!
dcast(df, id + animal ~ color,value.var = "id") #Exactly what I wanted!
dcast(df, id + animal ~ fruit,value.var = "id") #Exactly what I wanted!
dcast(df, id ~ country, value.var = "id") #Not the output I wanted. Only "works well" if I specify "fun.aggregate=length". Why?
dcast(df, id ~ color + country, value.var = "id") #Not the output what I wanted.
dcast(df, id + animal~ color + country, value.var = "id") #Not the output I wanted.
dcast(df, id + animal~ color + country + fruit, value.var = "id") #Not the output I wanted.
My expected reshaped df should look like this:
To achieve that, I've tried all commands below, but none of those worked well:
dcast(df, id + animal ~ color + country + fruit, fun.aggregate=length)
dcast(df, id + animal ~ c(color, country, fruit), fun.aggregate=length)
dcast(df, id + animal ~ c("color", "country", "fruit"), fun.aggregate=length)
dcast(df, id + animal ~ color:fruit, fun.aggregate=length)
I have also tried to do that using tidyr::pivot_wider, without success.
Is there a way to achieve my goal using reshape2::dcast or tidyr::pivot_wider, or any other function in R? I would appreciate it if you guys could help me. Thanks in advance.
First, you have to pivot_longer
to get the column names you desire into a column. Then I arranged it by the future column names, so the words would be grouped, like your image, then I used pivot_wider
. It drops the animal column, so I put it back, then arranged by id, so they would be in the same observation order as your image.
pivot_longer(df, cols = color:country, names_to = "variable",
values_to = "value") %>% # column names to rows
arrange(variable, value) %>% # organize future column names
pivot_wider(!variable, names_from = value, values_from = animal,
values_fn = list(animal = length), values_fill = 0) %>%
left_join(distinct(df[,c(1,5)])) %>% # add animals back
select(id, animal, everything()) %>% # rearrange columns
arrange(id) # reorder observations
Update based on your comment - ordered by color, fruit, then country
Added mutate
and modified the first arrange
and pivot_wider
:
pivot_longer(df,cols = color:country, names_to = "variable",
values_to = "value") %>% # future col names to rows
mutate(ordering = ifelse(variable == "color", 1, # create organizer variable
ifelse(variable == "fruit", 2, 3))) %>%
arrange(ordering, value) %>% # organize future column order
pivot_wider(!c(variable,ordering), # make it wide
names_from = value,
values_from = animal,
values_fn = list(animal = length),
values_fill = 0) %>%
left_join(distinct(df[,c(1,5)])) %>% # add the animals back
select(id, animal, everything()) %>% # move animals to 2nd position
arrange(id) # reorder observations