I am trying to use the gather and spread functions from the tidyverse, but spread throws an error.
library(tidyr)  # provides gather(), spread() and re-exports the %>% pipe
dataset <- iris
# gather() converts wide data to long data
dataset_gather <- dataset %>% tidyr::gather(key = Type, value = Values, 1:4)
head(dataset_gather)
# spread() is the opposite of gather()
The line below throws this error: Error: Duplicate identifiers for rows
dataset_spread <- dataset_gather %>% tidyr::spread(key = Type, value = Values)
Added later: Sorry @alistaire, I only saw your comment on the original post after posting this response.
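As far as I understand it, the error Duplicate identifiers for rows... occurs when several rows share the same identifier. spread() identifies each output row by every column other than the key and value columns, and after gathering the only such column left is Species. So in the 'iris' data there are 50 rows with Species = setosa and Type = Petal.Width, and likewise for every other species/measurement pair. Gathering those data isn't an issue, but when you try to spread them, the function doesn't know what belongs to what. For example, the first five setosa rows all have a Petal.Width of 0.2, and three of them have a Petal.Length of 1.4; nothing in the long data says which 0.2 Petal.Width goes with which 1.4 Petal.Length, i.e. which original setosa row each value came from.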
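A quick way to see the duplicated identifiers is to count how many gathered rows share each Species/Type pair. This is a minimal diagnostic sketch; it assumes dplyr is installed so that count() is available, and it is not part of the fix itself.

```r
library(dplyr)  # count() is used here only as a diagnostic
library(tidyr)

# Reshape exactly as in the question: once the four measurements are
# gathered, Species is the only column left to tell rows apart
dataset_gather <- iris %>% gather(key = Type, value = Values, 1:4)

# Count how many rows share each Species/Type combination
id_counts <- dataset_gather %>% count(Species, Type)
id_counts
```

Each of the 12 Species/Type pairs (3 species x 4 measurements) appears 50 times, so spread() has 50 candidate values for every cell it would need to fill.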
The (tidyverse) solution I use in these circumstances is to create a unique marker for each row of data at the gather stage, so that the function can keep track of which duplicate values belong to which rows when you spread again. See the example below:
# Load packages
library(dplyr)
library(tidyr)
# Get data
dataset <- iris
# View dataset
head(dataset)
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1 5.1 3.5 1.4 0.2 setosa
#> 2 4.9 3.0 1.4 0.2 setosa
#> 3 4.7 3.2 1.3 0.2 setosa
#> 4 4.6 3.1 1.5 0.2 setosa
#> 5 5.0 3.6 1.4 0.2 setosa
#> 6 5.4 3.9 1.7 0.4 setosa
# Gather data
dataset_gathered <- dataset %>%
  # Create a unique identifier for each row
  mutate(marker = row_number(Species)) %>%
  # Gather the data
  gather(key = Type, value = Values, 1:4)
# View gathered data
head(dataset_gathered)
#> Species marker Type Values
#> 1 setosa 1 Sepal.Length 5.1
#> 2 setosa 2 Sepal.Length 4.9
#> 3 setosa 3 Sepal.Length 4.7
#> 4 setosa 4 Sepal.Length 4.6
#> 5 setosa 5 Sepal.Length 5.0
#> 6 setosa 6 Sepal.Length 5.4
# Spread it out again
dataset_spread <- dataset_gathered %>%
  # Group the data by the marker
  group_by(marker) %>%
  # Spread it out again
  spread(key = Type, value = Values) %>%
  # Not essential, but remove marker
  ungroup() %>%
  select(-marker)
# View spread data
head(dataset_spread)
#> # A tibble: 6 x 5
#> Species Petal.Length Petal.Width Sepal.Length Sepal.Width
#> <fctr> <dbl> <dbl> <dbl> <dbl>
#> 1 setosa 1.4 0.2 5.1 3.5
#> 2 setosa 1.4 0.2 4.9 3.0
#> 3 setosa 1.3 0.2 4.7 3.2
#> 4 setosa 1.5 0.2 4.6 3.1
#> 5 setosa 1.4 0.2 5.0 3.6
#> 6 setosa 1.7 0.4 5.4 3.9
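To check that the marker really lets spread() put every value back where it came from, a minimal round-trip sketch can compare the result with the original iris. It calls row_number() without an argument, which simply numbers the rows 1 to n; row_number(Species) as used above also gives unique values, because ties are broken by row order. The arrange() step and the explicit column reordering are additions made only so the comparison lines up.

```r
library(dplyr)
library(tidyr)

round_trip <- iris %>%
  # Unique marker per row: 1..n in the current row order
  mutate(marker = row_number()) %>%
  gather(key = Type, value = Values, 1:4) %>%
  spread(key = Type, value = Values) %>%
  # Put rows back in their original order before dropping the marker
  arrange(marker) %>%
  select(-marker) %>%
  # spread() sorts the new columns alphabetically; restore iris' column order
  select(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, Species)

# TRUE when every value is back in its original row and column
all.equal(as.data.frame(round_trip), iris, check.attributes = FALSE)
```

check.attributes = FALSE ignores row names, which dplyr does not preserve, so only the column values and the Species factor are compared.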
(and as ever, thanks to Jenny Bryan for the reprex package)