
Spread gather having issue when values are 0


I have a table that I am trying to pivot using spread/gather from tidyr. Here is the data set:

# Output of datapasta::dpasta(chart_data); `actual` holds the same data as `chart_data`
actual<-data.frame(stringsAsFactors=FALSE,
   conversions = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L,
                   0L, 0L, 0L, 0L),
      platform = c("apple", "apple", "apple", "apple", "apple",
                   "apple", "apple", "apple", "apple", "apple",
                   "apple", "apple", "banana", "banana",
                   "banana", "oranges", "oranges",
                   "oranges", "oranges"),
          date = as.factor(c("2020-01-10", "2020-01-10", "2020-01-10",
                             "2020-01-10", "2020-01-10", "2020-01-10",
                             "2020-01-10", "2020-01-10", "2020-01-10", "2020-01-10",
                             "2020-01-10", "2020-01-10", "2020-01-10", "2020-01-10",
                             "2020-01-10", "2020-01-10", "2020-01-10",
                             "2020-01-10", "2020-01-10"))
)

Below is the code I am using to pivot it with spread:

 chart_data <- chart_data %>% 
   tidyr::spread(key = platform, value = conversions)

The output I am trying to get looks like this:


whatitshouldbe <- data.frame(stringsAsFactors = FALSE,
                   date = as.factor(c("2020-01-10")),
                   apple = c(0L),
                   banana = c(0L),
                   oranges = c(1L)
)

But when I run the code I get the following error

Keys are shared for 19 rows:
* 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
* 13, 14, 15
* 16, 17, 18, 19

How can I fix this, or is there another method I can use to convert it? Thank you.

Solution

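  • The error occurs because several rows share the same date and platform, so spread cannot tell which value belongs in each cell. We could add a row sequence within each platform group so that every row has a unique key: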

    library(dplyr)
    library(tidyr)
    actual %>% 
       group_by(platform) %>%
       mutate(rn = row_number()) %>%  # per-platform row index makes each key unique
       ungroup() %>% 
       spread(platform, conversions)
       # or, in place of spread(), use pivot_wider:
       # pivot_wider(names_from = platform, values_from = conversions)
    # A tibble: 12 x 5
    #   date          rn apple banana oranges
    #   <fct>      <int> <int>  <int>   <int>
    # 1 2020-01-10     1     0      0       0
    # 2 2020-01-10     2     0      0       0
    # 3 2020-01-10     3     0      1       0
    # 4 2020-01-10     4     0     NA       0
    # 5 2020-01-10     5     0     NA      NA
    # 6 2020-01-10     6     0     NA      NA
    # 7 2020-01-10     7     0     NA      NA
    # 8 2020-01-10     8     0     NA      NA
    # 9 2020-01-10     9     0     NA      NA
    #10 2020-01-10    10     0     NA      NA
    #11 2020-01-10    11     0     NA      NA
    #12 2020-01-10    12     0     NA      NA
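
The result above keeps one row per sequence number, which is why the shorter groups are padded with NA. If the goal is a single row per date, as in the question's desired output, one possible alternative is to aggregate the duplicates before pivoting. The sketch below assumes that summing conversions per date and platform is the intended aggregation, which the question does not state. The `wide` name is only illustrative, and the data frame is rebuilt with `rep()` to match the posted data.

```r
library(dplyr)
library(tidyr)

# Same content as the question's `actual`: 12 apple, 3 banana and 4 oranges rows,
# with the single conversion on the third banana row (row 15).
actual <- data.frame(
  conversions = c(rep(0L, 14), 1L, rep(0L, 4)),
  platform    = c(rep("apple", 12), rep("banana", 3), rep("oranges", 4)),
  date        = factor(rep("2020-01-10", 19)),
  stringsAsFactors = FALSE
)

# Assumption: duplicates should be combined by summing.
# After summarise() each (date, platform) pair occurs once, so the pivot
# has unique keys and no helper index column is needed.
wide <- actual %>%
  group_by(date, platform) %>%
  summarise(conversions = sum(conversions), .groups = "drop") %>%
  pivot_wider(names_from = platform, values_from = conversions)

wide
```

With the data as posted, this gives one row for 2020-01-10 with apple = 0, banana = 1 and oranges = 0, because the only non-zero conversion belongs to a banana row. The `.groups` argument needs dplyr 1.0.0 or later.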