Search code examples
rdataframedplyrpivottidyverse

Error when using Pivot_wider function in R?


I am using pivot_wider function on the following sample data but look like I am missing something here as the output are producing NA's with not rightly placing data into its respective column. Any suggestion, please?

library(tidyverse)
set.seed(123)
DF <- data.frame(Date = seq(as.Date("1991-01-01"), 
                        to = as.Date("2000-12-31"), 
                        by = "year"), 
                        Parameter = rep(c("A","B"), times = 10),
                        Value = runif(10,1,50)) %>% 
                        pivot_wider(names_from = Parameter, values_from = Value)

Output

> print(DF)
# A tibble: 10 × 3
   Date       A         B        
   <date>     <list>    <list>   
 1 1991-01-01 <dbl [2]> <NULL>   
 2 1992-01-01 <NULL>    <dbl [2]>
 3 1993-01-01 <dbl [2]> <NULL>   
 4 1994-01-01 <NULL>    <dbl [2]>
 5 1995-01-01 <dbl [2]> <NULL>   
 6 1996-01-01 <NULL>    <dbl [2]>
 7 1997-01-01 <dbl [2]> <NULL>   
 8 1998-01-01 <NULL>    <dbl [2]>
 9 1999-01-01 <dbl [2]> <NULL>   
10 2000-01-01 <NULL>    <dbl [2]>

Solution

  • I think you've got an error in how you're generating your sample data. Apologies if this isn't what you're looking for, but here's how I think you can get your desired output:

    EDIT: @s_pike provided the same solution below, about a minute before me, with an explanation of what we both changed. Sorry for omission and thanks to them for that- included now for clarity.

    The problem is in your call to rep(). You're currently repeating A and B ten times, for a total of 20 values in the Parameter column. Because you're providing a vector of 10 dates, it will be recycled to match the length of Parameter. Because A and B are replicated one after the other, each date has the same Parameter value when Date is recycled: the cause of the duplication.

    If instead you change the call to: Parameter = c(rep("A",10),rep("B",10)) each value in Date gets a Parameter of both A and B, so there are no duplications. See below:

    > set.seed(123)
    > DF <- data.frame(Date = seq(as.Date("1991-01-01"),
                                to = as.Date("2000-12-31"), 
                                by = "year"), 
                     Parameter = c(rep("A",10),rep("B",10)),
                     Value = runif(10,1,50))
    
    > DF
             Date Parameter     Value
    1  1991-01-01         A 15.091298
    2  1992-01-01         A 39.626952
    3  1993-01-01         A 21.039869
    4  1994-01-01         A 44.267853
    5  1995-01-01         A 47.082897
    6  1996-01-01         A  3.232268
    7  1997-01-01         A 26.877169
    8  1998-01-01         A 44.728533
    9  1999-01-01         A 28.020316
    10 2000-01-01         A 23.374122
    11 1991-01-01         B 15.091298
    12 1992-01-01         B 39.626952
    13 1993-01-01         B 21.039869
    14 1994-01-01         B 44.267853
    15 1995-01-01         B 47.082897
    16 1996-01-01         B  3.232268
    17 1997-01-01         B 26.877169
    18 1998-01-01         B 44.728533
    19 1999-01-01         B 28.020316
    20 2000-01-01         B 23.374122
    

    This should do what you want and your pivot_wider should work now:

    > DF %>% 
    +     pivot_wider(names_from = Parameter, values_from = Value)
    # A tibble: 10 × 3
       Date           A     B
       <date>     <dbl> <dbl>
     1 1991-01-01 15.1  15.1 
     2 1992-01-01 39.6  39.6 
     3 1993-01-01 21.0  21.0 
     4 1994-01-01 44.3  44.3 
     5 1995-01-01 47.1  47.1 
     6 1996-01-01  3.23  3.23
     7 1997-01-01 26.9  26.9 
     8 1998-01-01 44.7  44.7 
     9 1999-01-01 28.0  28.0 
    10 2000-01-01 23.4  23.4