Search code examples
rdataframedplyrpivotdata-wrangling

R pivot dataframe to reshape


I have two data dataframes

Category = c("New", "Fixed" ,"Regular")
`Jan 2014` = c(-418518172, 24425087,0.3016606 )
`Feb 2014`=c(-168947846,17919651,-0.7212493)
`Mar 2014`=c(-168846,17544,-0.775493)
df1 <- data.frame(Category, `Jan 2014`, `Feb 2014`,`Mar 2014`, check.names = FALSE)
Category = c("New","Fixed" ,"Regular")
`Jan 2014` = c(-8172,-8524,0.5060900)
`Feb 2014`=c(-1686,19929,-0.85059894)
`Mar 2014`=c(-2424246,1232324,-0.7217593)
df2 <- data.frame(Category, `Jan 2014`, `Feb 2014`, `Mar 2014`,check.names = FALSE)
calculate_df <- cbind(df1,(df1[-1]/df2[-1])*100)

After doing some calculation i am trying to reshape it so that all the dates becomeDatecolumn and then each type of Category New Fixed and Regular comes under separate column with their respective values.

reshape_df <- calculate_df %>% 
  pivot_longer(
    cols = !Category, 
    names_to = "Date", 
    values_to = "Values"
  )

calculate_df

Current Output

 Category Date       Values
   <chr>    <chr>       <dbl>
 1 New      Jan 2014 -4.19e+8
 2 New      Feb 2014 -1.69e+8
 3 New      Mar 2014 -1.69e+5
 4 New      Jan 2014  5.12e+6
 5 New      Feb 2014  1.00e+7
 6 New      Mar 2014  6.96e+0
 7 Fixed    Jan 2014  2.44e+7
 8 Fixed    Feb 2014  1.79e+7
 9 Fixed    Mar 2014  1.75e+4
10 Fixed    Jan 2014 -2.87e+5
11 Fixed    Feb 2014  8.99e+4
12 Fixed    Mar 2014  1.42e+0
13 Regular  Jan 2014  3.02e-1
14 Regular  Feb 2014 -7.21e-1
15 Regular  Mar 2014 -7.75e-1
16 Regular  Jan 2014  5.96e+1
17 Regular  Feb 2014  8.48e+1
18 Regular  Mar 2014  1.07e+2

How can i achieve the expected outcome?

Expected Output

 Date       Category    Values  Category  Values   Category  Values   

 1 Jan 2014    New      -4.19e+8 Fixed   2.44e+7   Regular   3.02e-1
 2 Feb 2014    New      -1.69e+8 Fixed   1.79e+7   Regular   -7.21e-1
 3 Mar 2014    New      -1.69e+5 Fixed   1.75e+4   Regular   -7.75e-1
 4 Jan 2014    New      5.12e+6  Fixed   -2.87e+5  Regular   5.96e+1
 5 Feb 2014    New      1.00e+7  Fixed   8.99e+4   Regular   8.48e+1
 6 Mar 2014    New      6.96e+0  Fixed   1.42e+0   Regular   1.07e+2


Solution

  • In base R, you may start from here (and, probably, you will find a less verbose solution soon). Notice that, it appears unclear to me why wide is better than long. It is not good practice to have repeated column names. However, an exception might be publishing.

    # to preserve the order of levels 
    fb_split_df = \(x, f, ...) {
      f = factor(f, levels = unique(f))
      split(x, f, ...)
    }
    xyzzy = fb_split_df(reshape_df, reshape_df$Category) 
    xyzzy[[1]] = xyzzy[[1]][c("Date", "Category", "Values")]
    xyzzy[-1] = lapply(xyzzy[-1], \(x) transform(x, Date = NULL))
    do.call(cbind, xyzzy) |> setNames(c("Date", rep(c("Category", "Values"), 3)))
    

    gives

          Date Category        Values Category        Values Category      Values
    1 Jan 2014      New -4.185182e+08    Fixed  2.442509e+07  Regular   0.3016606
    2 Feb 2014      New -1.689478e+08    Fixed  1.791965e+07  Regular  -0.7212493
    3 Mar 2014      New -1.688460e+05    Fixed  1.754400e+04  Regular  -0.7754930
    4 Jan 2014      New  5.121368e+06    Fixed -2.865449e+05  Regular  59.6061175
    5 Feb 2014      New  1.002063e+07    Fixed  8.991746e+04  Regular  84.7931106
    6 Mar 2014      New  6.964887e+00    Fixed  1.423652e+00  Regular 107.4448227
    

    Starting with calculate_df:

    custom_reshape = \(df) {
      tdf = t(df) 
      cats = tdf[1, ]
      n = length(cats)
      tdf = tdf[-1, ]
      lapply(seq_len(ncol(tdf)), \(i) cbind(rep(cats[[i]], n), tdf[, i])) |> 
        do.call(cbind, args = _) |>
        data.frame(row.names(tdf), ... = _, row.names = NULL)
    }
    custom_reshape(calculate_df) |>
      setNames(c("Date", rep(c("Category", "Values"), 3)))