Search code examples
rexceldataframereshape

How do I execute this unique reshaping in R (or excel)?


I am trying to reshape a data frame, and the task is just different enough from a typical reshape that I can't quite swing it. The table is originally in excel, but I've already imported to R, so I can work with a solution in either software.

I have a data frame donors containing names, values, and months:

NAME    AMT    MONTH
brett   100    jan
brett   100    mar
brett   100    nov
brett   50     apr
brett   50     jun
jane    75     dec
sam     200    jan
sam     200    feb
sam     200    mar

I need to reshape this data frame so that there is a NAME column, an AMT column, and a column for each month. Each row corresponds to any observation that is unique in both name and amount, and the month columns are binary variables (1 if that donor/amount combo occurred that month, and 0 if not.

So the above example would work out to:

NAME   AMT   JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC
brett  100   1   0   1   0   0   0   0   0   0   0   1   0
brett  50    0   0   0   1   0   1   0   0   0   0   0   0
jane   75    0   0   0   0   0   0   0   0   0   0   0   1
sam    200   1   1   1   0   0   0   0   0   0   0   0   0

if you want to complete my code, create a final column equal to the total number of times that NAME/AMT combo occurred (just equal to the sum of the month columns)... but I can do that myself if you aren't feeling it.

Thank you!


Solution

  • Here's a tidyverse attempt -

    • complete to create rows with missing months.
    • arrange so that you have columns in correct order.
    • Get data in wide format using pivot_wider.
    library(dplyr)
    library(tidyr)
    
    df %>%
      complete(MONTH = tolower(month.abb)) %>%
      arrange(match(MONTH, tolower(month.abb))) %>%
      pivot_wider(names_from = MONTH, values_from = MONTH, 
                  values_fn = length, values_fill = 0) %>%
      na.omit
    
    #  NAME    AMT   jan   feb   mar   apr   may   jun   jul   aug   sep   oct   nov   dec
    #  <chr> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
    #1 brett   100     1     0     1     0     0     0     0     0     0     0     1     0
    #2 sam     200     1     1     1     0     0     0     0     0     0     0     0     0
    #3 brett    50     0     0     0     1     0     1     0     0     0     0     0     0
    #4 jane     75     0     0     0     0     0     0     0     0     0     0     0     1
    

    Note that month.abb is inbuilt R vector which gives abbreviated month names. I used tolower(month.abb) to match it with the actual column names in the data.

    data

    df <- structure(list(NAME = c("brett", "brett", "brett", "brett", "brett", 
    "jane", "sam", "sam", "sam"), AMT = c(100L, 100L, 100L, 50L, 
    50L, 75L, 200L, 200L, 200L), MONTH = c("jan", "mar", "nov", "apr", 
    "jun", "dec", "jan", "feb", "mar")), 
    class = "data.frame", row.names = c(NA, -9L))