I am trying to reshape a data frame, and the task is just different enough from a typical reshape that I can't quite swing it. The table is originally in excel, but I've already imported to R, so I can work with a solution in either software.
I have a data frame donors
containing names, values, and months:
NAME AMT MONTH
brett 100 jan
brett 100 mar
brett 100 nov
brett 50 apr
brett 50 jun
jane 75 dec
sam 200 jan
sam 200 feb
sam 200 mar
I need to reshape this data frame so that there is a NAME column, an AMT column, and a column for each month. Each row corresponds to any observation that is unique in both name and amount, and the month columns are binary variables (1 if that donor/amount combo occurred that month, and 0 if not.
So the above example would work out to:
NAME AMT JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC
brett 100 1 0 1 0 0 0 0 0 0 0 1 0
brett 50 0 0 0 1 0 1 0 0 0 0 0 0
jane 75 0 0 0 0 0 0 0 0 0 0 0 1
sam 200 1 1 1 0 0 0 0 0 0 0 0 0
if you want to complete my code, create a final column equal to the total number of times that NAME/AMT combo occurred (just equal to the sum of the month columns)... but I can do that myself if you aren't feeling it.
Thank you!
Here's a tidyverse
attempt -
complete
to create rows with missing months.arrange
so that you have columns in correct order.pivot_wider
.library(dplyr)
library(tidyr)
df %>%
complete(MONTH = tolower(month.abb)) %>%
arrange(match(MONTH, tolower(month.abb))) %>%
pivot_wider(names_from = MONTH, values_from = MONTH,
values_fn = length, values_fill = 0) %>%
na.omit
# NAME AMT jan feb mar apr may jun jul aug sep oct nov dec
# <chr> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
#1 brett 100 1 0 1 0 0 0 0 0 0 0 1 0
#2 sam 200 1 1 1 0 0 0 0 0 0 0 0 0
#3 brett 50 0 0 0 1 0 1 0 0 0 0 0 0
#4 jane 75 0 0 0 0 0 0 0 0 0 0 0 1
Note that month.abb
is inbuilt R vector which gives abbreviated month names. I used tolower(month.abb)
to match it with the actual column names in the data.
data
df <- structure(list(NAME = c("brett", "brett", "brett", "brett", "brett",
"jane", "sam", "sam", "sam"), AMT = c(100L, 100L, 100L, 50L,
50L, 75L, 200L, 200L, 200L), MONTH = c("jan", "mar", "nov", "apr",
"jun", "dec", "jan", "feb", "mar")),
class = "data.frame", row.names = c(NA, -9L))