Search code examples
rtidyversetransformationdata-transform

R Data transformation: dataframe to matrix filled with 0 and 1


I am looking to transform a long dataformat to a wide one, but the values of the "signatories" column are not exactly the same as the new column names should be.

I have a dataframe with bill numbers and the names of their signatories (senators). If a senator has signed a bill, their is a row in the dataframe with the bill number and the name of the senator in questions. All the senators whose names are not listed next to the bill number have not signed the bill.

I want to construct a matrix with rows as bill numbers and columns as senator names, filled with 1s and 0s only. 1 indicates that a given senator has signed a bill, 0 indicates they have not.

ex <- data.frame(billno = c(715851, 715851, 715851,715852, 715852, 715852, 715852, 715852, 715852), signatories = c("Ben", "Lisa", "Roger", "Louise", "Macy", "John", "Jake", "James", "Ben"))

Senatornames <- c("Ben", "Lisa", "Roger", "Louise", "Macy", "John", "Jake", "James", "Julian", "Ayn")

#current output is this 
#  billno signatories
#1 715851         Ben
#2 715851        Lisa
#3 715851       Roger
#4 715852      Louise
#5 715852        Macy
#6 715852        John
#7 715852        Jake
#8 715852       James
#9 715852         Ben

#I want something like this. How do I arrive at this result? 
#  billno Ben   Lisa    Roger   Louise   Macy   John   Jake   James    Ayn
#1 715851  1      1       1       0        0      0       0      0      0
#2 715852  1      0       0       1        1      1       1      1      0

Thanks so much!!


Solution

  • Next code can bring you close to what you want. I have used the vector you provided in order to create a full_join() so that all values can appear. Here the code:

    library(tidyverse)
    #Dataframe
    df <- data.frame(signatories=Senatornames,stringsAsFactors = F)
    #Code
    ex1 <- ex %>% mutate(Value=1) %>%
      full_join(df) %>%
      fill(billno) %>%
      pivot_wider(names_from = signatories,values_from=Value) %>%
      replace(is.na(.),0)
    

    Output:

    # A tibble: 2 x 11
      billno   Ben  Lisa Roger Louise  Macy  John  Jake James Julian   Ayn
       <dbl> <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl> <dbl>
    1 715851     1     1     1      0     0     0     0     0      0     0
    2 715852     1     0     0      1     1     1     1     1      0     0