
Convert a dataframe where each row has categorical data into a new dataframe with each category represented as a separate column


The following dataframe has one row per patient (the row IDs correspond to the patients) and a single column.

df <- data.frame(
  mutations = c('A497T', NA, 'C320T', 'A497T', NA, 'G621C', 'G621C')
)

This column gives the mutation each patient (row) has, or NA if the patient has none.

I want to create a new dataframe where every unique mutation corresponds to a column. For example, the first column will be "A497T", and every patient who has this mutation will have a "Yes" value in it.

Same for the rest of the columns.

Original dataframe

mutations
<chr>
A497T               
NA              
C320T               
A497T               
NA              
G621C               
G621C

Desired output

A497T | C320T | G621C
<chr> | <chr> | <chr>
Yes   | NA    | NA
NA    | NA    | NA
NA    | Yes   | NA
Yes   | NA    | NA
NA    | NA    | NA
NA    | NA    | Yes
NA    | NA    | Yes

Solution

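  • You can use table, which returns a 0/1 count for each patient (row) and mutation (column); NA values are dropped, so patients without a mutation get 0 everywhere: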

    table(rownames(df), df$mutations)
        A497T C320T G621C
      1     1     0     0
      2     0     0     0
      3     0     1     0
      4     1     0     0
      5     0     0     0
      6     0     0     1
      7     0     0     1
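
The table() call above yields 0/1 counts rather than the "Yes"/NA character columns shown in the desired output. As a minimal base R sketch of one way to get that exact shape (this is an alternative, not the answer's method; the names `muts` and `wide` are made up for illustration), you could build one column per unique mutation:

```r
df <- data.frame(
  mutations = c('A497T', NA, 'C320T', 'A497T', NA, 'G621C', 'G621C')
)

# Unique mutations with NA excluded, sorted to fix the column order
muts <- sort(unique(df$mutations[!is.na(df$mutations)]))

# One column per mutation: "Yes" where the row has that mutation, NA otherwise.
# %in% returns FALSE for NA rows, so patients without a mutation stay NA.
wide <- as.data.frame(
  sapply(muts, function(m) ifelse(df$mutations %in% m, "Yes", NA_character_)),
  stringsAsFactors = FALSE
)

wide
```

Rows stay in the original patient order because the columns are computed directly from df$mutations, so nothing depends on how row names are sorted.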