The following dataframe has one row per patient (the row IDs correspond to the patients) and a single column.
df <- data.frame(
mutations = c('A497T', NA, 'C320T', 'A497T', NA, 'G621C', 'G621C')
)
This column tells whether the patient (row) has a given mutation or not (NA).
I want to create a new dataframe in which every unique mutation becomes a column. For example, the first column will be "A497T", and every patient who has this mutation will have the value "Yes" in it.
Same for the rest of the columns.
Original dataframe
mutations
<chr>
A497T
NA
C320T
A497T
NA
G621C
G621C
Desired output
A497T | C320T | G621C
<chr> | <chr> | <chr>
Yes   | NA    | NA
NA    | NA    | NA
NA    | Yes   | NA
Yes   | NA    | NA
NA    | NA    | NA
NA    | NA    | Yes
NA    | NA    | Yes
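You can use table(), which cross-tabulates the row names against the mutations and returns a 0/1 count for each patient and mutation: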
table(rownames(df), df$mutations)
    A497T C320T G621C
  1     1     0     0
  2     0     0     0
  3     0     1     0
  4     1     0     0
  5     0     0     0
  6     0     0     1
  7     0     0     1
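The table above holds counts rather than the "Yes"/NA values in the desired output. One possible way to get there is sketched below, assuming a count above zero should become "Yes" and zero should become NA. The name res is only illustrative, and seq_len(nrow(df)) is used in place of rownames(df) as an assumption, so that patients stay in numeric order even when there are ten or more rows.

```r
df <- data.frame(
  mutations = c('A497T', NA, 'C320T', 'A497T', NA, 'G621C', 'G621C')
)

# Cross-tabulate patients (rows) against mutations.
# Patients with NA get a row of zeros, because NA is not counted by default.
tab <- table(seq_len(nrow(df)), df$mutations)

# Turn the 2-D table into a data frame with one column per mutation
res <- as.data.frame.matrix(tab)

# Replace counts with "Yes" (mutation present) or NA (absent)
res[] <- lapply(res, function(x) ifelse(x > 0, "Yes", NA_character_))

res
```

The res[] <- lapply(...) form keeps res a data frame with its mutation column names, while each column becomes a character vector.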