Search code examples
rdataframegroupingunique

R: Transform dataframe to show the corresponding values of a column for each unique row of another column


I have a large dataframe with 2 columns: The one has a discrete number of values that appear repetitively, while the other only has unique values. Essentially multiple values in column 2 will correspond to one value in column 1.

As the data has currently been acquired, it lists each unique variable in column 2 as a row, which means there are repeated values in column 1.

I want to transform (essentially flip) the data so that I can see which column 2 values fall under each unique value in column 1.

For example, the df is:

Contig Gene
C20 G1
C10 G2
C40 G3
C20 G4
C40 G5
C30 G6

And I want:

Contig Gene
C10 G2
C20 G1, G4
C30 G6
C40 G3, G5

If I only get the number of unique values that will also be okay:

Contig Gene(s)
C10 1
C20 2
C30 1
C40 2

I hope it makes sense. I've been struggling to find the right keywords to explain this issue and really don't know where to begin. Although I get the feeling I should maybe turn the data into a list.


Solution

  • To get a dataframe like your first desired table, you may use

    library(tidyr)
    chop(df, Gene)
    

    while to get the last one you may use:

    library(dplyr)
    df %>% group_by(Config) %>% summarise(n = n())