Tags: r, dataframe, dplyr, data-cleaning

Finding the columns common to two data frames and keeping only those columns


I have a dataset that contains gene names. I want to extract those genes from two other datasets, gd and cd.

common_genes is a vector that holds the gene names I want to search for.

I would like help getting the same set of columns in both the cd and gd datasets, based on the common genes, because my analysis requires comparisons between the two datasets.

# Of the 300 genes, find those that are present in the `gd` dataset.
common_genes <- intersect(gene_names, colnames(gd))
# Extract those genes from `gd`.
A <- gd[, common_genes]
# Of the 300 genes, find those that are present in the `cd` dataset and extract them.
common_genes2 <- intersect(gene_names, colnames(cd))
B <- cd[, common_genes2]

The output I get has 150 of the 300 genes in A and 200 of the 300 genes in B, so the two results do not have the same columns.

My desired output is the example below:

A
RPL26 MS4A1 ELK1 SNIP1
200   300   400  534
B
RPL26 MS4A1 ELK1 SNIP1
100   81    91   112

Solution

  • Since your example isn't reproducible, I created my own:

    df1 <- data.frame(
        a = 1:3,
        b = 2:4,
        c = 3:5)
    df2 <- data.frame(
        b = 4:6,
        c = 5:7,
        d = 6:8)
    
    # intersect() on the two sets of column names gives the columns present in both
    common_cols <- intersect(colnames(df1), colnames(df2))
    
    # drop = FALSE keeps the result a data frame even if only one column is shared
    df1 <- df1[, common_cols, drop = FALSE]
    df2 <- df2[, common_cols, drop = FALSE]
    

    df1 afterwards:

      b c
    1 2 3
    2 3 4
    3 4 5
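
Applied to the question's setup, the same idea might look like the sketch below. It assumes gene_names is a character vector and that gd and cd have one column per gene. The tiny data frames and the extra genes (TP53, GAPDH, ACTB) are made up for illustration; only the four genes and their values come from the desired output in the question. The key change is to intersect all three sets of names at once, so A and B end up with identical columns.

```r
# Hypothetical stand-ins for the asker's data; the real gd and cd are larger.
gene_names <- c("RPL26", "MS4A1", "ELK1", "SNIP1", "TP53")
gd <- data.frame(RPL26 = 200, MS4A1 = 300, ELK1 = 400, SNIP1 = 534, GAPDH = 10)
cd <- data.frame(SNIP1 = 112, ELK1 = 91, MS4A1 = 81, RPL26 = 100, ACTB = 20)

# Genes of interest that appear as columns in BOTH gd and cd.
common_genes <- Reduce(intersect, list(gene_names, colnames(gd), colnames(cd)))

# Subset both data frames with the same vector, so the columns match
# in name and order. drop = FALSE keeps a data frame even for one gene.
A <- gd[, common_genes, drop = FALSE]
B <- cd[, common_genes, drop = FALSE]

print(A)
print(B)
```

Because both subsets use the same common_genes vector, the column order in A and B follows gene_names rather than the original order in gd or cd, which makes column-by-column comparisons straightforward.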