
R: geom_logo for multiple genes from tibble


I am not very familiar with the ggseqlogo package, so I would appreciate any help.

I have prepared a tibble that looks like this:

test <- tibble( gene = c("A", "B", "A", "C", "B"),
                seq = c("AAAAAAAAAAAAAAAAAAAA",
                        "GGGGGGGGGGGGGGGG",
                        "AAAAAATAAAAATAAAAAAA",
                        "AGTCGTCATGCATCAATCCCAATGGTGCA",
                        "GGGGGGGCCGGGGGGG") ) 

I want to create one sequence logo per gene, grouped by gene name. Within each gene, all sequences have the same length.

So far I've tried this:

ggplot() + 
 geom_logo(data = test$gene) +
 facet_grid(rows = ~ gene)

But so far this is the best I got:


Solution

  • The response might be late, but it works.

    ggseqlogo has an option for facets, but that requires unique gene names and equal-length sequences. I avoided this issue by looping over the genes and storing the plot for each gene in a list.

    That plot list can then be arranged with cowplot's plot_grid:

    library(tidyverse)
    library(cowplot)
    library(ggseqlogo)
    
    
    test <- tibble( gene = c("A", "B", "A", "C", "B"),
                    seq = c("AAAAAAAAAAAAAAAAAAAA",
                            "GGGGGGGGGGGGGGGG",
                            "AAAAAATAAAAATAAAAAAA",
                            "AGTCGTCATGCATCAATCCCAATGGTGCA",
                            "GGGGGGGCCGGGGGGG") ) 
    
    # Initialize list to store plots in
    plot_list <- list()
    
    # Loop through the distinct genes (not the rows), pooling each gene's
    # sequences so that every gene gets a single logo in plot_list
    genes <- unique(test$gene)
    for (i in seq_along(genes)) {
      plot_list[[i]] <- ggplot() + 
        geom_logo(data = test$seq[test$gene == genes[i]], seq_type = "dna") +
        ggtitle(genes[i])
    }
    
    # cowplot's plot_grid() arranges the plots in the list into a grid
    plot_grid(plotlist =  plot_list, ncol = 2)
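
As an alternative to the loop, here is a minimal sketch of the faceting route. It assumes that geom_logo() accepts a named list of character vectors and labels each list element in a seq_group column, as the ggseqlogo documentation describes. The names seqs_by_gene and same_length are introduced here only for illustration, and the check on sequence lengths is an added safeguard rather than something the package requires you to write.

```r
library(tibble)
library(ggplot2)
library(ggseqlogo)

test <- tibble(
  gene = c("A", "B", "A", "C", "B"),
  seq  = c("AAAAAAAAAAAAAAAAAAAA",
           "GGGGGGGGGGGGGGGG",
           "AAAAAATAAAAATAAAAAAA",
           "AGTCGTCATGCATCAATCCCAATGGTGCA",
           "GGGGGGGCCGGGGGGG")
)

# One character vector of sequences per gene, named by gene
seqs_by_gene <- split(test$seq, test$gene)

# Sanity check: within each gene, all sequences should share one length
same_length <- vapply(
  seqs_by_gene,
  function(s) length(unique(nchar(s))) == 1,
  logical(1)
)
stopifnot(all(same_length))

# Facet on seq_group; free x scales because genes differ in sequence length
p <- ggplot() +
  geom_logo(seqs_by_gene, seq_type = "dna") +
  theme_logo() +
  facet_wrap(~seq_group, ncol = 1, scales = "free_x")

p
```

Splitting by gene first gives each gene a unique name in the list, so repeated gene values in the tibble are no longer a problem for faceting.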
    
