r bioinformatics fasta

Create and save a FASTA file from a DNAStringSet


I have the DNAStringSet shown below and want to write it to a new FASTA file (file.fa). What is an efficient way to save it? I tried write.fasta, but it crashed.

genes_seq <- 
  A DNAStringSet instance of length 254667
       width seq                                                                    names               
   [1]  2298 ATGGTGTCGTCTCCTTTCTATGTGAACAAGTTCA...AAAAAAATCAAAAACAAATCAAAAATCAAAAAA 22
   [2]  2600 CTGACATAGATAAGTTTAGAGTTACCTCCCCTGT...GATACATACACATATATATCCATGTAAGATAGA 22
   [3]  1351 ACACATTTATATATATTTATAAATATCAATAAAT...CGCATGTGTGTGTATGAGAGAGAGAGAGAGAGC 22
   [4]  3668 TTGTTGATCAGCAGTAATGGTAAGGAAGTTAGTA...CACGAAATCATTGGGTTATTTTTTATACCAGTA 22
   [5]   762 ATGACCATCTTTGGGGCAGAATCCACTTTTCATC...TCATTGGTCAGTTTTATTAAAGGCAGCATTTAA 22
   ...   ... ...
[2544]   558 CTAGATCCTTCTCCTGCTGTTATCAAAAGTAGAC...ACTGATGTAATACTGCAATTAAACATGATAGCA 22
[2545]  1319 TTGAAAATGAATTATAGAAATGTCTTTTTCACGA...ACTTGCACTAAAACATTTAGCAATTTGGTTAGC 22
[2546]  1365 GTATTTTGTTTCAAATGTACAAGCTTGGACAACA...GACTGCATGCATTTACATTTATGTAAATACAAA 22
[2547]  1970 CAGAATACCAGAAACAGCGAAGAATTTTTCACAT...GAAATATATATGTGTGTGTATATATAAATAAAT 22
[2548]   260 TTTATTTTTATTCAAAAGACATGGACATTAAAGG...TCTACAGCTTTGCATTATGCTGTGACGGGGTAA OCBIM_22024624mg
> 

Solution

  • This solution is adapted from https://bioinformatics.stackexchange.com/questions/3538/combine-fastq-by-writefastq-is-not-working-properly

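    # Non-reproducible example: `fasta` is a DNAStringSet already in memory
    library(ShortRead)   # provides writeFasta()
    library(Biostrings)  # provides the DNAStringSet class
    
    head(fasta)
    # A DNAStringSet instance of length 6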
    # width seq                                                                                                                                                                                   names                                                                                                                                                                                                    
    # [1]  1786 GGGGAGCCCGCAGAATTCGGAAAAAATCGTACGCTAAGGTTTTCCGGGCATCCGTAAGGGCCGAAACTTCCCGTCTTCCAGTCTGCG...GGTGCATCGGCCGGCACCTTGCGCAGGTTGTCGGCGTTCATCTCACGCAGGGTCTGCACGGCTGCCAGCACGCCTTGCGCGGCCGGC NODE_108_length_1...
    # [2]   590 GGTCAGCCAGGATTTCACTTTCCAGCCGGTCGAGCATCTGCACCAGCACCGGCGGGAACACCACACTGCCACCGTCTTCGCCGCCGG...TGACGGTCATACCGGTAAAGATAGTGCGCGTCACGGGCGATACGGTTATCCGGCCACATGCTGAGGGTGCTGTCCGGGTGCAGCTCC NODE_145_length_5...
    # [3]  2618 CTCTCCCGCACCTACAGCAGTTACCGGACAAAAACGCCCGCGCCGGTGGGGAGCCTCGGCCCCGGCTGGAAAATGCCTGCGGATATC...GGACAGCACCCTCAGCATGTGGCCGGATAACCGTATCGCCCGTGACGCGCACTATCTTTACCGGTATGACCGTCACGGCAGGCTGAC NODE_96_length_26...
    # [4]   446 CTGCTGTGCTGTTTTGGTCCATCGGTGCCGCATACATGCCCGATACAGCCGCGGCACCCAGCCAGCCCACAGGGTTCCACCATGCCA...AAATCCCCGTAAAGGCAGATGCGTGCCATGCCCGGTGACGCCAGAGGGAGTGTGTGCGTCGCTGCCATTTGTCGGTGTACCTCTCTC NODE_192_length_4...
    # [5]   235 CCCCTGCAGCGGGTCATAATAGCGGTGGCGGTTGTAATACAGGCCGGACTCCTCATCATACTGCTGCCCCGGCAGGCGGATAAGCTG...CACGCTGTTGCCCCTTCCGTGCTGATAAGCGCCAGCGGCAGGCCGCGATGGTCGCAGTGGTACAGGTGGATTTTTCGCGCCGGCGTG NODE_556_length_2...
    # [6]   650 CCCTGCCAGGTGTACTGCAGTTGTGGCTCCAGCATCAGGTTGTCAGTGATACTGAAGGGCAGACCGGTTTCCAGTGAGCCCAGCCAG...ACGATAAGCATTTTCACTGCGCAGGTACCAGTCTTCATCGCTGTCACGGTTCAGGGTGTAGTTAAAGGCGCCGGCCTGAAGCGGGCG NODE_137_length_6...
    
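    fasta_dir <- file.path(getwd(), "refs")
    # dirname(fasta_dir) is the working directory, so the file is written next to refs/, not inside it
    outfile <- file.path(dirname(fasta_dir), "seq_fasta.fasta")
    # mode = "a" appends to outfile if it already exists; the default is mode = "w"
    writeFasta(fasta, outfile, mode = "a")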
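
As an alternative to writeFasta(), a DNAStringSet can probably be written with Biostrings alone, using writeXStringSet(). The sketch below is a minimal, self-contained illustration under assumptions: the two short sequences, their names (gene_a, gene_b) and the temporary output path are made up for the example and stand in for the real genes_seq object and file.fa. It writes the set and then reads it back with readDNAStringSet() to check that sequences and names survive.

```r
library(Biostrings)

# Toy stand-in for the real DNAStringSet (sequences and names are made up)
genes_seq <- DNAStringSet(c(gene_a = "ATGGTGTCGTCTCC",
                            gene_b = "CTGACATAGATAAG"))

# Hypothetical output path; replace with the real destination, e.g. "file.fa"
outfile <- file.path(tempdir(), "file.fa")

# append = FALSE is the default, so an existing file at this path is replaced
writeXStringSet(genes_seq, filepath = outfile, format = "fasta")

# Read the file back to confirm that sequences and names were written as expected
check <- readDNAStringSet(outfile)
```

For a set with hundreds of thousands of sequences, passing compress = TRUE to writeXStringSet() writes a gzip-compressed file, which may be worth trying if disk space matters.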