
Scala: bug with getLines?


I'm facing a problem with very simple file handling in Scala, and I don't understand whether it comes from a bug or from a misunderstanding of what I'm doing. It is even reproducible from a worksheet in the Scala/Eclipse IDE. I'm using IDE 4.6.1 and Scala 2.12.2. The code is very simple:

//********************************
import scala.io.Source
import java.io.File
import java.io.PrintWriter

object Embed {

  val filename = "proteins.csv"
  val handler = Source.fromFile(filename)

  val header:String = handler.getLines().next()
  println (">"+header)
  val header2:String = handler.getLines().next()
  println (">"+header2)

  val header3:String = handler.getLines().next()
  println (">"+header3)
}
//**********************

The first 3 lines of the file are a bit long and will mean little to non-biologists:

Protein Group,Protein ID,Accession,Significance,Coverage (%),#Peptides,#Unique,PTM,Cond_A Intensity,Cond_B Intensity,Cond_C Intensity,Cond_D Intensity,Sample Profile (Ratio),Group 1 Intensity,Group 2 Intensity,Group 3 Intensity,Group 4 Intensity,Group Profile (Ratio),Avg. Mass,Description
261,247,P0AFG4|ODO1_ECOL6,200.00,39,30,30,Carbamidomethylation; Deamidation (NQ); Oxidation (M),1.7E5,9.87E4,5.51E4,3.09E4,3.09:1.79:1.00:0.56,1.7E5,9.87E4,5.51E4,3.09E4,3.09:1.79:1.00:0.56,105062,2-oxoglutarate dehydrogenase E1 component OS=Escherichia coli O6:H1 (strain CFT073 / ATCC 700928 / UPEC) GN=sucA PE=3 SV=1
287,657,B7NDL4|MDH_ECOLU,200.00,54,14,1,Carbamidomethylation; Deamidation (NQ); Oxidation (M),6.27E4,4.14E4,1.81E4,1.28E4,3.47:2.29:1.00:0.71,6.27E4,4.14E4,1.81E4,1.28E4,3.47:2.29:1.00:0.71,32336,Malate dehydrogenase OS=Escherichia coli O17:K52:H18 (strain UMN026 / ExPEC) GN=mdh PE=3 SV=1

I won't go into the details of this file, but it has 3600 lines, each containing 20 fields separated by commas and ending with an end-of-line marker. The first line is the header. I also tried other end-of-line variants, with the same result: the first line is read correctly, but the second line read is only the final part of the 8th line in the file, and so on, so I cannot read or parse my file.

The following is the result I get:

   val filename = "proteins.csv"
                                                  //> filename  : String = proteins.csv
  val handler = Source.fromFile(filename)         //> handler  : scala.io.BufferedSource = non-empty iterator

  val header:String = handler.getLines().next()   //> header  : String = Protein Group,Protein ID,Accession,Significance,Coverage 
                                                  //| (%),#Peptides,#Unique,PTM,Cond_A Intensity,Cond_B Intensity,Cond_C Intensity
                                                  //| ,Cond_D Intensity,Sample Profile (Ratio),Group 1 Intensity,Group 2 Intensity
                                                  //| ,Group 3 Intensity,Group 4 Intensity,Group Profile (Ratio),Avg. Mass,Descrip
                                                  //| tion
  println (">"+header)                            //> >Protein Group,Protein ID,Accession,Significance,Coverage (%),#Peptides,#Uni
                                                  //| que,PTM,Cond_A Intensity,Cond_B Intensity,Cond_C Intensity,Cond_D Intensity,
                                                  //| Sample Profile (Ratio),Group 1 Intensity,Group 2 Intensity,Group 3 Intensity
                                                  //| ,Group 4 Intensity,Group Profile (Ratio),Avg. Mass,Description
  val header2:String = handler.getLines().next()  //> header2  : String = TCC 700928 / UPEC) GN=fumA PE=3 SV=2
  println (">"+header2)                           //> >TCC 700928 / UPEC) GN=fumA PE=3 SV=2

  val header3:String = handler.getLines().next()  //> header3  : String = n SE11) GN=zapB PE=3 SV=1
  println (">"+header3)                           //> >n SE11) GN=zapB PE=3 SV=1

Any idea what I'm doing wrong? Many thanks for helping. No hurry: this is part of an attempt to use Scala, and I'll now go back to Python to do the job!


Solution

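  • If I understand you correctly, the problem is that every call to handler.getLines() returns a new Iterator[String] over the same underlying source. The new iterator does not continue right after the line you just read: the source is buffered, so the previous iterator has most likely already read ahead past that line, and the new one starts wherever the underlying reader happens to be, which would explain the partial lines from the middle of the file in your output. Call getLines() once and reuse that iterator, something like this: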

    val lineIterator = Source.fromFile("proteins.csv").getLines() // Get the iterator object
    val firstLine = lineIterator.next()
    val secondLine = lineIterator.next()
    val thirdLine = lineIterator.next()
    

    Or this:

    val lines = Source.fromFile("proteins.csv").getLines().toIndexedSeq // Read all lines into an indexed sequence
    val n = 2 // Zero-based index, so this selects the third line
    val nLine = lines(n)
    println(nLine)
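
As a slightly fuller sketch of the first approach, the snippet below reads the first few lines through a single iterator and closes the source afterwards. The object name ReadHeaderDemo, the helper firstLines, and the three-line sample file are assumptions made for illustration; the sample rows are shortened from the data in the question. It uses try/finally rather than scala.util.Using, because the latter is only available from Scala 2.13 and the question targets 2.12.2.

```scala
import scala.io.Source
import java.io.{File, PrintWriter}

object ReadHeaderDemo {

  // Read the first `n` lines through ONE iterator, then close the source.
  // (Hypothetical helper, not part of the original post.)
  def firstLines(path: String, n: Int): List[String] = {
    val source = Source.fromFile(path)
    try {
      // getLines() is called exactly once; toList forces the read
      // before the source is closed in the finally block.
      source.getLines().take(n).toList
    } finally {
      source.close()
    }
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical sample file, written here only to keep the sketch self-contained.
    val sample = File.createTempFile("proteins-sample", ".csv")
    val out = new PrintWriter(sample)
    try {
      out.println("Protein Group,Protein ID,Accession")
      out.println("261,247,P0AFG4|ODO1_ECOL6")
      out.println("287,657,B7NDL4|MDH_ECOLU")
    } finally {
      out.close()
    }

    firstLines(sample.getPath, 3).foreach(line => println(">" + line))
    sample.delete()
  }
}
```

If the file has fewer than `n` lines, `take(n)` simply yields the lines that exist, so the helper does not throw on short files.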