data-structures scala immutability mutable

How to read immutable data structures from a file in Scala


I have a data structure made of Jobs each containing a set of Tasks. Both Job and Task data are defined in files like these:

jobs.txt:
JA
JB
JC

tasks.txt:
JB  T2
JA  T1
JC  T1
JA  T3
JA  T2
JB  T1 

The process of creating objects is the following:
- read each job, create it and store it by id
- read each task, retrieve its job by id, create the task, store the task in the job

Once the files are read, this data structure is never modified. So I would like the tasks within each job to be stored in an immutable set, but I don't know how to do that efficiently. (Note: the mutable map storing the jobs may be left mutable.)

Here is a simplified version of the code:

class Task(val id: String) 

class Job(val id: String) {
    val tasks = collection.mutable.Set[Task]() // This should be immutable
}

val jobs = collection.mutable.Map[String, Job]() // This is ok to be mutable

// read jobs
for (line <- io.Source.fromFile("jobs.txt").getLines) { 
    val job = new Job(line.trim)
    jobs += (job.id -> job)
}

// read tasks
for (line <- io.Source.fromFile("tasks.txt").getLines) {
    val tokens = line.split("\t")
    val job = jobs(tokens(0).trim)
    val task = new Task(job.id + "." + tokens(1).trim)
    job.tasks += task
}

Thanks in advance for every suggestion!


Solution

  • The most efficient way to do this would be to read everything into mutable structures and then convert to immutable ones at the end, but this might require a lot of redundant coding for classes with a lot of fields. So instead, consider using the same pattern that the underlying collection uses: a job with a new task is a new job.

    Here's an example that doesn't even bother reading the jobs list; it infers the jobs from the task list. (This example works under 2.7.x. Some 2.8 pre-release builds used "Source.fromPath" instead of "Source.fromFile"; the final 2.8 release kept "Source.fromFile".)

    object Example {
      class Task(val id: String) {
        override def toString = id
      }
    
      class Job(val id: String, val tasks: Set[Task]) {
        def this(id0: String, old: Option[Job], taskID: String) = {
          this(id0 , old.getOrElse(EmptyJob).tasks + new Task(taskID))
        }
        override def toString = id+" does "+tasks.toString
      }
      object EmptyJob extends Job("",Set.empty[Task]) { }
    
      def read(fname: String):Map[String,Job] = {
        val map = new scala.collection.mutable.HashMap[String,Job]()
        scala.io.Source.fromFile(fname).getLines.foreach(line => {
          line.split("\t") match {
            case Array(j,t) => {
              val jobID = j.trim
              val taskID = t.trim
              map += (jobID -> new Job(jobID,map.get(jobID),taskID))
            }
            case _ => /* Handle error? */
          }
        })
        new scala.collection.immutable.HashMap() ++ map
      }
    }
    
    scala> Example.read("tasks.txt")
    res0: Map[String,Example.Job] = Map(JA -> JA does Set(T1, T3, T2), JB -> JB does Set(T2, T1), JC -> JC does Set(T1))
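
For comparison, the first option mentioned at the top of this answer (accumulate in mutable collections, then convert once at the end) might look roughly like the sketch below. It is only an illustration under a few assumptions: the names `FreezeExample` and `build` are made up, the helper takes line iterators rather than file names so it can be tried without files, fields are assumed to be separated by whitespace, and jobs that appear only in the task list are created on the fly.

```scala
import scala.collection.mutable

object FreezeExample {
  final class Task(val id: String) {
    override def toString: String = id
  }

  // The task set is supplied at construction, so a Job never changes afterwards.
  final class Job(val id: String, val tasks: Set[Task]) {
    override def toString: String = id + " does " + tasks.toString
  }

  // Hypothetical helper: takes lines instead of file names so it can be tried in the REPL.
  def build(jobLines: Iterator[String], taskLines: Iterator[String]): Map[String, Job] = {
    // Phase 1: accumulate task IDs per job ID in mutable collections.
    val pending = mutable.Map.empty[String, mutable.Set[String]]
    jobLines.map(_.trim).filter(_.nonEmpty).foreach { id =>
      pending.getOrElseUpdate(id, mutable.Set.empty[String])
    }
    taskLines.foreach { line =>
      // Assumption: fields are separated by whitespace (tabs or spaces).
      line.trim.split("\\s+") match {
        case Array(jobId, taskId) =>
          // A job that appears only in the task list is created on the fly.
          pending.getOrElseUpdate(jobId, mutable.Set.empty[String]) += taskId
        case _ => // blank or malformed line: ignored in this sketch
      }
    }
    // Phase 2: freeze everything into immutable structures in one pass.
    pending.map { case (id, taskIds) =>
      id -> new Job(id, taskIds.map(t => new Task(t)).toSet)
    }.toMap
  }
}
```

With real files, the two iterators could come from `scala.io.Source.fromFile("jobs.txt").getLines()` and `scala.io.Source.fromFile("tasks.txt").getLines()`. The mutable collections never escape `build`, so callers only ever see the immutable result.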
    

    An alternate approach would read the job list first (creating each job as new Job(jobID,Set.empty[Task])) and then treat a task-list entry whose job is not in the job list as an error. (You would still need to update the job map every time you read in a new task.)
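
The alternate approach described above could be sketched as follows. This is a minimal illustration, not a drop-in implementation: `JobListExample`, `readJobs`, `addTasks`, and `withTask` are hypothetical names, fields are assumed to be whitespace-separated, and an unknown job is reported by throwing an exception (other error strategies are equally reasonable). The fold keeps the job map immutable while still replacing the affected entry for every task read.

```scala
object JobListExample {
  final class Task(val id: String) {
    override def toString: String = id
  }

  final class Job(val id: String, val tasks: Set[Task]) {
    // Adding a task yields a new Job; the receiver is left unchanged.
    def withTask(taskId: String): Job = new Job(id, tasks + new Task(taskId))
    override def toString: String = id + " does " + tasks.toString
  }

  // One job ID per line; every job starts with an empty task set.
  def readJobs(lines: Iterator[String]): Map[String, Job] =
    lines.map(_.trim).filter(_.nonEmpty)
      .map(id => id -> new Job(id, Set.empty[Task]))
      .toMap

  // Folds the task lines over the job map, replacing one entry per task.
  def addTasks(jobs: Map[String, Job], lines: Iterator[String]): Map[String, Job] =
    lines.foldLeft(jobs) { (acc, line) =>
      // Assumption: fields are separated by whitespace (tabs or spaces).
      line.trim.split("\\s+") match {
        case Array(jobId, taskId) =>
          acc.get(jobId) match {
            case Some(job) => acc.updated(jobId, job.withTask(taskId))
            case None =>
              throw new IllegalArgumentException("Unknown job in task list: " + jobId)
          }
        case _ => acc // blank or malformed line: skipped in this sketch
      }
    }
}
```

A typical call would be `JobListExample.addTasks(JobListExample.readJobs(jobLines), taskLines)`, where both arguments are line iterators such as those returned by `scala.io.Source.fromFile(...).getLines()`.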