scala, comparison, filtering, case-class

How to filter a List with another List based on some conditions?


Let's say I have this code to find duplicates in a List based on a constructor parameter. (I ended up with this after parsing some text files that contain duplicates.)

case class Line(ini: String, name:String, com:String)

val l0 = Line("X", "hello", "some text")
val l1 = Line("", "world", "some text")
val l2 = Line("X", "computer", "")
val l3 = Line("", "hello", "")
val l4 = Line("X", "world", "")
val l5 = Line("", "hello", "some stuff")

val lineList = List(l0,l1,l2,l3, l4, l5)

val dup = lineList.groupBy(_.name).collect { case (x, List(_,_,_*)) => x } // yields the duplicated names "hello" and "world" (order not guaranteed)

Now I know which names are duplicated. But how can I filter lineList again to remove the duplicates based on some other rules?

In the end I want a List with no duplicates, but I also want to retain as much information from the properties ini and com as possible. That means that, among duplicates, I want to keep the one selected by the following rules:

  • Lines with content in property ini and com have precedence over all others, meaning: Line("X", "hello", "some text") vs Line("", "hello", "some text") vs Line("", "hello", "") should give back the first

  • Lines with content in property com have precedence over ini, meaning: Line("", "hello", "") vs Line("", "hello", "some text") should give back the last one

  • Lines with content in property ini have precedence over lines with nothing in ini or com, meaning: Line("X", "hello", "") vs Line("", "hello", "") should give back the first

  • If both duplicates have information in ini and com, I don't care which one is selected.

I wonder whether this is overly complicated and there might be a simpler way to solve it. All I want is a List with no duplicates, keeping the duplicate that carries the most information. How would one solve this?


Solution

  • You can define a chooseBetterLine function that implements the logic you need for any two lines with the same name (I hope I followed it correctly), and then use reduce on the values of each group:

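    def chooseBetterLine(l1: Line, l2: Line): Line = {
      // com outranks ini, so compare com first
      if (l1.com.nonEmpty && l2.com.isEmpty) l1
      else if (l1.com.isEmpty && l2.com.nonEmpty) l2
      // same com status: a non-empty ini decides
      else if (l1.ini.nonEmpty && l2.ini.isEmpty) l1
      else l2
    }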
    
    val result: Iterable[Line] = lineList.groupBy(_.name).values.map(_.reduce(chooseBetterLine))
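
    As an alternative, here is a minimal sketch that expresses the precedence as a numeric score and picks the best line per name with maxBy. The names infoScore and dedupByName are hypothetical, and the weighting (com counts more than ini) is an assumption based on my reading of the rules in the question:

```scala
// Sketch only: infoScore and dedupByName are illustrative names, not part of the original answer.
final case class Line(ini: String, name: String, com: String)

object DedupSketch {
  // Assumed ranking: ini and com (3) > com only (2) > ini only (1) > neither (0).
  def infoScore(l: Line): Int =
    (if (l.com.nonEmpty) 2 else 0) + (if (l.ini.nonEmpty) 1 else 0)

  // Group by name, then keep the highest-scoring line in each group.
  def dedupByName(lines: List[Line]): List[Line] =
    lines.groupBy(_.name).values.map(_.maxBy(infoScore)).toList

  // Sample data from the question.
  val lineList: List[Line] = List(
    Line("X", "hello", "some text"),
    Line("", "world", "some text"),
    Line("X", "computer", ""),
    Line("", "hello", ""),
    Line("X", "world", ""),
    Line("", "hello", "some stuff")
  )

  val result: List[Line] = dedupByName(lineList)
}
```

    Because every line maps to a single score, the outcome does not depend on the order in which duplicates are compared. Note that groupBy does not preserve the original order of lineList, so the resulting list may come back in a different order.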