Search code examples
scalasequences

Filter out common entries from two sequences, returning two lists with unique entries from the originals


I want to filter the common elements out of two lists, and get left with two lists containing the remainders.

I have two sequences in scala:

val firstSeq = Seq(1,2,3,4)
val secondSeq = Seq(1,3,5,7)

And what I want to do is filter all of the common elements, which would mean I end up with:

filteredFirstSeq = Seq(2,4)
filteredSecondSeq = Seq(5,7)

So, there's an easy way to get here in scala:

val filteredFirstSeq = firstSeq.filterNot(firstEntry => secondSeq.contains(firstEntry))
val filteredSecondSeq = secondSeq.filterNot(secondEntry => firstSeq.contains(secondEntry))

But! This means I have to run through the entire first list, and matches, and the entire second list, and matches, which takes a long time when the lists are large, and the entries are more complicated than integers!

I'd much prefer to only have to cycle through everything once, but the only way I can think to do so is to have mutable lists, and remove a value from both whenever we find a match. That seems a bit nasty, though. I'm certain that there must be a trivial answer to this that I'm missing.

Thanks for any suggestions!


Solution

  • This example assumes that the each of the lists contains no duplicates, if that is not the case then the logic inside of the fold will have to change slightly.

    val firstSeq = Seq(1,2,3,4)
    val secondSeq = Seq(1,3,5,7)
    
    // Put everything into a list, keeping track of where things came from
    val both: Seq[(Int, Int)] = firstSeq.map(x => (x, 1)) ++ secondSeq.map(x => (x, 2))
    
    // Reduce the list into a single map, where the keys are the numbers, and the value is the originating seq.  Anytime we try to insert a value that already is in the map, we remove the value instead, since that will mean the value was in each sequence.
    val map: Map[Int, Int] = both.foldLeft(Map.empty[Int, Int]) { (map, tuple) =>
      val (value, seqNumber) = tuple
      if (map.contains(value)) {
        map - value
      } else {
        map + (value -> seqNumber)
      }
    }
    
    // Now partition the values back into their original lists
    val (firstSeqFiltered, secondSeqFiltered) = map.partition(_._2 == 1)
    println(firstSeqFiltered.keys)
    println(secondSeqFiltered.keys)
    

    Output:

    Set(2, 4)
    Set(5, 7)