
Scala - finding first position in which two Seq differ


Scala comes with the nice corresponds method:

val a = scala.io.Source.fromFile("fileA").getLines().toSeq
val b = scala.io.Source.fromFile("fileB").getLines().toSeq

val areEqual = a.corresponds(b){_.equals(_)}

if(areEqual) ...

And I quite like the brevity of that.

Is there a similar method already defined that will also report to me the first position in which the two sequences differ?

I.e. is there a more idiomatic way to write something like this:

val result = ((seqA zip seqB).zipWithIndex).find{case ((a,b),i) => !a.equals(b)} match{
    case Some(((a,b),i)) => s"seqA and seqB differ in pos $i: $a <> $b"
    case _ => "no difference"
}

Because as you can see, that's a bloody pain in the neck to read. And it gets even worse if I want to use triplets instead of tuples of tuples:

val result = (((seqA zip seqB).zipWithIndex) map {case (t,i) => (t._1,t._2,i)}).find{case (a,b,i) => !a.equals(b)} match{
    case Some((a,b,i)) => s"seqA and seqB differ in pos $i: $a <> $b"
    case _ => "no difference"
}

I am aware of the diff method. Unfortunately, that one disregards the order of the elements.


Solution

  • You can use indexWhere (see ScalaDoc) as follows:

    (as zip bs).indexWhere{case (x,y) => x != y}
    

    Example:

    scala> val as = List(1,2,3,4)
    scala> val bs = List(1,2,4,4)
    
    scala> (as zip bs).indexWhere{case (x,y) => x != y}
    
    res0: Int = 2
    

    Note, however, that any solution based on zip reports no difference when one Seq is merely a prefix of the other, because zip truncates the longer Seq to the length of the shorter one. This might or might not be what you need.
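
    If the length mismatch should count as a difference, one option is to pad the shorter side before comparing. The following is only a sketch under that assumption: FirstDiff and firstDifference are hypothetical names, not library methods. It wraps every element in Some and uses the standard zipAll with None as the filler, so a missing element never compares equal to a present one.

    ```scala
    object FirstDiff {
      // Hypothetical helper: index of the first position where the two
      // sequences differ, treating a missing element as a difference.
      def firstDifference[A](as: Seq[A], bs: Seq[A]): Option[Int] = {
        // Wrap each element in Some so that None can pad the shorter side.
        val left  = as.map(a => (Some(a): Option[A]))
        val right = bs.map(b => (Some(b): Option[A]))
        val i = left.zipAll(right, None, None).indexWhere { case (x, y) => x != y }
        if (i >= 0) Some(i) else None // indexWhere signals "not found" with -1
      }
    }
    ```

    With as = List(1, 2) and bs = List(1, 2, 3), the plain zip version returns -1, whereas FirstDiff.firstDifference(as, bs) returns Some(2).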

    Update: For Seqs of equal length, a different approach is as follows:

    as.indices.find(i => as(i) != bs(i))
    

    This is nice as it returns an Option[Int], so it returns None rather than a magical -1 if there is no difference between the Seqs.

    It behaves the same as the zip-based solution if as is shorter than bs, but if as is longer and the two agree on every element of bs, bs(i) throws an IndexOutOfBoundsException (you could iterate only up to the minimum length, of course).

    However, because it addresses both Seqs by index, it will only perform well for IndexedSeqs; on a List, every as(i) or bs(i) walks the list from the head.
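
    For Seqs without fast indexed access, such as List, a single pass over both iterators avoids the repeated lookups and still yields an Option[Int]. This is a minimal sketch; FirstDiffLinear and firstDifferenceLinear are hypothetical names chosen for illustration. Like the zip-based solution, it stops at the end of the shorter Seq.

    ```scala
    object FirstDiffLinear {
      // Hypothetical helper: walks both sequences once, with no indexed access.
      def firstDifferenceLinear[A](as: Seq[A], bs: Seq[A]): Option[Int] = {
        // Zipping the iterators is lazy, so no intermediate collection is built
        // and the traversal stops at the first mismatch.
        val i = as.iterator.zip(bs.iterator).indexWhere { case (x, y) => x != y }
        if (i >= 0) Some(i) else None // convert the -1 sentinel into None
      }
    }
    ```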

    Update 2: We can deal with different Seq lengths by using lift, so that we get an Option when retrieving elements by index:

    bs.indices.find(i => as.lift(i) != bs.lift(i))
    

    So if as = [1,2] and bs = [1,2,3], the first index at which they differ is 2 (because this element is missing in as). However, in this case we need to call indices on the longest Seq rather than the shortest, or explicitly compute the larger of the two lengths using max, e.g.

    (0 until (as.length max bs.length)).find(i => as.lift(i) != bs.lift(i))
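
    To get back to the message from the question, the Option[Int] can be pattern matched directly. The sketch below assumes a hypothetical helper named describe inside a hypothetical object DescribeDiff, and it prints "<missing>" (an arbitrary placeholder) when one Seq has no element at the differing position.

    ```scala
    object DescribeDiff {
      // Hypothetical helper that renders the result in the question's format.
      def describe[A](as: Seq[A], bs: Seq[A]): String =
        (0 until (as.length max bs.length)).find(i => as.lift(i) != bs.lift(i)) match {
          case Some(i) =>
            // lift returns None past the end of a Seq; show that as "<missing>".
            val a = as.lift(i).fold("<missing>")(_.toString)
            val b = bs.lift(i).fold("<missing>")(_.toString)
            s"seqA and seqB differ in pos $i: $a <> $b"
          case None => "no difference"
        }
    }
    ```

    As noted above, the indexed lookups make this best suited to IndexedSeqs such as Vector.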