Tags: scala, queue, complexity-theory, immutability, amortized-analysis

Is amortized time complexity analysis broken for immutable collections?


Update: this question is about how we can use amortized analysis for immutable collections; Scala's immutable queue is just an example. How this immutable queue is implemented is clearly visible from the sources. And, as was pointed out in the answers, Scala's sources do not mention amortized time for it at all. But guides and internet podcasts do. And as far as I have seen, C# has a similar immutable queue with similar statements about amortized time for it.

Amortized analysis was originally invented for mutable collections. How to apply it to Scala's mutable Queue is clear. But how can we apply it to this sequence of operations, for example?

val q0 = collection.immutable.Queue[Int]()
val q1 = q0.enqueue(1)
val h1 = q1.head          // 1
val q2 = q1.enqueue(2)
val h2 = q2.head          // 1
val (d2, q3) = q2.dequeue // (1, Queue(2)); dequeue is parameterless on immutable.Queue
val (d1, q4) = q3.dequeue // (2, Queue())

We have a sequence of different immutable queues q0-q4. May we consider them as one single queue or not? How can we use the O(1) enqueue operations to amortize both the heavy head and the first dequeue? What method of amortized analysis can we use for this sequence? I don't know. I cannot find anything in textbooks.

Final answer:

(Thanks to all who answered my question!)

In short: Yes, and no!

"No" because a data structure may be used as immutable but not persistent. The data structure is ephemeral if making a change we forget (or destroy) all old versions of the data structure. Mutable data structures is an example. Dequeuing of immutable queue with two strict lists can be called "amortized O(1)" in such ephemeral contexts. But full persistence with forking of the immutable data structure history is desirable for many algorithms. For such algorithms the expensive O(N) operations of the immutable queue with two strict lists are not amortized O(1) operations. So the guide authors should add an asterisk and print by 6pt font in footnote: * for specially selected sequences of operations.

In the answers I was given an excellent reference: Amortization, Lazy Evaluation, and Persistence: Lists with Catenation via Lazy Linking:

We first review the basic concepts of lazy evaluation, amortization, and persistence. We next discuss why the traditional approach to amortization breaks down in a persistent setting. Then we outline our approach to amortization, which can often avoid such problems through judicious use of lazy evaluation.

We can create a fully persistent immutable queue with amortized O(1) operations. It must be specially designed and must use lazy evaluation. Without such a framework, with lazy evaluation of parts of the structure and memoization of results, we cannot apply amortized analysis to fully persistent immutable data structures. (It is also possible to create a double-ended queue with all operations having worst-case constant time and without lazy evaluation, but my question was about amortized analysis.)
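
To illustrate the idea, here is a minimal sketch of a queue in the spirit of Okasaki's banker's queue, using Scala 2.13's memoizing LazyList to play the role of the paper's suspensions (all names below are mine, not from any library):

final class BankersQueue[+A] private (
    front: LazyList[A], lenF: Int, rear: List[A], lenR: Int) {

  def isEmpty: Boolean = lenF == 0

  def enqueue[B >: A](x: B): BankersQueue[B] =
    BankersQueue.balance(front, lenF, x :: rear, lenR + 1)

  def dequeue: (A, BankersQueue[A]) =
    if (isEmpty) throw new NoSuchElementException("dequeue on empty queue")
    else (front.head, BankersQueue.balance(front.tail, lenF - 1, rear, lenR))
}

object BankersQueue {
  def empty[A]: BankersQueue[A] = new BankersQueue[A](LazyList.empty, 0, Nil, 0)

  // Restore the invariant lenR <= lenF. The reversal of rear is suspended
  // inside the lazy tail of the front and memoized when first forced, so
  // its O(n) cost is paid at most once no matter how many future versions
  // of the queue share this suspension.
  private def balance[A](f: LazyList[A], lf: Int, r: List[A], lr: Int): BankersQueue[A] =
    if (lr <= lf) new BankersQueue(f, lf, r, lr)
    else new BankersQueue(f.lazyAppendedAll(r.reverse), lf + lr, Nil, 0)
}

Because the suspended reversal is memoized, every forked version of the queue that forces it shares one O(n) payment, which is what restores the amortized bounds under persistent use.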

Original question:

According to the original definition, amortized time complexity is the worst-case running time averaged over an allowed sequence of operations: "we average the running time per operation over a (worst-case) sequence of operations" (https://courses.cs.duke.edu/fall11/cps234/reading/Tarjan85_AmortizedComplexity.pdf). See also textbooks ("Introduction to Algorithms" by Cormen et al., for example).
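
In symbols (my paraphrase of the textbook condition): if t_i is the actual cost of the i-th operation and a_i its amortized cost, then for every allowed sequence of m operations

\sum_{i=1}^{m} t_i \le \sum_{i=1}^{m} a_i

so a sequence of N amortized O(1) operations must have O(N) total cost. It is exactly this guarantee that is broken below.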

Scala's collections performance guide states that two collection.immutable.Queue methods (head and tail) have amortized O(1) time complexity: https://docs.scala-lang.org/overviews/collections-2.13/performance-characteristics.html This table does not mention the complexities of the enqueue and dequeue operations, but another, unofficial guide states O(1) time complexity for enqueue and amortized O(1) time complexity for dequeue: https://www.waitingforcode.com/scala-collections/collections-complexity-scala-immutable-collections/read

But what do those statements about amortized time complexity really mean? Intuitively, they allow us to make predictions for algorithms using the data structure, such as: any sequence of N amortized O(1) operations allowed by the data structure itself has no worse than O(N) total complexity. Unfortunately, this intuitive meaning is clearly broken for immutable collections. For example, the following function has time complexity O(n^2) for 2n operations that are, according to the guides, amortized O(1):

def quadraticInTime(n: Int) = {
  var q = collection.immutable.Queue[Int]()
  for (i <- 1 to n) q = q.enqueue(i) // n enqueues, O(1) each
  List.fill(n)(q.head)               // n heads on the same version q, O(n) each
}

val tryIt = quadraticInTime(100000)

The second parameter of the List.fill method is a by-name parameter, so it is evaluated n times in sequence. We could also use q.dequeue._1 instead of q.head, of course, with the same result.

Also, we can read in "Programming in Scala" by M. Odersky et al.: "Assuming that head, tail, and enqueue operations appear with about the same frequency, the amortized complexity is hence constant for each operation. So functional queues are asymptotically just as efficient as mutable ones." This contradicts the worst-case amortized complexity property from the textbooks and is wrong for the quadraticInTime method above.

But if a data structure has O(1)-time cloning, we can break the assumptions of amortized time analysis for it simply by executing N worst-case "heavy" operations on N copies of the data structure in sequence. And generally speaking, any immutable collection has O(1)-time cloning: the "copy" is just another reference to the same structure.
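
Here is a sketch of that general attack (the function name is mine): freeze an immutable queue right before one of its heavy amortized operations, then replay the heavy operation on n "copies", each of which costs nothing to make:

def replayHeavyOp(n: Int): Long = {
  var q = collection.immutable.Queue[Int]()
  for (i <- 1 to n) q = q.enqueue(i)  // n cheap O(1) enqueues
  val copies = List.fill(n)(q)        // n clones, O(1) each: just the same reference
  copies.map(_.dequeue._1.toLong).sum // n heavy O(n) dequeues: O(n^2) in total
}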

Question: is there a good formal definition of amortized time complexity for operations on immutable structures? Clearly, to be useful, the definition must further limit the allowed sequences of operations.


Solution

  • Chris Okasaki described how to solve this problem with lazy evaluation in Amortization, Lazy Evaluation, and Persistence: Lists with Catenation via Lazy Linking from 1995. The main idea is that you can guarantee that some action is done only once by hiding it in a thunk and letting the language runtime manage evaluating it exactly once.
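
    A minimal sketch of that idea (the Suspension class is mine, standing in for the paper's suspensions): a lazy val, like each cell of a LazyList, evaluates its body at most once and then shares the memoized result with every version of a data structure that holds a reference to it.

    // A memoized thunk: the body runs at most once, however many holders force it.
    final class Suspension[A](body: => A) {
      lazy val force: A = body
    }

    val s = new Suspension({ println("expensive reversal runs once"); (1 to 5).toList.reverse })
    s.force // prints and computes the reversal
    s.force // returns the cached result, no recomputation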