algorithm scala performance recursion tail-recursion

Performance of the tail recursive functions

The variety of books, articles, blog posts suggests that rewriting recursive function into tail recursive function makes it faster. No doubts it is faster for trivial cases like generating Fibonacci numbers or calculating factorial. In such cases there is a typical approach to rewrite - by using "helper function" and additional parameter for intermediate results.

TAIL RECURSION is the great description of the differences between tail recursive and not tail recursive functions and the possible way how to turn the recursive function into a tail recursive one. What is important for such rewriting - the number of function calls is the same (before/after rewriting), the difference comes from the way how those calls are optimized for tail recursion.

Nevertheless, it is not always possible to convert the function into tail recursive one with such an easy trick. I would categorize such cases as below

Function still can be rewritten into tail recursive but that might require additional data structures and more substantial changes in the implementation
Function cannot be rewritten into tail recursive with any means but recursion still can be avoided by using loops and imitating stack (I'm not 100% sure that tail recursion is impossible in some cases and I cannot describe how identify such cases, so if there is any academical research on this subject - the link would be highly appreciated)

Now let me consider specific example when function can be rewritten into tail recursive by using additional structures and changing the way algorithm works.

Sample task: Print all sequences of length n containing 1 and 0 and which do not have adjacent 1s.

Obvious implementation which comes to mind first is below (on each step, if current value is 0 then we generate two sequences with length n-1 otherwise we generate only sequence with length n-1 which starts from 0)

  def gen001(lvl: Int, result: List[Int]):Unit = {
    //println("gen001")
    if (lvl > 0) {
      if (result.headOption.getOrElse(0) == 0) {
        gen001(lvl - 1, 0 :: result)
        gen001(lvl - 1, 1 :: result)
      } else gen001(lvl - 1, 0 :: result)
    } else {
      println(result.mkString(""))
    }
  }

  gen001(5, List())

It's not that straightforward to avoid two function calls when current element is 0 but that can be done if we generate children for each value in the intermediate sequences starting from the sequence '01' on the level 1. After having hierarchy of auxiliary sequences for level 1..n we can reconstruct the result (printResult) starting from leaf nodes (or sequence from the last iteration in other words).

  @tailrec
  def g001(lvl: Int, current: List[(Int, Int)], result: List[List[(Int, Int)]]):List[List[(Int, Int)]] = {
    //println("g001")
    if (lvl > 1) {
      val tmp = current.map(_._1).zipWithIndex
      val next = tmp.flatMap(x => x._1 match {case 0 => List((0, x._2), (1, x._2)) case 1 => List((0, x._2))})
      g001(lvl - 1, next, next :: result)
    } else result
  }

  def printResult(p: List[List[(Int, Int)]]) = {
    p.head.zipWithIndex.foreach(x => 
      println(p.scanLeft((-1, x._2))((r1, r2) => (r2(r1._2)._1, r2(r1._2)._2)).tail.map(_._1).mkString("")))
  }

  val r = g001(5, List(0,1).zipWithIndex, List(List(0,1).zipWithIndex))

  println(r)

  printResult(r)

Output

List(List((0,0), (1,0), (0,1), (0,2), (1,2), (0,3), (1,3), (0,4), (0,5), (1,5), (0,6), (0,7), (1,7)), List((0,0), (1,0), (0,1), (0,2), (1,2), (0,3), (1,3), (0,4)), List((0,0), (1,0), (0,1), (0,2), (1,2)), List((0,0), (1,0), (0,1)), List((0,0), (1,1)))
00000
10000
01000
00100
10100
00010
10010
01010
00001
10001
01001
00101
10101

So now, if we compare two approaches, the first one requires much more recursive calls however on the other hand it's much more efficient in terms of memory because no additional data structures of intermediate results are required.

Finally, the questions are

Is there a class of recursive functions which cannot be implemented as tail recursive? If so how to identify them?
Is there a class of recursive functions such as their tail recursive implementation cannot be as efficient as non tail recursive one (for example in terms of memory usage). If so how to identify them. (Function from above example seems to be in this category)

Links to academical research papers are highly appreciated.

PS. Thera are a number of related questions already asked and below links may be quite useful

Rewrite linear recursive function as tail-recursive function

https://cs.stackexchange.com/questions/56867/why-are-loops-faster-than-recursion

UPDATE

Quick performance test

Tail recursive function can be simplified to store only auxiliary sequence on each step. That would be enough to print result. Let me take out of the picture function which prints result and provide just modified version of the tail recursive approach.

  def time[R](block: => R): R = {
    val t0 = System.nanoTime()
    val result = block    // call-by-name
    val t1 = System.nanoTime()
    println("Elapsed time: " + (t1 - t0)/1e9 + "s")
    result
  }

  @tailrec
  def gg001(lvl: Int, current: List[Int], result: List[List[Int]]):List[List[Int]] = {
    //println("g001")
    if (lvl > 1) {
      val next = current.flatMap(x => x match {case 0 => List(0, 1) case 1 => List(0)})
      gg001(lvl - 1, next, next :: result)
    } else result
  }

  time{gen001(30, List())}
  time{gg001(30, List(0,1), List(List(0,1)))}

  time{gen001(31, List())}
  time{gg001(31, List(0,1), List(List(0,1)))}

  time{gen001(32, List())}
  time{gg001(32, List(0,1), List(List(0,1)))}

Output

Elapsed time: 2.2105142s
Elapsed time: 1.2582993s
Elapsed time: 3.7674929s
Elapsed time: 2.4024759s
Elapsed time: 6.4951573s
Elapsed time: 8.6575108s

For some N tail recursive approach starts taking more time than the original one and if we keep increasing N further it will start failing with java.lang.OutOfMemoryError: GC overhead limit exceeded

Which makes me think that overhead to manage auxiliary data structures for tail recursive approach outweighs performance gains due to less number of recursive calls as well as their optimization.

I might have chosen not the most optimal implementation and/or data structures and also it may be due to language specific challenges but tail recursive approach does not look as absolute best solution (in terms of execution time/resources) even for this specific task.

Solution

Let me try to answer my own question.

Recursive function can be easily rewritten into a tail recursive one if it is never called more than once from the function body. Alternatively, any function can be rewritten into a loop and loop can be rewritten into tail recursive function but this would require using some auxiliary data structures [at least for] for stack. Such implementation can be slightly faster than the original approach (I believe this difference may vary from one language to another) but on the other hand it will be more cumbersome and less maintainable.

So practically, tail recursive implementation makes sense when it does not require efforts to implement stack and the only additional hassle is storing intermediate results.