I've been writing a lot of programs in the lambda calculus recently and I wish I could run some of them in real time. Yet, as much as the trending functional paradigm is based on the lambda calculus and the rule of β-reduction, I couldn't find a single evaluator that isn't a toy, not meant for efficiency. Functional languages are supposed to be fast, but those I know don't actually provide access to normal forms (see Haskell's lazy evaluator, Scheme's closures and so on), so they don't work as LC evaluators.
That makes me wonder: is it simply impossible to evaluate lambda calculus terms efficiently, is it a historical accident or lack of interest that nobody has built a fast evaluator for it, or am I just missing something?
At the current state of knowledge, the best way of evaluating lambda terms is the so-called optimal graph reduction technique. The technique was introduced in 1989 by J. Lamping in his POPL article "An algorithm for optimal lambda calculus reduction", and was then revised and improved by several authors. You may find a survey in my book with S. Guerrini, "The Optimal Implementation of Functional Programming Languages", Cambridge Tracts in Theoretical Computer Science n. 45, 1998.
The term "optimal" refers to the management of sharing. In lambda calculus you have a lot of duplication and the efficiency of the reduction is crucially based on avoding replicating work. In a first order setting, directed acyclic graphs (dags) are enough for managing sharing, but as soon as you enter in a higher order setting you need more complex graph structures comprising sharing and unsharing.
On pure lambda terms, optimal graph reduction is faster than all other known reduction techniques (environment machines, supercombinators, or whatever). Some benchmarks are given in the above book (see pp. 296-301), where we showed that our implementation outperformed both caml-light (the precursor of OCaml) and Haskell (really slow). So, if you hear people saying that it has never been proved that optimal reduction is faster than other techniques, it is not true: it was proved both in theory and experimentally.
The reason functional languages are not yet implemented in this way is that in the practice of functional programming you very seldom use functionals of really high rank, and when you do they are frequently linear. As soon as you raise the rank, the intrinsic complexity of lambda terms can become dangerously high. Having a technique that allows you to reduce a term in time O(2^n) instead of O(2^(2^n)) does not make all that much difference in practice: both computations are simply unfeasible.
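As a concrete (and purely illustrative) example of the blow-up, consider iterated exponentiation on Church numerals; the encoding below is standard, but the helper names are my own and the snippet is only a sketch. The normal form of such terms grows as a tower of exponentials, so past a modest nesting depth no evaluator, optimal or not, can produce it in reasonable time.

```haskell
{-# LANGUAGE RankNTypes #-}

-- Church numerals: n is represented by the functional that iterates f n times.
newtype Church = Church (forall a. (a -> a) -> a -> a)

two :: Church
two = Church (\f x -> f (f x))

-- pow n m computes n^m by applying m to n at a higher type instance;
-- this is precisely the kind of high-rank use that makes terms explode.
pow :: Church -> Church -> Church
pow (Church n) (Church m) = Church (m n)

toInt :: Church -> Int
toInt (Church n) = n (+ 1) 0

main :: IO ()
main = do
  print (toInt (pow two two))                      -- 4
  print (toInt (pow two (pow two (pow two two))))  -- 2^16 = 65536
  -- One more level of nesting gives 2^65536, whose normal form is far too
  -- large to write down, whatever reduction technique is used.
```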
I recently wrote a short article trying to explain these issues: "About the efficient reduction of lambda terms". In the same article, I also discuss the possibility of superoptimal reduction techniques.