Search code examples
performancelanguage-agnosticlanguage-featuresinterpreted-language

Interpreted languages - leveraging the compiled language behind the interpreter


If there are any language designers out there (or people simply in the know), I'm curious about the methodology behind creating standard libraries for interpreted languages. Specifically, what seems to be the best approach? Defining standard functions/methods in the interpreted language, or performing the processing of those calls in the compiled language in which the interpreter is written?

What got me to thinking about this was the SO question about a stripslashes()-like function in Python. My first thought was "why not define your own and just call it when you need it", but it raised the question: is it preferable, for such a function, to let the interpreted language handle that overhead, or would it be better to write an extension and leverage the compiled language behind the interpreter?


Solution

  • The line between "interpreted" and "compiled" languages is really fuzzy these days. For example, the first thing Python does when it sees source code is compile it into a bytecode representation, essentially the same as what Java does when compiling class files. This is what *.pyc files contain. Then, the python runtime executes the bytecode without referring to the original source. Traditionally, a purely interpreted language would refer to the source code continuously when executing the program.

    When building a language, it is a good approach to build a solid foundation on which you can implement the higher level functions. If you've got a solid, fast string handling system, then the language designer can (and should) implement something like stripslashes() outside the base runtime. This is done for at least a few reasons:

    • The language designer can show that the language is flexible enough to handle that kind of task.
    • The language designer actually writes real code in the language, which has tests and therefore shows that the foundation is solid.
    • Other people can more easily read, borrow, and even change the higher level function without having to be able to build or even understand the language core.

    Just because a language like Python compiles to bytecode and executes that doesn't mean it is slow. There's no reason why somebody couldn't write a Just-In-Time (JIT) compiler for Python, along the lines of what Java and .NET already do, to further increase the performance. In fact, IronPython compiles Python directly to .NET bytecode, which is then run using the .NET system including the JIT.

    To answer your question directly, the only time a language designer would implement a function in the language behind the runtime (eg. C in the case of Python) would be to maximise the performance of that function. This is why modules such as the regular expression parser are written in C rather than native Python. On the other hand, a module like getopt.py is implemented in pure Python because it can all be done there and there's no benefit to using the corresponding C library.