Tags: python, parsing, jsx, interpreter, template-engine

How to embed a language like JSX into a Python script?


At our company we love writing Django-driven applications, and we also love using React. Recently we thought about writing a component-based templating engine for Python in which templates can be written as React-like components using JSX.

Ideally it should be possible to embed JSX into the Python code so that you can write components like this:

In header.pyx:

import PyReact
from my_awsome_project.components import Logo, Link


def Header(context):
    page_title = context.get('page_title')
    links = context.get('links')

    return (
        <div>
            <Logo /> 
            {page_title}
            <ul>
                {[<Link href={link.url}>{link.title}</Link> for link in links]}
            </ul>
        </div>
    )

This would of course require transpiling the file first to get valid Python code. It would transpile to something like this:

import PyReact
from my_awsome_project.components import Logo, Link


def Header(context):
    page_title = context.get('page_title')
    links = context.get('links')

    return (
        PyReact.createComponent('div', context, [
            PyReact.createComponent(Logo),
            page_title,
            PyReact.createComponent('ul', context, [
                [
                    PyReact.createComponent(Link, {'href': link.url}, link.title)
                    for link in links
                ]
            ]),
        ])
    )

The question is: How would I approach writing such a transpiler?

We also thought that, instead of embedding the JSX directly into the Python code, we could return a string containing the JSX that gets parsed independently. Would that be a better/easier approach?


Solution

  • I think this is basically too broad a question for SO, and any answer will be skating on the edge of the SO guidelines about opinions. You're essentially asking for design advice about a complicated problem, and SO isn't really intended for that purpose.

    Still, it's an interesting question. I'll try to address the issues without venturing too deep into opinionated design (since I do have opinions on the subject).

    1. Transpiling is practical, at least in theory, and if you can pull it off, it will give you reasonable performance.

    2. Repeatedly reparsing template strings strikes me as inefficient and complicated; the complications have to do with evaluating embedded Python code, which you will want to do inside the scope in which the string literal is defined, which is probably not the scope in which it is being parsed.
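       A hedged toy illustration of that scope problem (the single-placeholder `header` function below is invented for this example): a string-based engine has to evaluate `page_title` in the scope where the literal was written, which means threading that scope through explicitly:

```python
def header(context):
    page_title = context['page_title']
    template = '<h1>{page_title}</h1>'   # refers to a local name
    # A separate string parser never sees this scope; eval only works
    # here because we pass the defining scope along explicitly.
    expr = template[template.index('{') + 1:template.index('}')]
    return template.replace('{' + expr + '}', str(eval(expr, {}, locals())))
```

       This gets unwieldy as soon as placeholders contain arbitrary expressions, which is the complication the point above describes.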

    3. JSX-style lexical analysis and parsing is not particularly complicated, but your hypothetical transpiler would need to also understand Python lexical and syntactic analysis. Python's standard library includes modules for lexing and parsing Python but afaik they are not extensible, which might make it hard to leverage them for use with an embedded language. You could either write your own lexer and parser, possibly using your choice of code generators, or you could base your lexer and parser on some open source Python implementation. In both cases, your maintainability challenge will be keeping your custom code synchronized with future Python versions.

    4. The main issue in embedding pseudo-HTML into another language is detecting when a < is a comparison operator and when it starts a template. The simplest solution is to allow a template only when the < is lexically analysed as a complete token (so that <= is always an operator), is followed by an identifier, and is encountered in a syntactic environment in which an expression is expected.

    5. The last requirement above is to ensure that 3 < count (for example) does not fool the transpiler into thinking that it's about to see a <count...> component. I'm pretty sure that in Python you can use a simple lexical rule based on the preceding token, but a complete syntax analysis would be necessary to verify that.
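       The heuristic from points 4 and 5 can be sketched like this; `starts_template` and the token-kind names are made up for the illustration, and a real transpiler would drive this from its actual token stream:

```python
import re

# Token kinds after which a complete operand just ended, so a following
# "<" must be a comparison operator (these names are hypothetical).
OPERAND_END = {"NAME", "NUMBER", "STRING", "CLOSE_PAREN", "CLOSE_BRACKET"}

def starts_template(prev_kind, text, pos):
    """Return True if the "<" at text[pos] should open a JSX-style tag:
    an expression must be expected, "<" must be a bare token (so "<=" is
    always an operator), and an identifier must follow."""
    if prev_kind in OPERAND_END:
        return False                 # e.g. "3 < count" is a comparison
    if text.startswith("<=", pos):
        return False                 # "<=" is never a tag opener
    return re.match(r"<[A-Za-z_]", text[pos:]) is not None
```

       As point 5 says, this preceding-token rule is only a heuristic; a full syntax analysis is what actually verifies that an expression is expected at that position.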

    6. Once you start a template, it will continue until you reach the matching close tag; that's very simple if tags are required to match. But it's better suited to top-down parsing than bottom-up parsing, because end-tag matching is context-sensitive. That's easy to do if you have close cooperation between lexical analysis and syntactic analysis, but that cooperation is sometimes frowned upon :-)
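       A minimal recursive-descent sketch of that close-tag matching (`parse_element` is invented for illustration; a real parser would also handle attributes, embedded Python expressions, and malformed input):

```python
import re

# Matches an open tag, a close tag, or a self-closing tag.
TAG = re.compile(r"<(/?)([A-Za-z_][\w.]*)\s*(/?)>")

def parse_element(src, pos=0):
    """Parse one <tag>...</tag> element starting at src[pos].
    Returns (node, next_pos); node is (tag_name, children).
    Assumes well-formed input."""
    m = TAG.match(src, pos)
    if not m or m.group(1):
        raise SyntaxError(f"expected an open tag at {pos}")
    name, pos = m.group(2), m.end()
    if m.group(3):                      # self-closing, e.g. <Logo/>
        return (name, []), pos
    children = []
    while True:
        m = TAG.match(src, pos)
        if m and m.group(1):            # a close tag
            if m.group(2) != name:      # the context-sensitive check
                raise SyntaxError(f"expected </{name}>, got </{m.group(2)}>")
            return (name, children), m.end()
        if m:                           # nested open tag: recurse
            child, pos = parse_element(src, pos)
            children.append(child)
        else:                           # plain text up to the next "<"
            end = src.find("<", pos)
            children.append(src[pos:end])
            pos = end
```

       The `name` carried down the recursion is exactly the context sensitivity mentioned above: the lexer alone cannot know which close tag is legal, only the parser can.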

    7. Since Python code embedded in a template can itself contain an embedded template which could in turn embed more Python code, etc., your analysis will need to be recursive. The expected recursion depth is not very big, so there's no problem with the recursion per se, but many parser generators do not handle this kind of recursion elegantly. I'd suggest using (or implementing) a "push-parser" and a lexer framework separated from the buffer handler, so that you can easily change the scanner in the middle of a buffer.
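       The scanner switch can be sketched as a stack of tokenizer functions that the parser pushes and pops; both modes below are toy stand-ins for real Python and template lexers, and all the names are hypothetical:

```python
import re

class Scanner:
    """A lexer whose tokenizing mode can change in the middle of a
    buffer. The parser pushes template_mode when it decides a template
    has started and pops it at the matching close tag."""

    def __init__(self, text):
        self.text, self.pos = text, 0
        self.modes = [self.python_mode]     # a stack of tokenizers

    def next_token(self):
        return self.modes[-1]()             # delegate to the current mode

    def push_mode(self, mode):
        self.modes.append(mode)

    def pop_mode(self):
        self.modes.pop()

    def python_mode(self):
        # Toy stand-in: split on whitespace.
        m = re.compile(r"\s*(\S+)").match(self.text, self.pos)
        self.pos = m.end()
        return ("PY", m.group(1))

    def template_mode(self):
        # Toy stand-in: yield whole tags and runs of text.
        m = re.compile(r"\s*(<[^>]*>|[^<]+)").match(self.text, self.pos)
        self.pos = m.end()
        return ("TPL", m.group(1))
```

       Because both modes share one position in one buffer, switching never loses or re-reads input, which is the point of separating the lexer from the buffer handler.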

    8. Buffer handling can be quite simple; the minimum requirement is just a string and an index into that string. If you isolate implementation details inside the buffer handler, you should be able to later change to a different implementation, for example one which doesn't require having the entire input available before starting the parse. You probably don't actually need that feature, but it's always good to maintain independent components, just in case.
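       A sketch of such a buffer handler, hiding the string-plus-index representation behind a small interface so it could later be swapped for an incremental one:

```python
class Buffer:
    """Minimal buffer handler: a string and an index, behind a small
    interface so the representation can change later."""

    def __init__(self, text):
        self._text = text
        self._pos = 0

    def peek(self, n=1):
        """Look ahead up to n characters without consuming them."""
        return self._text[self._pos:self._pos + n]

    def advance(self, n=1):
        """Consume and return the next n characters."""
        chunk = self._text[self._pos:self._pos + n]
        self._pos += n
        return chunk

    def at_end(self):
        return self._pos >= len(self._text)
```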

    9. Another challenge for your transpiler will be integrating it with Python's module system. Pythonic integration might suggest that the transpilation should be performed when the module is imported. On the other hand, you might want to be able to distribute a pre-transpiled bundle which can be used without installing the transpiler, and without depending on a particular version of the transpiler. If you spend some time thinking this through, you might avoid later problems. (For example, the Ply issue which makes it impossible to bundle a Ply project into a single-file distribution system.)
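       As a sketch of the transpile-on-import option: Python's standard importlib machinery lets you register a finder so that `import header` picks up `header.pyx` and transpiles it on the fly. Everything here uses real importlib APIs, but `transpile` is a placeholder for the hypothetical JSX rewriter:

```python
import importlib.abc
import importlib.util
import os
import sys

def transpile(source):
    """Placeholder: rewrite embedded JSX into plain Python."""
    return source  # the real transpiler goes here

class PyxLoader(importlib.abc.Loader):
    def __init__(self, path):
        self.path = path

    def create_module(self, spec):
        return None  # use the default module creation

    def exec_module(self, module):
        with open(self.path) as f:
            code = compile(transpile(f.read()), self.path, "exec")
        exec(code, module.__dict__)

class PyxFinder(importlib.abc.MetaPathFinder):
    def find_spec(self, name, path, target=None):
        for entry in path or sys.path:
            candidate = os.path.join(entry, name.rpartition(".")[2] + ".pyx")
            if os.path.isfile(candidate):
                return importlib.util.spec_from_file_location(
                    name, candidate, loader=PyxLoader(candidate))
        return None  # fall through to the regular finders

# Installing the finder makes "import header" pick up header.pyx:
sys.meta_path.insert(0, PyxFinder())
```

       The pre-transpiled-bundle alternative would skip the hook entirely and ship the generated .py files, which is the trade-off the point above asks you to think through.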

    Hope that helps a bit.