python security xss template-engine

Is it safe to use Python's str.format method with user-submitted templates on the server side?


I am working on a project where users must be able to submit templates containing placeholders that are later rendered to generate dynamic content.

For example, a user might submit a template like:

"${item.price} - {item.description} / {item.release_date}"

that would later be formatted with the real values.

Using a template engine (such as Django's or Jinja) for this purpose would require a lot of validation and sanitizing to prevent SSTI and XSS, so I was wondering whether I could use Python's str.format method to create a more limited and safer alternative, since (I believe) this method cannot execute Python code directly.

My question is:

Is using str.format safe enough when dealing with user-submitted templates, or would it still be vulnerable to injection attacks?
If it is not, is there an alternative way to implement "safe template rendering" in Python?


Solution

  • No, it is not safe in general to use str.format with user-provided format strings.

    Format strings are capable of executing a limited form of Python code. This limited code is just powerful enough to pose significant denial-of-service (DoS) and data breach risks.

    The main factors that make using untrusted format strings risky are:

    1. There's no telling how large the formatted string will be. Even short format strings can lead to massive expansions, posing DoS risks.
    2. Python's string formatting syntax is actually quite rich, and notably supports arbitrary indexing and attribute access on the format arguments. This means that you have to worry not only about the objects that you are passing to str.format, but also about every object that is indirectly accessible from those objects via indexing and attribute lookups. It turns out that the space of indirectly accessible objects is very large.
    3. Performing indexing or attribute access operations can trigger a large number of different special methods. These triggered methods can have unintended side effects or raise unexpected exceptions.
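
    To make points 2 and 3 concrete, here is a minimal sketch that records which special methods a single replacement field ends up calling. The Spy class is hypothetical and exists only for this illustration; it is not part of any library.

```python
class Spy:
    """Hypothetical helper: logs every special method str.format triggers."""

    def __init__(self, log, name="root"):
        self.log = log
        self.name = name

    def __getattr__(self, attr):
        # Runs for each ".attr" step in a replacement field.
        self.log.append(f"getattr {attr}")
        return Spy(self.log, attr)

    def __getitem__(self, key):
        # Runs for each "[key]" step; all-digit keys arrive as int.
        self.log.append(f"getitem {key!r}")
        return Spy(self.log, str(key))

    def __format__(self, spec):
        # Runs once at the end, with whatever follows the ":".
        self.log.append(f"format {spec!r}")
        return self.name


calls = []
result = "{x.a[b].c[0]:>5}".format(x=Spy(calls))
print(calls)
```

    One short field is enough to invoke five user-defined methods here, and the author of the format string decides which ones and in what order.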

    Data Breach Risks

    Off the top of my head, I came up with this toy example of how str.format with a carefully constructed format string can easily leak sensitive information from your server. This attack relies only on the attacker knowing (or guessing) the types of the arguments being supplied to str.format.

    Running this code will print all environment variables defined on the server:

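    import os
    
    class Foo:
        def foo(self):
            pass
    
    # Attacker-supplied: walks from the method to this module's globals,
    # where the imported os module (and os.environ) is reachable.
    user_format = "{f.foo.__globals__[os].environ}"
    print(user_format.format(f=Foo()))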
    

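    Specific to Django, a format string like the following will print your entire settings module, complete with your DB passwords, signing keys, and all sorts of other exploitable information. It assumes that the module defining Foo imports sys and that your settings live in myapp.settings: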

    user_format = "{f.foo.__globals__[sys].modules[myapp.settings].__dict__}"
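
    Both format strings above depend on attribute access or indexing inside the replacement field. As a rough sketch (not a complete defense), a format string can be inspected with string.Formatter().parse from the standard library before it is ever rendered. The function name fields_with_traversal is my own and purely illustrative.

```python
import string


def fields_with_traversal(fmt):
    """Return the field names in fmt that use "." or "[" (hypothetical helper)."""
    risky = []
    for _literal, field_name, spec, _conv in string.Formatter().parse(fmt):
        if field_name and ("." in field_name or "[" in field_name):
            risky.append(field_name)
        if spec:
            # Format specs may contain nested replacement fields, so check them too.
            risky.extend(fields_with_traversal(spec))
    return risky


leaky = "{f.foo.__globals__[os].environ}"
plain = "{price} - {description}"

print(fields_with_traversal(leaky))
print(fields_with_traversal(plain))
```

    Note that this check would also flag the template from the question, since "{item.price}" uses attribute access. Passing flat values such as price=... instead of whole objects avoids the need for traversal in the first place.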
    

    Denial of Service Risks

    Simple format strings like "{:10000000000}" can lead to massive memory consumption. This can grind things to a halt due to constant page swapping, or even crash the server process with a MemoryError exception or OOM kill.
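
    A scaled-down illustration of that expansion, using a width small enough to run safely. The has_format_spec helper is a hypothetical pre-check built on string.Formatter().parse; rejecting every format spec is only one possible policy and does nothing about the attribute-access problem described earlier.

```python
import string

template = "{:100000}"            # 9 characters of user input
rendered = template.format("x")   # padded out to the requested width
print(len(template), len(rendered))


def has_format_spec(fmt):
    """True if any replacement field carries a format spec (hypothetical helper)."""
    # parse() yields (literal_text, field_name, format_spec, conversion) tuples.
    return any(spec for _lit, _name, spec, _conv in string.Formatter().parse(fmt))


print(has_format_spec(template))       # width given, so a spec is present
print(has_format_spec("{a} and {b}"))  # plain fields, no spec
```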

    Additionally, due to the sheer number of different objects that a format string can interact with, and the variety of special methods that can be called on those objects, there's no telling what other weird and unexpected exceptions could be raised or what resource-intensive operations might be performed. You can imagine a scenario where an ORM manager class has a property that triggers a database transaction when accessed. Such a property could be accessed repeatedly in the format string to stall the server with database interactions.
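
    Here is a toy version of that scenario. FakeManager and Item are hypothetical stand-ins; no real ORM or database is involved, and a counter takes the place of the expensive query.

```python
class FakeManager:
    """Hypothetical stand-in for an ORM manager."""

    queries = 0

    @property
    def count(self):
        # Imagine a database round trip here instead of a counter bump.
        FakeManager.queries += 1
        return 42


class Item:
    objects = FakeManager()


# The attacker simply repeats the same field many times.
template = "{item.objects.count}" * 1000
rendered = template.format(item=Item())
print(FakeManager.queries)
```

    Each repetition of the field evaluates the property again, so the amount of work is controlled entirely by whoever wrote the template.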