mysql user-defined-functions sql-injection dynamicquery

Use MySQL Query Execution Plan for Detecting SQL Injections

I have a project that requires we allow users to create custom columns, enter custom values, and use these custom values to execute user defined functions.

We have exhausted all implementation strategies we can think of (executing formulas on the front end, in isolated execution environments, etc.).

Short of writing our own interpreter, the only implementation we could find that meets the performance, functionality, and scalability requirements is to execute these functions directly within MySQL. So basically taking the expressions that have been entered by the user, and dynamically rolling up a query that computes results server side in MySQL.

This obviously opens a can of worms security wise.

Quick aside: I expect to get the "you shouldn't do it that way" response. Trust me, I hate that this is the best solution we can find. The resources online describing similar problems is remarkably scarce, so if there are any suggestions for where to find information on analogous problems/solutions/implementations, I would greatly appreciate it.

With that said, assuming that we don't have alternatives, my question is: How do we go about doing this safely?

We have a few current safeguards set up:

Executing the user defined expressions against a tightly controlled subquery that limits the "inner context" that the dynamic portion of the query can pull from.
Blacklisting certain phrases the should never be used (SELECT, INSERT, UNION, etc.). This introduces issues, because a user should be able to enter something like: CASE WHEN {{var}} = "union pacific railroad" THEN... but that is a tradeoff we are willing to make.
Limiting the access of the MySQL connection making the query to only have access to the tables/functionality needed for the feature.

This gets us pretty far. But I'm still not comfortable with it. One additional option that I couldn't find any info online about was using the query execution plan as a means of detecting if the query is going outside of its bounds.

So prior to actually executing the query/getting the results, you would wrap it within an EXPLAIN statement to see what the dynamic query was doing. From the results of the EXPLAIN query, you should able to detect any operations (subqueries, key references, UNIONs, etc.) that fall outside of the bounds of what the query is allowed to do.

Is this a useful validation method? It seems to me that this would be a powerful tool for protecting against a suite of SQL injections, but I couldn't seem to find any information online.

Thanks in advance!

(from Comment)

Some Examples showing the actual autogenerated queries being used. There are both visual and list examples showing the query execution plan for both malicious and valid custom functions.

Solution

GRANT only SELECT on the table(s) that they are allowed to manipulate. This allows arbitrarily complex SELECT queries to be run. (The one flaw: Such queries may run for a long time and/or take a lot of resources. MariaDB has more facilities for preventing run-away selects.)
Provide limited "write" access via Stored Routines with expanded privileges, but do not pass arbitrary values into them. See SQL SECURITY: DEFINER has the privileges of the person creating the routine. (As opposed to INVOKER is limited to SELECT on the tables mentioned above.)

Another technique that may or may not be useful is creating VIEWs with select privileges. This, for example, can let the user see most information about employees while hiding the salaries.

Related to that is the ability to GRANT different permissions on different columns, even in the same table.

(I have implemented a similar web app, and released it to everyone in the company. And I could 'sleep at night'.)

I don't see subqueries and Unions as issues. I don't see the utility of EXPLAIN other than to provide more info in case the user is a programmer trying out queries.

EXPLAIN can help in discovering long-running queries, but it is imperfect. Ditto for LIMIT.

More

I think "UDF" is either "normalization" or "EAV"; it is hard to tell which. Please provide SHOW CREATE TABLE.

This is inefficient because it builds a temp table before removing the 'NULL' items:

FROM ( SELECT ...
        FROM  ...
        LEFT JOIN ...
     ) AS context
WHERE ... IS NULL

This is better because it can do the filtering sooner:

FROM ( SELECT ...
        FROM  ...
        LEFT JOIN ...
        WHERE ... IS NULL
     ) AS context