Search code examples
node.jsregexpostgresqlpg-promise

Determine when an SQL alias can be an open name


What would be the highest-performing implementation to determine if a string that represents an SQL alias needs to be wrapped in double-quotes?

Presently, in pg-promise I am wrapping every alias in double-quotes, to play it safe. I am looking to make the output SQL neater and shorter, if possible.

And I am divided which approach is the best -

  • to use a regular expression, somehow
  • to do a direct algorithm with strings
  • not to change it at all, if there are reasons for that

Basically, I am looking to improve function as.alias, if possible, not to wrap aliases into double quotes when it is not needed.


What have I tried so far...

I thought at first to do it only for the 99% of all cases - not to add double-quotes when your alias is the most typical one, just a simple word:

function skipQuotes(alias) {
    const m = alias.match(/[A-Z]+|[a-z]+/);
    return m && m[0] === alias;
}

This only checks it is a single word that uses either upper or lower case, but not the combination.


SOLUTION

Following the answer, I ended up with implementation that should cover 99% of all practical use cases, which is what I was trying to achieve:

const m = alias.match(/[a-z_][a-z0-9_$]*|[A-Z_][A-Z0-9_$]*/);
if (m && m[0] === alias) {
    // double quotes will be skipped
} else {
    // double quotes will be added
}

i.e. the surrounding double quotes are not added when the alias uses a simple syntax:

  • it is a same-case single word, without spaces
  • it can contain underscores, and can start with one
  • it can contain digits and $, but cannot start with those

Solution

  • Removing double quotes is admirable -- it definitely makes queries easier to read. The rules are pretty simple. A "valid" identifier consists of:

    • Letters (including diacritical marks), numbers, underscore, and dollar sign.
    • Starts with a letter (including diacriticals) or underscore.
    • Is not a reserved word.

    (I think I have this summarized correctly. The real rules are in the documentation.)

    The first two are readily implemented using regular expressions. The last probably wants a reference table for lookup (and the list varies by Postgres release -- although less than you might imagine).

    Otherwise, the identifier needs to be surrounded by escape characters. Postgres uses double quotes (which is ANSI standard).

    One reason you may want to do this is because Postgres converts identifiers to lower case for comparison. So, the following works fine:

    select xa, Xa, xA, "xa"
    from (select 1 as Xa) y
    

    However, this does not work:

    select Xa
    from (select 1 as "Xa") y
    

    Nor does:

    select "Xa"
    from (select 1 as Xa) y
    

    In fact, there is no way to get refer to "Xa" without using quotes (at least none that I can readily think of).

    Enforcing the discipline of exact matches can be a good thing or a bad thing. I find that one discipline too many: I admit to often ignoring case when writing "casual" code; it is just simpler to type without capitalization (or using double quotes). For more formal code, I try to be consistent.

    On the other hand, the rules do allow:

    select "Xa", "aX", ax
    from (select 1 as "Xa", 2 as "aX", 3 as AX) y
    

    (This returns 1, 2, 3.)

    This is a naming convention that I would be happy if it were not allowed.