Search code examples
mysqlsqlrenameidl

MySQL Remove characters from column headers


All my column headers in a MySQL database are prefixed with a number, 1_X, 2_X, etc... which makes bringing the data into IDL impossible using just a basic select statement to bring in the entire table. I'm not sure but I see two possible ways:

1) Bring in the table with column name aliases. Can I use TRIM or SUBSTRING_INDEX to remove/replace the first two characters?

2) Create a routine that uses the information schema to to recursively go through and delete the first two characters of the column headers and create a new table with those headers and copy the data in.

If there weren't so many different tables (all with 1_X, 2_X, etc...) there'd be no problem manually selecting 1_X AS X but that's not feasible. It would be great to be able to use TRIM/SUBSTRING on column headers in the select statement.

Thanks.


Solution

  • It's not possible to use functions in a SQL statement to alter the identifier assigned to a column being returned. The SQL way of specifying the identifier for the column in a resultset is to use the expr AS alias approach.


    Rather than trim off the leading digit characters, you could prepend the identifiers with another valid character. (Trimming off leading characters seems like it would potentially lead to another problem, duplicate and/or zero length column names.)

    You could just use a SQL statement to generate the SELECT list for you.

    (NOTE: the GROUP_CONCAT function is limited by some system/session variables: group_concat_max_len and max_allowed_packet, it's easy enough to adjust these higher, though changing global max_allowed_packet may require MySQL to be restarted.)

    To get it back the SELECT list on all one line (assuming you won't overflow the GROUP_CONCAT limits) something like:

    SELECT c.table_schema
         , c.table_name
         , GROUP_CONCAT(
              CONCAT('t.`',c.column_name,'` AS `x',c.column_name,'`')
              ORDER BY c.ordinal_position
           ) AS select_list_expr
      FROM information_schema.columns c
      FROM information_schema.columns c
     WHERE c.table_schema = 'mydatabase'
     GROUP BY c.table_schema, c.table_name
    

    Or, you could even get back a whole SELECT statement, if you wrapped that GROUP_CONCAT expression (which produces the select list) in another CONCAT

    Something like this:

    SELECT CONCAT('SELECT '
              , GROUP_CONCAT(
                   <select_list_expr>
                )
              , ' FROM `',c.table_schema,'`.`',c.table_name,'` t;'
           ) AS stmt
      FROM information_schema.columns c
     WHERE c.table_schema = 'mydatabase'
     GROUP BY c.table_schema, c.table_name
    

    You could use a more clever expression for <select_list_expr>, to check for leading "digit" characters, and assign an alias to just those columns that need it, and leave the other columns unchanged, though that again introduces the potential for returning duplicate column names.

    That is, if you already have columns named '1_X' and 'x1_X' in the same table. But a carefully chosen leading character may avoid that problem...

    The <select_list_expr> could be more clever by doing a conditional test for leading digit character, something like this:

    SELECT CONCAT('SELECT '
              , GROUP_CONCAT(
                   CASE
                   WHEN c.column_name REGEXP '^[[:digit:]]'
                   THEN CONCAT('t.`',c.column_name,'` AS `x',c.column_name,'`')
                   ELSE CONCAT('t.`',c.column_name,'`')
                   END
               )
              , ' FROM `',c.table_schema,'`.`',c.table_name,'` t;'
           ) AS stmt
      FROM information_schema.columns c
     WHERE c.table_schema = 'mydatabase'
     GROUP BY c.table_schema, c.table_name
    

    Again, there's a potential for generation "duplicate" column names with this approach. The conditional test "c.column_name REGEXP" could be extended to check for other "invalid" leading characters as well.


    As a side note, at some point, someone thought it a "good idea" to name columns with leading digit characters. Just because something is allowed doesn't mean it's a good idea.


    Then again, maybe all that rigamarole isn't necessary, and just wrapping the column names in backticks would be sufficient for your application.