Search code examples
jdbcodbcapache-calcite

Standard way to use ODBC or JDBC to get column metadata of Select statement?


Is there a cross-vendor-compatible way to get the column metadata for an arbitrary SELECT statement in ODBC or JDBC without needing to execute the query?

I am building an analytics tool as a vendor. I cannot assume that the user can create temp tables. Would throwing a TOP or LIMIT on the statement work? Would that guarantee quick return time?

I'm also open to Apache Calcite or other SQL parsing libraries. I don't know if any of these libraries are actually aware of the argument and return types of functions.

I suspect the answer is "no" but which vendors would be easier to support upfront?


Solution

  • Adding on to @Mark's JDBC answer with insights from the ODBC side:

    ODBC does have a low-level mechanism named SQLDescribeCol for retrieving column information (when possible). However, it is something that is normally used by those building ODBC interface layers (e.g., pyodbc) or others writing directly to the ODBC API from a C/C++ application.

    I don't recall ever seeing an ODBC interface layer that implements something similar to JDBC's PreparedStatement#getMetaData. For example, System.Data.Odbc.OdbcCommand in .NET has an explit .Prepare() method but it does not have a method for retrieving the result set's metadata; you need an OdbcDataReader object for that and you get one by calling .ExecuteReader().

    So AFAIK the most "cross-vendor-compatible" way of getting the result set metadata is to wrap the actual query in an outer query that returns no rows, e.g. for ODBC in C# ...

    string actual_query = "SELECT many_columns FROM many_tables WHERE many_conditions";
    string metadata_query = $"SELECT * FROM ({actual_query}) AS subquery WHERE 1=0";
    var cmd = new OdbcCommand(metadata_query, conn);
    OdbcDataReader rdr = cmd.ExecuteReader();
    DataTable dt = rdr.GetSchemaTable();
    

    ... and hope the query optimizer is clever enough to know that it doesn't have to process the "many_conditions" from the subquery because the outer query will never return any rows.