Search code examples
sasimpalaproc-sql

Change the length of a variable in IMPALA SQL Select statement through SAS


I am connecting to an impala server to retrieve some data. However I want to change the length of the variable in the impala pass-through sql statement because otherwise it will retrieve the variable with a length of 32767 which is not optimized.

This is how I would do it in SAS:

proc sql;
    connect to impala (dsn="somedsn");
    create table want as
    select * 
    from connection to impala
        (select var1 length=50  
            from &disc_table.);
    disconnect from impala;
quit;

However, as IMPALA does not have the same sql statement structure, it does not work.

What is the correct statement to retrieve the var1 with a length=50 in IMPALA SQL statement ?


Solution

  • One option is to change the length in the SAS-side select:

    proc sql;
        connect to impala (dsn="somedsn");
        create table want as
        select var1 length=50 
        from connection to impala
            (select var1 
                from &disc_table.);
        disconnect from impala;
    quit;
    

    You could also use the DBMAX_TEXT option I believe in the connection step, though I'm not familiar with Impala's options specifically.


    Given the comments below, it looks like the problem is that Impala doesn't support CHAR/VARCHAR, and so you get 32767 length strings. Your solution might be to do both things Tom and I suggest, ie:

    proc sql;
        connect to impala (dsn="somedsn");
        create table want as
        select var1 length=50 
        from connection to impala
            (select cast(var1 as varchar(50)) as var1 
               from &disc_table.);
        disconnect from impala;
    quit;
    

    Which might mean you get less data transferred back AND you get a smaller column; or you might add DBMAX_TEXT here; or you can also do as the kb article suggests and additionally add a limit in the ODBC connection options.

    Another option is to use SAS DS2, though it looks like it still has the same issue ultimately (in that older versions of Impala don't support varchar/char). That document suggests upgrading to CDH 5.2 (which I think is a Cloudera implementation of Impala). I'm not sure if non-Cloudera versions support VARCHAR, though there are at least some mentions in the Impala change notes for version 3.4, specifically this issue which mentions adding it to Kudu tables.