
impala-shell inserts odd control characters into the response to a non-interactive query


I run a query in Impala to count the rows in a table that meet a condition, and store the count in a bash variable:

UC=$(impala-shell -r -q "select count(1) from table where condition=1" -d $DB -i $HOST -B)

UC now holds the number of rows where condition is 1. In this case, no rows meet the condition:

echo $UC
 0

My comparison against the value of UC fails because it has weird control characters at the front:

if [ "$UC" == "0" ]; then echo 1; else echo 0; fi
0

echo $UC | hexdump
0000000 5b1b 313f 3330 6834 3020 000a
000000b

When I try to remove the non-digits from the value, I get weird output:

echo $UC | sed 's/[^0-9]*//g'
10340

What is happening here and how can I format the result to do my simple comparison?


Solution

  • ESC[?1034h is an xterm control sequence meaning "Interpret meta key, sets eighth bit." (It is listed in the xterm control sequences documentation.) So presumably impala-shell has noticed that you have an xterm-compatible terminal and is trying to initialize it for interactive use, even though the -q command-line option renders that pointless. It also explains the sed output: deleting every non-digit leaves the 1034 from the control sequence followed by the real result, 0, which gives 10340.

    You can probably avoid the problem by invoking impala-shell with the TERM environment variable set to ansi, which will prevent terminfo-based programs from emitting the smm control sequence:

    UC=$(TERM=ansi impala-shell -r -q "select count(1) from table where condition=1" \
                                -d $DB -i $HOST -B)
    

    You might also be able to convince impala-shell that interactivity is unnecessary by redirecting stdin:

    UC=$(impala-shell </dev/null -r -q "select count(1) from table where condition=1" \
                                 -d $DB -i $HOST -B)
    

    Either way, a feature request to the authors of impala-shell seems reasonable. I think the problem occurs at line 142 of impala_shell.py (self.readline = __import__('readline')); import readline has the side effect of initializing the underlying readline library, which then effectively does tput smm; if $TERM indicates that smm exists, it will be sent.

    There's nothing wrong with initializing the readline library if you're going to use it, but in the case of a non-interactive shell you are not going to use it. So one solution would be to check for interactivity before importing readline (there is already a fallback in case the import fails). Another option might be to delay importing readline (and hence initializing it) until it is actually needed.
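
If neither workaround is available, the captured value can be cleaned up before the comparison. The following is a minimal sketch, not something impala-shell provides: strip_csi is a hypothetical helper name, the capture is simulated with printf rather than a real query, and it assumes the only noise is CSI-style sequences (ESC, a bracket, parameters, one final letter) plus whitespace.

```shell
#!/usr/bin/env bash
# strip_csi is a hypothetical helper (not part of impala-shell): it deletes
# CSI escape sequences such as ESC[?1034h, then removes any whitespace.
strip_csi() {
    local esc
    # Build a literal ESC byte so the expression does not rely on sed
    # understanding \x1b or \033 escapes.
    esc=$(printf '\033')
    printf '%s' "$1" | sed "s/${esc}\[[0-9;?]*[A-Za-z]//g" | tr -d '[:space:]'
}

# Simulated capture (assumption): the xterm sequence followed by the result 0.
raw=$(printf '\033[?1034h0')

# Deleting every non-digit also keeps the 1034 from inside the sequence.
naive=$(printf '%s\n' "$raw" | sed 's/[^0-9]//g')
echo "naive: $naive"    # prints "naive: 10340"

# Removing the whole sequence first leaves only the query result.
UC=$(strip_csi "$raw")
if [ "$UC" = "0" ]; then echo "no matching rows"; else echo "rows found: $UC"; fi
```

With a real query, the same helper would be applied to the value captured from impala-shell, for example UC=$(strip_csi "$UC"), before the test. Stripping the sequence as a unit is what avoids the 10340 result, since the digits inside it are removed together with the ESC and the trailing h.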