Tags: bash, sed, hbase, hbase-shell

Save all HBase table names to a bash array


I would like to store the names of all my hbase tables in an array inside my bash script.

  1. Any sed-based quick fix is acceptable.
  2. Any better solution (such as using readarray on some ZooKeeper file I am not aware of) is acceptable too.

I have two hbase tables called MY_TABLE_NAME_1 and MY_TABLE_NAME_2, so what I want would be:

tables=(
  MY_TABLE_NAME_1
  MY_TABLE_NAME_2
)

What I tried:

Based on "HBase Shell in OS Scripts" by Cloudera:

echo "list" | /path/to/hbase/bin/hbase shell -n > /home/me/hbase-tables
readarray -t tables < /home/me/hbase-tables

but /home/me/hbase-tables then contains:

MY_TABLE_NAME_1
MY_TABLE_NAME_2
2 row(s) in 0.3310 seconds

MY_TABLE_NAME_1
MY_TABLE_NAME_2

Solution

  • You can use readarray/mapfile just fine. But to drop the duplicates, the empty line, and the "row(s) in ... seconds" summary, you need a filter, for example with awk.

    You also don't need to write a temporary file and then parse it. Process substitution makes the output of a command available to the reader as if it were a file:

    mapfile -t output < <(echo "list" | /path/to/hbase/bin/hbase shell -n | awk '!unique[$0]++ && !/seconds/ && NF')
    

    Now the array contains only the unique table names from the hbase output. That said, it would be better to find a way to suppress the noise in the shell output itself than to post-process it this way.
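
To see what each part of the awk filter does without a running HBase, here is a minimal sketch that replays the output shown in the question. The function `fake_hbase_list` is a hypothetical stand-in for `echo "list" | /path/to/hbase/bin/hbase shell -n`, and the array is named `tables` as in the question. It assumes bash 4 or later, since `mapfile` is not available in older versions.

```shell
#!/usr/bin/env bash

# Hypothetical stand-in for the real hbase shell call; it only replays
# the sample output from the question so the filter can be tried offline.
fake_hbase_list() {
  printf '%s\n' \
    'MY_TABLE_NAME_1' \
    'MY_TABLE_NAME_2' \
    '2 row(s) in 0.3310 seconds' \
    '' \
    'MY_TABLE_NAME_1' \
    'MY_TABLE_NAME_2'
}

# !unique[$0]++  true only the first time a given line is seen
# !/seconds/     false for the "N row(s) in X seconds" summary line
# NF             false for empty lines (zero fields)
mapfile -t tables < <(fake_hbase_list | awk '!unique[$0]++ && !/seconds/ && NF')

printf 'found %d tables\n' "${#tables[@]}"
for t in "${tables[@]}"; do
  printf '%s\n' "$t"
done
```

The process substitution matters here for a second reason: with a plain pipe (`... | mapfile -t tables`), bash normally runs `mapfile` in a subshell, so the array would be empty once the pipeline finishes.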