Search code examples
mysqlsqlimpala

How to select unique columns from X columns but show X + Y columns from the results in SQL?


This table is on our impala cluster, although, the impala shell seems very similar to SQL. I have very little SQL or Impala experience, outside of doing extremely simple queries so this problem might be way above my experience and pay grade.

I need to get some data from our db using impala. There are other columns in the table but my command grabs these 3 columns I'm interested in. The data/result can look like this:

time | server | data
  0  |   A    | 500
  0  |   B    | 200
  0  |   C    | 300
  1  |   A    | 100
  1  |   A    | 400
  1  |   B    | 200
  1  |   C    | 300
  2  |   A    | 900
  2  |   B    | 800
  2  |   C    | 700
  2  |   C    | 600

Ideally, the data should look like time 0, where each server only shows up once for each time value. However, there's a bug and as shown, a server can show up more than once for a time value. And it's not the same server that shows up more than once per time value either, as shown in the example above.

In any case, what impala-shell/SQL command do I read up/look up to tell it to select distinct time and server but to choose the larger of the two data values? I don't know how to proceed past:

select distinct time, server from table;

If this is too hard or complicated, perhaps someone can point me to a command that selects distinct time and server but prints the first value of data it finds for the distinct pair of time and server.


Solution

  • To get the unique time and server values along with the largest data value you could use a query with a GROUP BY, as follows:

    SELECT TIME, SERVER, MAX(DATA)
      FROM TABLE
      GROUP BY TIME, SERVER
      ORDER BY TIME, SERVER;
    

    Hope this helps.

    Share and enjoy.