Search code examples
sqldistinctwindow-functionsprestoamazon-athena

select distinct name and username on each window partition on presto


I am using window function on presto to get the distinct name and username row of each group. I did apply the ROW_NUMBER() on my name column for Partition with datetime column as order by and I get below result

Current Output:

name    top_user     Count           Date         Price  Percent  Volume      username   
ENZC    1            5   2021-03-07 08:11:14.000 0.1189  45.05   86131409    DeviantImmortal
ENZC    5            5   2021-03-07 08:11:14.000 0.1189  45.05   86131409    OtcRock
ENZC    2            5   2021-03-07 08:11:14.000 0.1189  45.05   86131409    OtcRock
ENZC    3            5   2021-03-07 08:11:14.000 0.1189  45.05   86131409    STOCKAHOLIC55
ENZC    4            5   2021-03-07 08:11:14.000 0.1189  45.05   86131409    stockpro20
HCMC    3            5   2021-03-07 08:34:33.000 0.0002  15.2    1376689232  Barta57
HCMC    5            5   2021-03-07 08:34:33.000 0.0002  15.2    1376689232  PennyProfitPro
HCMC    2            5   2021-03-07 08:34:33.000 0.0002  15.2    1376689232  Stocktipstoday1
HCMC    1            5   2021-03-07 08:34:33.000 0.0002  15.2    1376689232  TTrader1976
HCMC    4            5   2021-03-07 08:34:33.000 0.0002  15.2    1376689232  stockpro20
HQGE    5            6   2021-03-07 07:40:38.000 -0.0017 -16.04  63596752    BerkshireCapGrp
HQGE    1            6   2021-03-07 07:40:38.000 -0.0017 -16.04  63596752    OwnThePlayOTC
HQGE    2            6   2021-03-07 07:40:38.000 -0.0017 -16.04  63596752    PennyStockGeeks
HQGE    3            6   2021-03-07 07:40:38.000 -0.0017 -16.04  63596752    TaylorB16445829
HQGE    4            6   2021-03-07 07:40:38.000 -0.0017 -16.04  63596752    iammpremm
LTNC    2            8   2021-03-07 08:33:19.000 0.0028  10.73   293126083   BigTawno
LTNC    5            8   2021-03-07 08:33:19.000 0.0028  10.73   293126083   Faith03777244
LTNC    3            8   2021-03-07 08:33:19.000 0.0028  10.73   293126083   OneTickMoline
LTNC    1            8   2021-03-07 08:33:19.000 0.0028  10.73   293126083   OneTickMoline
LTNC    4            8   2021-03-07 08:33:19.000 0.0028  10.73   293126083   Stock_Pop
OZSC    2           10  2021-03-07 08:34:38.000 0.0685  72.87   330616866   JZavitka
OZSC    3           10  2021-03-07 08:34:38.000 0.0685  72.87   330616866   JZavitka
OZSC    1           10  2021-03-07 08:34:38.000 0.0685  72.87   330616866   S_AnglinIV
OZSC    4           10  2021-03-07 08:34:38.000 0.0685  72.87   330616866   S_AnglinIV
OZSC    5           10  2021-03-07 08:34:38.000 0.0685  72.87   330616866   claydeath1
SANP    2           5   2021-03-07 08:11:38.000 0.0049  101.04  907907634   1deadmanx
SANP    3           5   2021-03-07 08:11:38.000 0.0049  101.04  907907634   BillTsamis
SANP    5           5   2021-03-07 08:11:38.000 0.0049  101.04  907907634   Fluffypillows9
SANP    1           5   2021-03-07 08:11:38.000 0.0049  101.04  907907634   Fluffypillows9

Current Query:

SELECT * FROM 
(
SELECT  name, username , datetime, message, 
ROW_NUMBER() OVER (PARTITION BY name ORDER BY datetime ASC) AS top_user FROM table_name 
)
WHERE top_user < 6 ORDER BY name
   

Expected Output: get distinct name and username for each window partition i.e no two row of any window has same name and username

name    top_user     Count           Date         Price  Percent  Volume      username   
ENZC    1            5   2021-03-07 08:11:14.000 0.1189  45.05   86131409    DeviantImmortal
ENZC    2            5   2021-03-07 08:11:14.000 0.1189  45.05   86131409    OtcRock
ENZC    3            5   2021-03-07 08:11:14.000 0.1189  45.05   86131409    STOCKAHOLIC55
ENZC    4            5   2021-03-07 08:11:14.000 0.1189  45.05   86131409    stockpro20
HCMC    3            5   2021-03-07 08:34:33.000 0.0002  15.2    1376689232  Barta57
HCMC    5            5   2021-03-07 08:34:33.000 0.0002  15.2    1376689232  PennyProfitPro
HCMC    2            5   2021-03-07 08:34:33.000 0.0002  15.2    1376689232  Stocktipstoday1
HCMC    1            5   2021-03-07 08:34:33.000 0.0002  15.2    1376689232  TTrader1976
HCMC    4            5   2021-03-07 08:34:33.000 0.0002  15.2    1376689232  stockpro20
HQGE    5            6   2021-03-07 07:40:38.000 -0.0017 -16.04  63596752    BerkshireCapGrp
HQGE    1            6   2021-03-07 07:40:38.000 -0.0017 -16.04  63596752    OwnThePlayOTC
HQGE    2            6   2021-03-07 07:40:38.000 -0.0017 -16.04  63596752    PennyStockGeeks
HQGE    3            6   2021-03-07 07:40:38.000 -0.0017 -16.04  63596752    TaylorB16445829
HQGE    4            6   2021-03-07 07:40:38.000 -0.0017 -16.04  63596752    iammpremm
LTNC    2            8   2021-03-07 08:33:19.000 0.0028  10.73   293126083   BigTawno
LTNC    5            8   2021-03-07 08:33:19.000 0.0028  10.73   293126083   Faith03777244
LTNC    3            8   2021-03-07 08:33:19.000 0.0028  10.73   293126083   OneTickMoline
LTNC    4            8   2021-03-07 08:33:19.000 0.0028  10.73   293126083   Stock_Pop
OZSC    2           10  2021-03-07 08:34:38.000 0.0685  72.87   330616866   JZavitka
OZSC    1           10  2021-03-07 08:34:38.000 0.0685  72.87   330616866   S_AnglinIV
OZSC    5           10  2021-03-07 08:34:38.000 0.0685  72.87   330616866   claydeath1
SANP    2           5   2021-03-07 08:11:38.000 0.0049  101.04  907907634   1deadmanx
SANP    3           5   2021-03-07 08:11:38.000 0.0049  101.04  907907634   BillTsamis
SANP    1           5   2021-03-07 08:11:38.000 0.0049  101.04  907907634   Fluffypillows9

Solution

  • One way to do it is to filter out the duplicate usernames by partitioning first by name and username and once again by name only:

    SELECT *
    FROM (
      SELECT *, ROW_NUMBER() OVER (PARTITION BY name ORDER BY datetime) rn2
      FROM (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY name, username ORDER BY datetime) rn1
        FROM tablename
      ) t1
      WHERE t1.rn1 = 1 
    ) t2
    WHERE t2.rn2 < 6