
PostgreSQL: the opposite of a GROUP BY


Here's my use case:

We have an analytics-like tool that counts the number of users per hour on our system. Now the business would like to have the number of unique users. As our number of users is very small, we will do that using

SELECT count(*)
FROM (
   SELECT DISTINCT user_id
   FROM unique_users
   WHERE date BETWEEN x and y
) distinct_users

i.e. we will store (user_id, date) pairs and count unique users using DISTINCT. user_id is not a foreign key: users are not logged in, so it is just a unique identifier generated by the system, some kind of UUIDv4. This performs well for our volume of data.

Now the problem is importing the legacy data into it.

I would like to know the SQL query to transform

   date    |  number_of_users
   12:00   |     2 
   13:00   |     4

into

   date    |  user_id
   12:00   |     1 
   12:00   |     2 
   13:00   |     1
   13:00   |     2
   13:00   |     3
   13:00   |     4

(As long as the non-unique count returns the same number as before, we're fine if the unique-user count is a bit off.)

Of course, I could write a Python script, but I was wondering if there is an SQL trick to do this, using generate_series() or something similar.


Solution

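  • generate_series() is indeed the way to go: cross join each row with a series running from 1 to number_of_users. The LATERAL keyword lets the series reference the current row's count: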

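    -- sample legacy rows: one row per hour with an aggregated count
    with data (date, number_of_users) as (
      values
        ('12:00',2),
        ('13:00',4)
    )
    select d.date, i.n as user_id
    from data d
      -- LATERAL lets generate_series() read d.number_of_users, yielding 1..n rows per hour
      cross join lateral generate_series(1, d.number_of_users) i (n)
    order by d.date, i.n ;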
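To actually load the legacy data rather than just display it, the same expansion can feed an INSERT ... SELECT. This is a minimal sketch under stated assumptions: legacy_counts(date, number_of_users) and unique_users(date, user_id) are hypothetical table names, created here as temp tables so the script is self-contained, and user_id is assumed to be stored as uuid. Because a uuid column cannot hold the integers produced by generate_series(), each generated row gets a fresh gen_random_uuid() (built into PostgreSQL 13+; older versions need the pgcrypto extension).

```sql
-- Hypothetical stand-ins for the real legacy and target tables.
CREATE TEMP TABLE legacy_counts (date time, number_of_users int);
CREATE TEMP TABLE unique_users (date time, user_id uuid);

INSERT INTO legacy_counts (date, number_of_users)
VALUES ('12:00', 2), ('13:00', 4);

-- Expand each legacy row into number_of_users rows with one fresh UUID per row.
-- gen_random_uuid() is volatile, so it is evaluated once per generated row.
INSERT INTO unique_users (date, user_id)
SELECT l.date, gen_random_uuid()
FROM legacy_counts l
  CROSS JOIN LATERAL generate_series(1, l.number_of_users) AS g(n);

-- Sanity check: the non-unique count per hour should match the legacy numbers.
SELECT u.date, count(*) AS number_of_users
FROM unique_users u
GROUP BY u.date
ORDER BY u.date;
```

Either way, the per-hour non-unique counts match the legacy numbers. The choice of identifier only affects the DISTINCT count over a multi-hour range: with random UUIDs every row is distinct, so it equals the sum of the hourly counts (an overcount of unique users), whereas reusing the integers 1..n from generate_series() would make it equal the largest hourly count in the range (an undercount).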