Search code examples
sqlgoogle-bigquerydate-arithmetic

Get multiple missing dates from bigquery sql query


I have a BigQuery table that keeps track of user sign ins:

user_id user_name user_sign_in_date
1 john doe 2019-04-05
1 john doe 2019-04-06
2 bob bobson 2019-04-05
2 bob bobson 2019-04-08
3 jane beer 2019-04-05
3 jane beer 2019-04-06
3 jane beer 2019-04-07
3 jane beer 2019-04-08
4 amy face 2019-04-09

I want to be able to get users who have not signed in on certain dates and list the dates they have missed. For example, if I want to see users who have not signed in between 2019-04-05 and 2019-04-08, I would need a SQL query that could help me generate something like:

[
  {
    'user_id': 1,
    'user_name': 'john doe',
    'dates_not_signed_in': ['2019-04-07', '2019-04-08']
  },
  {
    'user_id': 2,
    'user_name': 'bob bobson',
    'dates_not_signed_in': ['2019-04-06', '2019-04-07']
  },
  {
    'user_id': 4,
    'user_name': 'amy face',
    'dates_not_signed_in': ['2019-04-05', '2019-04-06', '2019-04-07', '2019-04-08']
  }
]

I think I need to do something like

SELECT user_id
FROM table

to get all the user IDs but then I'm not sure how to surround that query so that I only get back users who didn't sign in on certain dates with the dates they missed.


Solution

  • Consider also below approach

    select * from (
      select user_id, user_name,
        array_to_string(array(
          select cast(day as string) from unnest(generate_date_array('2019-04-05', '2019-04-08')) day
          except distinct select cast(day as string) from unnest(arr) day
        ), ', ') as dates_not_signed_in
      from (
        select user_id, user_name, array_agg(user_sign_in_date) arr
        from your_table
        group by user_id, user_name
      )
    )
    where dates_not_signed_in != ''            
    

    if applied to sample data in your question - output is

    enter image description here