Search code examples
sqlgoogle-bigqueryfeature-engineering

Using SQL to determine whether a user has previously seen a question


I have a table which looks like the following:

Before

I would like to add a new feature 'question_seen', which determines whether a user has previously seen the particular question. The table with the new feature will look like this:

After

How can I create this new feature using SQL commands?


Solution

  • Use a window function with conditional logic:

    select t.*,
           (row_number() over (partition by user_id, question_id order by row_id) > 1) as question_seen
    from t;
    

    Note: This returns a boolean rather than 0/1. I prefer booleans for a flag, but it is easy to change to 0/1.