Search code examples
sqlperformancerandomgoogle-bigqueryshuffle

Shuffle a specific column in a table on BigQuery


I have a table that looks like this:

    id label
     1   A
     2   A
     3   A
     4   B
     5   C
     6   C
     7   A
     8   A
     9   C
     10  B

I want to get another column label_shuffled that is the existing column label but shuffled. I need it to be efficient and fast.

Desired output:

    id label  label_shuffled
     1   A         A
     2   A         B
     3   A         C
     4   B         A
     5   C         C
     6   C         A
     7   A         C
     8   A         A
     9   C         B
     10  B         A

Any suggestions?


Solution

  • An option is use window function ROW_NUMBER to enumerate the rows randomly and then join:

    WITH suffle AS (
      SELECT
        id,
        label,
        ROW_NUMBER() OVER () row_number,
        ROW_NUMBER() OVER (ORDER BY RAND()) row_number_suffled
      FROM labels
    )
    SELECT 
      l.id,
      l.label,
      s.label as label_suffled
    FROM suffle l 
    JOIN suffle s on l.row_number = s.row_number_suffled