BigQuery Cross Join Much Faster with Each

I'm trying to look at user activity by date. The first step is to build a table of every day since a user account was created, using cross join and a where clause. My first attempt was this:

SELECT
  u.user_id as user_id,
  date(u.created) as signup_date,
  cal.date as date,

from rsdw.user u
  cross join (select date(dt) as date from [rsdw.calendar] where date(dt) < CURRENT_DATE() ) cal
where
  date(u.created) <= cal.date

(The calendar table is just a list of all dates since 2006 (3288 rows). The user table has ~1m rows.)

This query takes a forever... so long that I've abandoned it at 1000 seconds or so. I tried tweaking the query a little. If I add an "each" to the cross join:

SELECT
  u.user_id as user_id,
  date(u.created) as signup_date,
  cal.date as date,

from rsdw.user u
  cross join each (select date(dt) as date from [rsdw.calendar] where date(dt) < CURRENT_DATE() ) cal
where
  date(u.created) <= cal.date

I get an errror:

Error: Cannot CROSS JOIN two tables with EACH qualifiers.

Finally, if I keep the "each" but swap the tables, it completes in just 90s!

SELECT
  u.user_id as user_id,
  date(u.created) as signup_date,
  cal.date as date,

from (select date(dt) as date from [rsdw.calendar] where date(dt) < CURRENT_DATE() ) cal
  cross join each rsdw.user u
where
  date(u.created) <= cal.date

Can anyone explain why the third iteration is so much faster and why the second one results in an error?

Solution

You're hitting some corner-case behavior with how joins interact with sub-select clauses. Joins against named tables benefit from some optimization based on table size, while sub-selects are unpredictable and can result in bad performance. I've filed an internal bug for us to improve this case.

In the first, slow case, your tiny sub-select of dates is copied and broadcast to a small number of machines each handling a large number of users. It takes forever as there is very little parallelism.

The second case is a query parsing error for internal reasons, basically it attempts to have machines handle both a small range of dates and a small range of users, which will not complete a cross join.

In the third, fast case, your tiny sub-select of dates is copied and broadcast to many machines, each handling a small range of users. It completes very fast due to the large parallelism.

Once we complete the bug I've filed, the third case behavior will hopefully happen automatically.