It seems I'm somehow removing a key after join.
Context
The goal is to get all records from table_2 that consist only of distinct member ids from table_1.
As a novice, I'd appreciate pointers in best-practices to achieve this goal and any comments on why I get the "projected field" error.
What I've tried
-- assume %default vals set for path_1 and path_2 to data
-- load the data
table_1 = LOAD '$path_1' as (day, member_id, country);
table_2 = LOAD '$path_2' as (day, member_id, country);
-- get distinct member_id's from table_1
table_1_ids = DISTINCT(FOREACH table_1 GENERATE member_id as member_id);
-- get all records from table_2 that only have table_1_ids
new_table_2 = JOIN table_1_ids BY member_id, table_2 BY member_id;
the error
Invalid field projection. Projected field [member_id] does not exist in schema: table_1_ids::member_id:bytearray,table_2::day:bytearray, table_2::member_id:bytearray, table_2::country:bytearray.
Thanks to @piyush and @Nazar for the comments. One way to get the desired result is to i) uniquely label the keys to join on and ii) separate the distinct statement:
table_1 = LOAD '$path_1' as (day, member_id_1, country);
table_2 = LOAD '$path_2' as (day, member_id_2, country);
all_table_1_ids = FOREACH table_1 GENERATE member_id_1 as member_id_1;
distinct_table_1_ids = DISTINCT all_table_1_ids;
new_table_2 = JOIN distinct_table_1_ids BY member_id_1, table_2 BY member_id_2;