hadoop apache-pig bigdata

Filtering in Pig by concatenating three columns


I have two tables in the following format:

Table 1: com_Data

    cc  bb  mm
    41  22  2563
    42  24  3562

Table 2:

    name   cid
    sasi   41-22-2563
    soman  42-47-2562

I want to combine the three columns cc, bb and mm from Table 1 into one value (for example 41-22-2563) and select all the rows from Table 2 whose cid matches one of the combined values.

How can I filter it in Pig?

When I try to concatenate the three columns separated by '-' in Pig, I get an error. This is the code I used:

a = LOAD 'default.com_data' USING org.apache.hcatalog.pig.HCatLoader();
b = foreach a generate concat(cc,'-',bb,'-',mm); 

How can I filter the table?


Solution

  • It looks like the data types of cc, bb and mm are numeric. CONCAT works on chararray (or bytearray) values, so change the data types to chararray and CONCAT will work. See the example below:

    input.txt
    41 22 2563
    42 24 3562
    43 46 1234
    
    input1.txt
    sasi 41-22-2563
    soman 42-47-2562
    test 43-46-1234
    
    PigScript:
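    -- Declare cc, bb and mm as chararray so that CONCAT accepts them
    A = LOAD 'input.txt' USING PigStorage(' ') AS (cc:chararray,bb:chararray,mm:chararray);
    AA = LOAD 'input1.txt' USING PigStorage(' ') AS (name:chararray,cid:chararray);
    -- Build the combined key, e.g. 41-22-2563
    B = FOREACH A GENERATE CONCAT(cc,'-',bb,'-',mm) as newCid;
    -- Inner join keeps only the rows of AA whose cid matches a combined key
    C = JOIN AA BY cid,B BY newCid;
    -- Keep only the name and cid fields coming from AA
    D = FOREACH C GENERATE $0,$1;
    DUMP D;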
    
    Output:
    (sasi,41-22-2563)
    (test,43-46-1234)
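The example above reads text files, so the types can be declared in the AS clause. In the question the data comes through HCatLoader, where the column types are taken from the Hive table. A minimal sketch for that case, assuming cc, bb and mm are numeric Hive columns, casts each one to chararray inside the FOREACH instead. Pig function names are case-sensitive, so the lowercase concat in the question would also fail to resolve to the built-in CONCAT. The name default.name_data for Table 2 is hypothetical (the question does not name that table), and depending on your Hive/HCatalog version the loader class may be org.apache.hive.hcatalog.pig.HCatLoader rather than the older org.apache.hcatalog.pig.HCatLoader used in the question.

```pig
-- Sketch only. Assumptions: default.com_data has numeric columns cc, bb, mm;
-- default.name_data is a HYPOTHETICAL name for Table 2 (name, cid as strings).
a  = LOAD 'default.com_data'  USING org.apache.hcatalog.pig.HCatLoader();
t2 = LOAD 'default.name_data' USING org.apache.hcatalog.pig.HCatLoader();

-- Cast each numeric column to chararray so CONCAT accepts it.
-- CONCAT must be upper case: Pig function names are case-sensitive.
b = FOREACH a GENERATE CONCAT((chararray)cc, '-', (chararray)bb, '-', (chararray)mm) AS newCid;

-- Remove duplicate keys so repeated rows in Table 1 do not duplicate the join output.
b_keys = DISTINCT b;

-- Inner join keeps only the Table 2 rows whose cid matches a combined key.
c = JOIN t2 BY cid, b_keys BY newCid;
d = FOREACH c GENERATE t2::name AS name, t2::cid AS cid;
DUMP d;
```

On older Pig releases where CONCAT accepts only two arguments, nest the calls instead, e.g. CONCAT(CONCAT((chararray)cc, '-'), (chararray)bb) and so on for the remaining parts.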