hadoop apache-pig

How to compare two columns in Pig and remove rows where the values are the same regardless of upper/lower case


I have 3 columns: an id column and 2 name columns. Sometimes the 2 name columns hold the same value, but it is upper case in one column and lower case in the other. How do I remove the rows where the values are the same (or have similar characters) but the casing is different?

Ex:

a = load txt file
a = foreach a generate id, name1, name2

current output:

id1, james, JAMES
id2, tom, Tom
id3, Jim, Bob
id4, Bill, billy

expected output: only this 1 result below

a = compare name1 and name2, and if any characters in name1 are similar to those in name2, filter these rows out

id3, Jim, Bob

Thanks for any help!


Solution

  • Assuming you have loaded the data in Relation A and names are of type chararray.

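    -- $1 and $2 are name1 and name2; compare them with case folded away.
    -- Fields are referenced as $1/$2 here, not A.$1/A.$2: inside FILTER the
    -- A.$1 form is read as a scalar projection of the whole relation.
    A = FILTER A BY (LOWER($1) != LOWER($2));
    DUMP A;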
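
    Note that a case-insensitive equality check alone still lets a row such as id4 (Bill, billy) through, because the two names are not equal after lower-casing. The question does not define "similar characters" precisely, so the following is only a sketch under one possible reading: drop a row when either name contains the other, ignoring case. The file name names.txt, the comma delimiter, and the aliases n1 and n2 are assumptions made for illustration and do not come from the original post.

    ```pig
    -- Hypothetical input: names.txt, comma-separated, fields (id, name1, name2).
    A = LOAD 'names.txt' USING PigStorage(',')
            AS (id:chararray, name1:chararray, name2:chararray);

    -- Normalize once: strip surrounding spaces and lower-case both names.
    B = FOREACH A GENERATE id, name1, name2,
            LOWER(TRIM(name1)) AS n1, LOWER(TRIM(name2)) AS n2;

    -- Keep only rows where neither normalized name contains the other.
    -- INDEXOF returns -1 when the search string is not found; identical
    -- names match at index 0, so they are filtered out as well.
    C = FILTER B BY INDEXOF(n1, n2, 0) == -1
                AND INDEXOF(n2, n1, 0) == -1;

    -- Project back to the original columns.
    D = FOREACH C GENERATE id, name1, name2;
    DUMP D;
    ```

    On the four sample rows from the question, this should leave only the id3 row (Jim, Bob). Substring containment is a fairly blunt notion of similarity, so a stricter or looser rule may be needed depending on the real data.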