Tags: hadoop, hdfs, apache-pig, pig-udf

How to create a Pig UDF that categorizes one column based on another field


I want to derive a category column from another column (an age range from age) using a UDF in Pig.

Data I have:

Id,name,age
1,jhon,31
2,adi,15
3,sam,25
4,lina,28

Expected output (the last column is the age range):

1,jhon,31,30-35
2,adi,15,10-15
3,sam,25,20-25
4,lina,28,25-30
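The expected output uses 5-year ranges whose upper bound is inclusive (15 falls in 10-15, 25 in 20-25, 31 in 30-35). A minimal plain-Java sketch of that rule, with no Pig dependency, assuming ages are non-negative integers; the class name AgeBuckets, the method bucket and the constant WIDTH are hypothetical names chosen for illustration:

```java
public class AgeBuckets {

    // Width of each age range; 5 reproduces the expected output above (assumption).
    static final int WIDTH = 5;

    // Maps an age to a label "lower-upper". For ages >= 1, lower < age <= upper.
    // Java integer division truncates toward zero, so (0 - 1) / 5 == 0 and age 0 maps to "0-5".
    public static String bucket(int age) {
        if (age < 0) {
            return null; // assumption: negative ages are treated as invalid
        }
        int lower = ((age - 1) / WIDTH) * WIDTH;
        return lower + "-" + (lower + WIDTH);
    }

    public static void main(String[] args) {
        // Ages from the sample data in the question.
        int[] ages = {31, 15, 25, 28};
        for (int age : ages) {
            System.out.println(age + " -> " + bucket(age));
        }
    }
}
```

Running it prints one line per sample age, such as `31 -> 30-35`, matching the fourth column of the expected output.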

Please suggest how to do this.


Solution

  • You can create Pig UDFs in Eclipse.

    Create a Java project in Eclipse with the Pig JARs on the build path and try the code below:

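    package com;
    
    import java.io.IOException;
    
    import org.apache.pig.EvalFunc;
    import org.apache.pig.backend.executionengine.ExecException;
    import org.apache.pig.data.Tuple;
    
    // Maps an age to a 5-year range with an inclusive upper bound, matching the
    // expected output: 15 -> "10-15", 25 -> "20-25", 28 -> "25-30", 31 -> "30-35".
    public class Age extends EvalFunc<String> {
    
        @Override
        public String exec(Tuple input) throws IOException {
            // Return null for an empty tuple or a null age instead of failing the job.
            if (input == null || input.size() == 0) {
                return null;
            }
            try {
                Object object = input.get(0);
                if (object == null) {
                    return null;
                }
                // The LOAD schema declares age:int, so Pig passes an Integer here.
                int age = (Integer) object;
                if (age < 0) {
                    return null;
                }
                // Integer division truncates toward zero: 0..5 -> "0-5", 6..10 -> "5-10", ...
                int lower = ((age - 1) / 5) * 5;
                return lower + "-" + (lower + 5);
            } catch (ExecException e) {
                throw new IOException(e);
            }
        }
    
    }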
    

    Now export the project as a JAR and register it in the Pig shell (Grunt):

    REGISTER <path of your .jar file>;
    

    Then define an alias for the UDF using its fully qualified name (package plus class):

    DEFINE U com.Age();
    
    a = LOAD '<input path>' USING PigStorage(',') AS (id:int, name:chararray, age:int);
    
    b = FOREACH a GENERATE id, name, age, U(age) AS age_range;
    
    DUMP b;
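To sanity-check what the script should produce before running it on a cluster, here is a plain-Java sketch that mimics it locally: each line is split on ',' as PigStorage(',') does, a field that is not a valid integer becomes null (Pig typically loads a value that cannot be cast to the declared int type as null, with a warning, rather than failing), and the age range is appended. The class name LocalPipelineCheck, its methods, and the in-memory input are assumptions for illustration; the header row from the question is included to show that it yields null id, age and range rather than an error.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LocalPipelineCheck {

    // Same rule as the expected output: 5-year ranges with an inclusive upper bound.
    public static String bucket(Integer age) {
        if (age == null || age < 0) {
            return null; // mirrors a UDF returning null for missing or invalid input
        }
        int lower = ((age - 1) / 5) * 5;
        return lower + "-" + (lower + 5);
    }

    // Parses an int field, returning null instead of throwing on bad input.
    static Integer toIntOrNull(String s) {
        try {
            return Integer.valueOf(s.trim());
        } catch (NumberFormatException ex) {
            return null;
        }
    }

    // Renders null as an empty field.
    private static String str(Object o) {
        return o == null ? "" : o.toString();
    }

    // Rough local equivalent of LOAD ... USING PigStorage(',') followed by
    // FOREACH ... GENERATE id, name, age, U(age).
    public static List<String> run(List<String> lines) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            String[] f = line.split(",", -1); // -1 keeps trailing empty fields
            Integer id = toIntOrNull(f[0]);
            String name = f.length > 1 ? f[1] : null;
            Integer age = f.length > 2 ? toIntOrNull(f[2]) : null;
            out.add(str(id) + "," + str(name) + "," + str(age) + "," + str(bucket(age)));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
                "Id,name,age", "1,jhon,31", "2,adi,15", "3,sam,25", "4,lina,28");
        run(input).forEach(System.out::println);
    }
}
```

The data rows come out as `1,jhon,31,30-35` and so on, and the header row as `,name,,`. In Grunt, DUMP prints the same values as tuples in parentheses, for example `(1,jhon,31,30-35)`; if the input file really contains a header line, filtering rows with a null id before the FOREACH keeps it out of the result.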
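If the ranges need uneven widths, one option is a lookup on a sorted map of inclusive upper bounds. This keeps the logic inside the UDF's exec to a single call instead of a growing if/else chain. A minimal plain-Java sketch using java.util.TreeMap.ceilingEntry; the class name RangeLookup, the boundaries 18/30/60 and the labels are illustrative assumptions, not values from the question:

```java
import java.util.Map;
import java.util.TreeMap;

public class RangeLookup {

    // Keys are inclusive upper bounds; values are the labels for each range.
    // Illustrative boundaries only: (.., 18], (18, 30], (30, 60].
    private static final TreeMap<Integer, String> UPPER_BOUNDS = new TreeMap<>();
    static {
        UPPER_BOUNDS.put(18, "0-18");
        UPPER_BOUNDS.put(30, "18-30");
        UPPER_BOUNDS.put(60, "30-60");
    }

    // Label used when the age exceeds the largest upper bound.
    private static final String OVERFLOW = "60+";

    // ceilingEntry returns the entry with the smallest key >= age, i.e. the first
    // range whose inclusive upper bound covers this age (no validation of negatives).
    public static String label(int age) {
        Map.Entry<Integer, String> e = UPPER_BOUNDS.ceilingEntry(age);
        return e == null ? OVERFLOW : e.getValue();
    }

    public static void main(String[] args) {
        for (int age : new int[] {15, 18, 19, 31, 75}) {
            System.out.println(age + " -> " + label(age));
        }
    }
}
```

Adding or moving a boundary then means changing one map entry rather than editing several comparisons.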