Search code examples
etliccube

icCube ETL - Java View - group by on more than 1 column + retrieve max and min value


In the icCube Builder ETL, I want to group the data on more than one field. Also, as aggregation function, I would like to make use of MAX and MIN.

Example data:

sample data to group by on groupId and phase

(same data in text)

groupId phase startDate endDate 100 start 1-May-2018 5-May-2018 100 start 4-May-2018 7-May-2018 100 start 28-Apr-2018 1-May-2018 100 middle 4-May-2018 11-May-2018 100 middle 1-May-2018 10-May-2018 100 end 12-May-2018 15-May-2018 100 end 11-May-2018 13-May-2018 100 end 13-May-2018 14-May-2018 100 end 9-May-2018 12-May-2018 200 start 4-Apr-2018 2-May-2018 200 middle 18-Apr-2018 3-May-2018 200 middle 1-May-2018 1-May-2018 300 end 21-Apr-2018 24-Apr-2018

I would like to group this data on groupId and phase and get the minimum startDate and the maximum endDate:

enter image description here

How to best do that in the icCube ETL?


Solution

  • We're adding a new version of groupBy View in the ETL layer to support this. However you can create a Java view to perform the groupBy.

    Something like :

    package iccube.pub;
    import java.util.*;
    import java.lang.*;
    import org.joda.time.*;
    import crazydev.iccube.pub.view.*;
    
    public class CustomJavaView implements IOlapBuilderViewLogic
    {
    private Map<List<Comparable>,List<Agg>> cached;
    
    public CustomJavaView()
    {
    }
    
    public void onInitMainTable(Map<String, IOlapCachedTable> cachedTables, IOlapDataTableDef mainTable)
    {
        cached = new HashMap();
    }
    
    public boolean onNewRow(IOlapViewContext context, Map<String, IOlapCachedTable> cachedTables, IOlapDataTableDef mainTable, IOlapReadOnlyDataRow mainTableRow)
    {
        // create the groupby key (list of values) 
        final List<Comparable> groupBy = Arrays.asList(mainTableRow.get("phase"), mainTableRow.get("groupId"));
    
        // get the aggregators for values for the keys, build them if not already there
        final List<Agg> aggs = cached.computeIfAbsent(groupBy, key -> Arrays.asList(new Agg(true), new Agg(false)));
        // add values
        aggs.get(0).add(mainTableRow.getAsDateTime("startDate"));
        aggs.get(1).add(mainTableRow.getAsDateTime("endDate"));
    
        return true; // false to stop
    }
    
    public void onProcessingCompleted(IOlapViewContext context, Map<String, IOlapCachedTable> cachedTables)
    {
        // now we can fire rows
        for (Map.Entry<List<Comparable>, List<Agg>> entry : cached.entrySet())
        {
            final List<Comparable> groupByKey = entry.getKey();
            final List<Agg> aggs = entry.getValue();
    
            // create empty row
            final IOlapDataTableRow row = context.newRow();
            row.set("phase",groupByKey.get(0));
            row.set("groupId",groupByKey.get(1));
            row.set("startDate",aggs.get(0).date);
            row.set("endDate",aggs.get(1).date);
    
            context.fireRow(row);
        }
    }
    
    // this is the Aggregator, you could implement something more complicated
    static class Agg
    {
        final int isMin;
    
        LocalDateTime date;
    
        Agg(boolean isMin)
        {
            this.isMin = isMin ? -1 : 1;
        }
    
        void add(LocalDateTime ndate)
        {
            if (ndate != null)
            {
                date = ( date!= null && ((date.compareTo(ndate) * isMin) > 0)) ? date : ndate;
            }
        }
    
    }
    }