
Crossfilter - Cannot get filtered records from other groups (NOT from associate groups)


I'm working with the "airplane" data set from this reference: http://square.github.io/crossfilter/

date,delay,distance,origin,destination
01010001,14,405,MCI,MDW
01010530,-11,370,LAX,PHX
...

  // Create the crossfilter for the relevant dimensions and groups.
  var flight = crossfilter(flights),
      all = flight.groupAll(),
      date = flight.dimension(function(d) { return d.date; }),
      dates = date.group(d3.time.day),
      hour = flight.dimension(function(d) { return d.date.getHours() + d.date.getMinutes() / 60; }),
      hours = hour.group(Math.floor),
      delay = flight.dimension(function(d) { return Math.max(-60, Math.min(149, d.delay)); }),
      delays = delay.group(function(d) { return Math.floor(d / 10) * 10; }),
      distance = flight.dimension(function(d) { return Math.min(1999, d.distance); }),
      distances = distance.group(function(d) { return Math.floor(d / 50) * 50; });

According to the Crossfilter documentation, "groups don't observe the filters on their own dimension". So we should be able to get the filtered records from groups whose dimensions are not currently filtered, shouldn't we?

I ran a test, but the result does not match that expectation:

  console.dir(date.group().all()); // 50895 records
  console.dir(distance.group().all()); // 297 records

  date.filter([new Date(2001, 1, 1), new Date(2001, 2, 1)]);

  console.dir(date.group().all()); // 50895 records => unchanged, because the filter is on this group's own dimension
  console.dir(distance.group().all()); // 297 records => also unchanged, and I don't know why

1. Could you please explain why the number of entries in "distance.group().all()" is the same as before the filter was applied? Am I missing something here?

2. If we really cannot get the "filtered records" from the "distance" dimension this way, how can I achieve this?

Thanks.


Solution

  • Yes, this is the expected behavior.

    Crossfilter creates a "bin" in the group for every key it finds by applying the dimension key and group key functions. When a filter is applied, it calls the reduce-remove function for each row that is filtered out, which by default decrements the count of the bin that row belongs to.

    The result is that empty bins still exist, but they have a value of 0, so the array returned by .all() keeps the same length.
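
To make the mechanism above concrete, here is a minimal sketch in plain JavaScript. It is a simplified model, not Crossfilter's actual implementation: makeCountingGroup, the month field and the three sample rows are all hypothetical and exist only for illustration. It shows that removing rows only lowers counts and never deletes a bin.

```javascript
// Simplified model of a counting group; NOT Crossfilter's real implementation.
// makeCountingGroup and the sample rows below are hypothetical.
function makeCountingGroup(records, keyOf) {
  var bins = new Map();
  // Bins are created once, for every key present in the full data set.
  records.forEach(function (r) {
    var k = keyOf(r);
    bins.set(k, (bins.get(k) || 0) + 1);
  });
  return {
    // Reduce-remove: decrement the row's bin; the bin itself is never deleted.
    remove: function (r) {
      var k = keyOf(r);
      bins.set(k, bins.get(k) - 1);
    },
    // One { key, value } entry per bin, including bins whose count is 0.
    all: function () {
      return Array.from(bins, function (e) {
        return { key: e[0], value: e[1] };
      });
    }
  };
}

var rows = [
  { month: 1, distance: 405 },
  { month: 1, distance: 370 },
  { month: 2, distance: 1250 }
];

// Same 50-unit bucketing as the "distances" group in the question.
var byDistance = makeCountingGroup(rows, function (r) {
  return Math.floor(r.distance / 50) * 50;
});

var binsBefore = byDistance.all().length;

// Simulate a filter on another dimension: keep month 1, remove every other row.
rows
  .filter(function (r) { return r.month !== 1; })
  .forEach(function (r) { byDistance.remove(r); });

var binsAfter = byDistance.all();
console.log(binsBefore, binsAfter.length); // same number of bins before and after
console.log(binsAfter); // the bin for key 1250 is still present, with value 0
```

In this model the length of all() counts bins, not records, which is why it cannot change when a filter is applied.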

    EDIT: here is the Crossfilter Gotchas entry with further explanation.

    If you want to remove the zeros, you can use a "fake group" to do that.

    function remove_empty_bins(source_group) {
        return {
            all:function () {
                return source_group.all().filter(function(d) {
                    //return Math.abs(d.value) > 0.00001; // if using floating-point numbers
                    return d.value !== 0; // if integers only
                });
            }
        };
    }
    

    https://github.com/dc-js/dc.js/wiki/FAQ#remove-empty-bins

    This function wraps the group in an object that implements .all() by calling source_group.all() and then filtering the result. So if you're using dc.js, you can supply this fake group to your chart like so:

    chart.group(remove_empty_bins(yourGroup));
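
The wrapper can also be tried without dc.js or Crossfilter. The following sketch repeats remove_empty_bins from above and feeds it a hand-written stand-in for a group; stubGroup and its three bins are hypothetical, chosen to mimic what a group might return after a filter on another dimension.

```javascript
// Same wrapper as in the answer above (integer counts assumed).
function remove_empty_bins(source_group) {
  return {
    all: function () {
      return source_group.all().filter(function (d) {
        return d.value !== 0;
      });
    }
  };
}

// Hypothetical stand-in for a Crossfilter group: any object with an
// all() method returning { key, value } entries works with the wrapper.
var stubGroup = {
  all: function () {
    return [
      { key: 350, value: 1 },
      { key: 400, value: 1 },
      { key: 1250, value: 0 } // bin emptied by a filter on another dimension
    ];
  }
};

var nonEmpty = remove_empty_bins(stubGroup).all();

// Summing the bin values gives the number of records left after filtering.
var filteredRecordCount = nonEmpty.reduce(function (sum, d) {
  return sum + d.value;
}, 0);

console.log(nonEmpty.length);      // 2 non-empty bins
console.log(filteredRecordCount);  // 2 records
```

Because the wrapper calls source_group.all() every time its own all() is invoked, it reflects whatever filters are active at that moment rather than a snapshot taken when it was created.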