
DC.js - blank datapoints and histograms


I have a dashboard containing DC row charts and histograms (DC bar charts). My dataset has 100 rows and 10 fields. Six fields have all 100 datapoints, while the other four fields have 95 datapoints and 5 blanks. I need to show all fields on one dashboard in a way that is transparent to the user.

For the row charts, I simply chart the blanks and use the .label() attribute to set the row name to 'Unallocated' where there is a blank.

For the histograms, I am stuck. I would like to show a column that is labelled 'Unallocated', plus the normal linear columns of a histogram. It would also be ok to hide the blank rows from the histogram.

I cannot filter the dimension behind the histogram, because that would remove the rows from the crossfiltered variable, facts, meaning I couldn't see those rows for the six fields that have them.

Standard fake groups won't work, because for a histogram, by the time we get to groups we already have numeric buckets, so there isn't a group that has a key of blank.

Left to itself, DC interprets the blanks as zeroes and buckets them accordingly. I tried limiting the dimension and this works in that the zero-bucket reduces by 5:

var dim = facts.dimension(function(d){ if (d[field]!='') { return Math.floor(+d[field]/binwidth)*binwidth; } });

But when I do this, the 5 blank rows seem to be distributed randomly among the other buckets. When I click 'Unallocated' on the row charts, the histograms show those 5 rows in random buckets, sometimes split across two buckets, whereas I think they should show nothing at all.

I can count the blank rows (there are not always 5) and wondered if I could subtract that count from the zero bin via a fake group. But I can't work out how to add a key-value pair or adjust the existing value for key=0:

    var grp = dim.group().reduceCount();
    var grp2 = {all: function() { return grp.top(Infinity); }};

I can't see enough of the structure via print_filter or console.log to work out how to change it. Sorry for length of question. Any help appreciated!

Emma


Solution

  • Putting them into a separate column is going to be tricky because the linear x scale doesn't accommodate extra/NA/non-numerical values very well.

    I think you'd have to switch to an ordinal scale in order to have numerical and non-numerical columns together. Kind of messy, but possible.

    Let's focus on the simpler idea, just skipping those values. I think you're on the right track here.

    The problem is that your dimension key function does not always return a value:

    var dim = facts.dimension(function(d){
        if (d[field]!='') {
            return Math.floor(+d[field]/binwidth)*binwidth;
        }
    });
    

    There is no else branch: when the field is blank, the if block is skipped and the function falls off the end, and the implicit return value of a JavaScript function is undefined. So you end up with some numerical keys and some rows with an undefined key. I'm not sure what crossfilter will do with those rows, but since the documentation specifies that the key function must return naturally ordered values, my guess is that it will get confused.
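
    To see the undefined key concretely, here is a minimal sketch that runs the same key function over a few rows outside crossfilter. The field name, bin width, and sample rows are made up for illustration; they are not from the original dashboard.

    ```javascript
    // Hypothetical sample rows and settings, for illustration only.
    var field = 'score';
    var binwidth = 10;
    var rows = [{score: '37'}, {score: ''}, {score: '4'}];

    // Same logic as the key function above: nothing is returned on the blank path.
    function key(d) {
        if (d[field] != '') {
            return Math.floor(+d[field] / binwidth) * binwidth;
        }
    }

    var keys = rows.map(key);
    console.log(keys); // [ 30, undefined, 0 ]
    ```

    The blank row gets no numeric key at all, which is the case the sentinel value below is meant to avoid.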

    Instead, maybe have this function return a value that's far outside the domain of the histogram?

    var dim = facts.dimension(function(d){
        if (d[field]!='') {
            return Math.floor(+d[field]/binwidth)*binwidth;
        }
        else return -1000;
    });
    

    Then it's easy to create a fake group that screens out a particular value:

    function remove_bins(source_group) { // (source_group, bins...)
        var bins = Array.prototype.slice.call(arguments, 1);
        return {
            all:function () {
                return source_group.all().filter(function(d) {
                    return bins.indexOf(d.key) === -1;
                });
            }
        };
    }
    var grp = remove_bins(dim.group(), -1000);
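
    To check what the fake group produces without a running dashboard, here is a minimal sketch that feeds remove_bins a hand-written stand-in for dim.group(). The stub_group object and its counts are hypothetical; only an all() method is assumed, since that is the only thing remove_bins calls.

    ```javascript
    // remove_bins as defined above, repeated so this sketch runs on its own.
    function remove_bins(source_group) {
        var bins = Array.prototype.slice.call(arguments, 1);
        return {
            all: function () {
                return source_group.all().filter(function (d) {
                    return bins.indexOf(d.key) === -1;
                });
            }
        };
    }

    // Hypothetical stand-in for dim.group(); the counts are invented.
    var stub_group = {
        all: function () {
            return [
                {key: -1000, value: 5},  // the blank rows, parked on the sentinel key
                {key: 0, value: 12},
                {key: 10, value: 40},
                {key: 20, value: 43}
            ];
        }
    };

    var filtered = remove_bins(stub_group, -1000).all();
    console.log(filtered.map(function (d) { return d.key; })); // [ 0, 10, 20 ]
    ```

    Because only the group is wrapped and the dimension is never filtered, the blank rows stay in the crossfilter for the other charts; the histogram just never sees the -1000 bin once grp is passed to its .group().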