Tags: d3.js, charts, quantile, quantization

Quantize Integers into discrete buckets


I have a list of ~7500 items which all have a similar signature:

{
    revenue: integer,
    title: string,
    sector: string
}

The revenue ranges from 0 to roughly 1 billion. I'd like to construct a scale that, given a particular company's revenue, returns its position relative to the following 'buckets':

$0-5 Million
$5-10 Million
$10-25 Million
$25-50 Million
$50-100 Million
$100-250 Million
> $250 Million

I believe I should be able to accomplish this with either a quantize or quantile scale in d3, but have had difficulties getting the expected results. So far, I have something like:

var max_rev = 1000000000; // 1 Billion
var scale = d3.scale.quantize().domain(_.range(max_rev)).range([5000000, 10000000, 25000000, 50000000, 100000000, 250000000]);

One obvious issue is that calling _.range(max_rev) creates an array 1 billion items long, so I'm wondering how I can do that more efficiently (something like .domain([0, 1000000000])?).

What would be the best way to define this scale so that scale(75000000) returns 50000000? Once I have that, I could look the result up in a hash and return the correct label:

{
    ...
    ...
    50000000: "$50-100 Million",
    100000000: "$100-250 Million",
    ...

}

Thanks so much! Please let me know if there is any other information I can provide.


Solution

  • A quantize scale won't work in this case: it divides its domain into equal-width segments, and your buckets are not uniform in width. Instead, you can use a threshold scale.
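    To see why the equal-width split is a problem, here is a rough sketch in plain JavaScript of what a quantize scale over .domain([0, 1000000000]) would do. The helper makeQuantize is a hypothetical stand-in written for illustration, not d3's own code; it assumes the scale simply cuts the domain into range.length equal segments.

```javascript
// Hypothetical stand-in for a quantize scale (not d3 itself): it cuts
// [x0, x1] into range.length equal-width segments and returns the range
// value for the segment that x falls into.
function makeQuantize(domain, range) {
  var x0 = domain[0], x1 = domain[1], n = range.length;
  return function (x) {
    var i = Math.floor(n * (x - x0) / (x1 - x0));
    // Clamp so that values outside the domain map to the first or last segment.
    return range[Math.max(0, Math.min(n - 1, i))];
  };
}

var buckets = [0, 5000000, 10000000, 25000000, 50000000, 100000000, 250000000];
var uniform = makeQuantize([0, 1000000000], buckets);

// Each of the 7 segments is about 142.9 million wide, so 75 million lands
// in the first segment instead of the "$50-100 Million" bucket.
console.log(uniform(75000000));  // 0
console.log(uniform(150000000)); // 5000000
```

    With seven outputs over a 1 billion domain, every revenue below roughly 142.9 million ends up in the lowest bucket, which is not the bucketing asked for.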

    Here's an example using a threshold scale:

    var dollars = d3.format("$,d"),
      // 100 random sample companies with revenue in [0, 1 billion)
      data = d3.range(100).map(function(d, i) {
          return {
              revenue: Math.floor(Math.random() * 1000000000),
              title: "Company " + i,
              sector: "Sector " + Math.floor(Math.random() * 10)
          };
      }),
      // 6 thresholds in the domain map to 7 output values in the range
      quantize = d3.scale.threshold()
                         .domain([5000000, 10000000, 25000000, 50000000, 100000000, 250000000])
                         .range([0, 5000000, 10000000, 25000000, 50000000, 100000000, 250000000]);
    
    var table = d3.select("#info").append("table");
    
    table.append("thead").append("tr").selectAll("th")
        .data(['company', 'sector', 'revenue', 'quantized_revenue'])
      .enter()
        .append("th") // match the "th" selector above
        .text(function(d) {
          return d;
        });
    
    var rows = table.append("tbody").selectAll("tr")
        .data(data)
      .enter()
        .append("tr")
        .attr("class", "company");
    
    rows.append("td").text(function(d) {
        return d.title;
    });
    rows.append("td").text(function(d) {
        return d.sector;
    });
    rows.append("td").text(function(d) {
        return dollars(d.revenue);
    });
    rows.append("td").text(function(d) {
        return dollars(quantize(d.revenue));
    });

    table {
        width: 100%;
    }
    thead {
        background-color: #ccc;
    }

    <script src="https://cdnjs.cloudflare.com/ajax/libs/d3/3.4.11/d3.min.js"></script>
    <div id="info"></div>

    The interesting bits are:

      quantize = d3.scale.threshold()
                         .domain([5000000, 10000000, 25000000, 50000000, 100000000, 250000000])
                         .range([0, 5000000, 10000000, 25000000, 50000000, 100000000, 250000000]);
    

    The domain sets the thresholds that input values are compared against, and the range defines the outputs. Because n thresholds split the input into n + 1 buckets, the range needs one more value than the domain. In this case the range happens to reuse the domain values (plus 0 for the lowest bucket), but it doesn't have to. The range could be a list of color values, pixel heights for bars, etc.
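    The lookup described above can be sketched in plain JavaScript as a binary search over the thresholds. The names bisectRight and makeThreshold are hypothetical helpers written for this illustration; the sketch assumes that a value exactly equal to a threshold belongs to the bucket above it.

```javascript
// Returns the number of thresholds that are <= x (array must be sorted ascending).
function bisectRight(sorted, x) {
  var lo = 0, hi = sorted.length;
  while (lo < hi) {
    var mid = (lo + hi) >>> 1;
    if (x < sorted[mid]) { hi = mid; } else { lo = mid + 1; }
  }
  return lo;
}

// Hypothetical stand-in for a threshold scale: n thresholds, n + 1 outputs.
function makeThreshold(domain, range) {
  return function (x) {
    return range[bisectRight(domain, x)];
  };
}

var thresholds = [5000000, 10000000, 25000000, 50000000, 100000000, 250000000];
var bucketFloor = makeThreshold(
  thresholds,
  [0, 5000000, 10000000, 25000000, 50000000, 100000000, 250000000]
);

console.log(bucketFloor(75000000));   // 50000000
console.log(bucketFloor(4999999));    // 0
console.log(bucketFloor(5000000));    // 5000000 (a boundary value goes to the upper bucket)
console.log(bucketFloor(1000000000)); // 250000000
```

    Under that assumption, a revenue of exactly 5 million is reported in the "$5-10 Million" bucket rather than "$0-5 Million".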

    You could even map straight to the labels, and avoid looking up the result in a hash table:

    var dollars = d3.format("$,d"),
      // 100 random sample companies with revenue in [0, 1 billion)
      data = d3.range(100).map(function(d, i) {
          return {
              revenue: Math.floor(Math.random() * 1000000000),
              title: "Company " + i,
              sector: "Sector " + Math.floor(Math.random() * 10)
          };
      }),
      // 6 thresholds in the domain map to 7 bucket labels in the range
      quantize = d3.scale.threshold()
                         .domain([5000000, 10000000, 25000000, 50000000, 100000000, 250000000])
                         .range(["$0-5", "$5-10", "$10-25", "$25-50", "$50-100", "$100-250",  "> $250"].map(function(d) { return d + " Million"; }));
    
    var table = d3.select("#info").append("table");
    
    table.append("thead").append("tr").selectAll("th")
        .data(['company', 'sector', 'revenue', 'quantized_revenue'])
      .enter()
        .append("th") // match the "th" selector above
        .text(function(d) {
          return d;
        });
    
    var rows = table.append("tbody").selectAll("tr")
        .data(data)
      .enter()
        .append("tr")
        .attr("class", "company");
    
    rows.append("td").text(function(d) {
        return d.title;
    });
    rows.append("td").text(function(d) {
        return d.sector;
    });
    rows.append("td").text(function(d) {
        return dollars(d.revenue);
    });
    rows.append("td").text(function(d) {
        return quantize(d.revenue);
    });

    table {
        width: 100%;
    }
    thead {
        background-color: #ccc;
    }

    <script src="https://cdnjs.cloudflare.com/ajax/libs/d3/3.4.11/d3.min.js"></script>
    <div id="info"></div>