Search code examples
c++algorithmboostintervalsboost-icl

Combinations of N Boost interval_set


I have a service which has outages in 4 different locations. I am modeling each location outages into a Boost ICL interval_set. I want to know when at least N locations have an active outage.

Therefore, following this answer, I have implemented a combination algorithm, so I can create combinations between elemenets via interval_set intersections.

Whehn this process is over, I should have a certain number of interval_set, each one of them defining the outages for N locations simultaneusly, and the final step will be joining them to get the desired full picture.

The problem is that I'm currently debugging the code, and when the time of printing each intersection arrives, the output text gets crazy (even when I'm using gdb to debug step by step), and I can't see them, resulting in a lot of CPU usage.

I guess that somehow I'm sending to output a larger portion of memory than I should, but I can't see where the problem is.

This is a SSCCE:

#include <boost/icl/interval_set.hpp>
#include <algorithm>
#include <iostream>
#include <vector>


int main() {
    // Initializing data for test
    std::vector<boost::icl::interval_set<unsigned int> > outagesPerLocation;
    for(unsigned int j=0; j<4; j++){
        boost::icl::interval_set<unsigned int> outages;
        for(unsigned int i=0; i<5; i++){
            outages += boost::icl::discrete_interval<unsigned int>::closed(
                (i*10), ((i*10) + 5 - j));
        }
        std::cout << "[Location " << (j+1) << "] " << outages << std::endl;
        outagesPerLocation.push_back(outages);
    }

    // So now we have a vector of interval_sets, one per location. We will combine
    // them so we get an interval_set defined for those periods where at least
    // 2 locations have an outage (N)
    unsigned int simultaneusOutagesRequired = 2;  // (N)

    // Create a bool vector in order to filter permutations, and only get
    // the sorted permutations (which equals the combinations)
    std::vector<bool> auxVector(outagesPerLocation.size());
    std::fill(auxVector.begin() + simultaneusOutagesRequired, auxVector.end(), true);

    // Create a vector where combinations will be stored
    std::vector<boost::icl::interval_set<unsigned int> > combinations;

    // Get all the combinations of N elements
    unsigned int numCombinations = 0;
    do{
        bool firstElementSet = false;
        for(unsigned int i=0; i<auxVector.size(); i++){
            if(!auxVector[i]){
                if(!firstElementSet){
                    // First location, insert to combinations vector
                    combinations.push_back(outagesPerLocation[i]);
                    firstElementSet = true;
                }
                else{
                    // Intersect with the other locations
                    combinations[numCombinations] -= outagesPerLocation[i];
                }
            }
        }
        numCombinations++;
        std::cout << "[-INTERSEC-] " << combinations[numCombinations] << std::endl;  // The problem appears here
    }
    while(std::next_permutation(auxVector.begin(), auxVector.end()));

    // Get the union of the intersections and see the results
    boost::icl::interval_set<unsigned int> finalOutages;
    for(std::vector<boost::icl::interval_set<unsigned int> >::iterator
        it = combinations.begin(); it != combinations.end(); it++){
        finalOutages += *it;
    }

    std::cout << finalOutages << std::endl;
    return 0;
}

Any help?


Solution

  • As I surmised, there's a "highlevel" approach here.

    Boost ICL containers are more than just containers of "glorified pairs of interval starting/end points". They are designed to implement just that business of combining, searching, in a generically optimized fashion.

    So you don't have to.

    If you let the library do what it's supposed to do:

    using TimePoint = unsigned;
    using DownTimes = boost::icl::interval_set<TimePoint>;
    using Interval  = DownTimes::interval_type;
    using Records   = std::vector<DownTimes>;
    

    Using functional domain typedefs invites a higher level approach. Now, let's ask the hypothetical "business question":

    What do we actually want to do with our records of per-location downtimes?

    Well, we essentially want to

    1. tally them for all discernable time slots and
    2. filter those where tallies are at least 2
    3. finally, we'd like to show the "merged" time slots that remain.

    Ok, engineer: implement it!


    1. Hmm. Tallying. How hard could it be?

      ❕ The key to elegant solutions is the choice of the right datastructure

      using Tally     = unsigned; // or: bit mask representing affected locations?
      using DownMap   = boost::icl::interval_map<TimePoint, Tally>;
      

      Now it's just bulk insertion:

      // We will do a tally of affected locations per time slot
      DownMap tallied;
      for (auto& location : records)
          for (auto& incident : location)
              tallied.add({incident, 1u});
      
    2. Ok, let's filter. We just need the predicate that works on our DownMap, right

      // define threshold where at least 2 locations have an outage
      auto exceeds_threshold = [](DownMap::value_type const& slot) {
          return slot.second >= 2;
      };
      
    3. Merge the time slots!

      Actually. We just create another DownTimes set, right. Just, not per location this time.

      The choice of data structure wins the day again:

      // just printing the union of any criticals:
      DownTimes merged;
      for (auto&& slot : tallied | filtered(exceeds_threshold) | map_keys)
          merged.insert(slot);
      

    Report!

    std::cout << "Criticals: " << merged << "\n";
    

    Note that nowhere did we come close to manipulating array indices, overlapping or non-overlapping intervals, closed or open boundaries. Or, [eeeeek!] brute force permutations of collection elements.

    We just stated our goals, and let the library do the work.

    Full Demo

    Live On Coliru

    #include <boost/icl/interval_set.hpp>
    #include <boost/icl/interval_map.hpp>
    #include <boost/range.hpp>
    #include <boost/range/algorithm.hpp>
    #include <boost/range/adaptors.hpp>
    #include <boost/range/numeric.hpp>
    #include <boost/range/irange.hpp>
    #include <algorithm>
    #include <iostream>
    #include <vector>
    
    using TimePoint = unsigned;
    using DownTimes = boost::icl::interval_set<TimePoint>;
    using Interval  = DownTimes::interval_type;
    using Records   = std::vector<DownTimes>;
    
    using Tally     = unsigned; // or: bit mask representing affected locations?
    using DownMap   = boost::icl::interval_map<TimePoint, Tally>;
    
    // Just for fun, removed the explicit loops from the generation too. Obviously,
    // this is bit gratuitous :)
    static DownTimes generate_downtime(int j) {
        return boost::accumulate(
                boost::irange(0, 5),
                DownTimes{},
                [j](DownTimes accum, int i) { return accum + Interval::closed((i*10), ((i*10) + 5 - j)); }
            );
    }
    
    int main() {
        // Initializing data for test
        using namespace boost::adaptors;
        auto const records = boost::copy_range<Records>(boost::irange(0,4) | transformed(generate_downtime));
    
        for (auto location : records | indexed()) {
            std::cout << "Location " << (location.index()+1) << " " << location.value() << std::endl;
        }
    
        // We will do a tally of affected locations per time slot
        DownMap tallied;
        for (auto& location : records)
            for (auto& incident : location)
                tallied.add({incident, 1u});
    
        // We will combine them so we get an interval_set defined for those periods
        // where at least 2 locations have an outage
        auto exceeds_threshold = [](DownMap::value_type const& slot) {
            return slot.second >= 2;
        };
    
        // just printing the union of any criticals:
        DownTimes merged;
        for (auto&& slot : tallied | filtered(exceeds_threshold) | map_keys)
            merged.insert(slot);
    
        std::cout << "Criticals: " << merged << "\n";
    }
    

    Which prints

    Location 1 {[0,5][10,15][20,25][30,35][40,45]}
    Location 2 {[0,4][10,14][20,24][30,34][40,44]}
    Location 3 {[0,3][10,13][20,23][30,33][40,43]}
    Location 4 {[0,2][10,12][20,22][30,32][40,42]}
    Criticals: {[0,4][10,14][20,24][30,34][40,44]}