Search code examples
arrayssortingmergefuzzy-comparison

Sorting/combining related arrays


There must be some algorithm that will make this easier than what I'm doing...

What I have are two arrays, each with two columns. One column in both is a timestamp, and the other in both is a measurement.

What needs to happen is turn this into a single array of: timestamp, measurement1, measurement2

The problem is the timestamps often won't match up exactly. One array might be missing a value completely for a time period, or the timestamps might be off by an insignificant amount (insignificant enough that it would be OK to assign both measurements to the same timestamp).

Is there some well-known way of doing this fuzzy merge operation? A simple public domain function??


Solution

  • Start by asking yourself these questions: Do the arrays have the same number of elements? How do you want to combine two items with the same timestamp? How do you want to combine two items with different timestamp?

    You will probably have to write the algorithm yourself. Something like this would be easy to implement:

    1. Start by sorting each of the arrays individually by the timestamp order.
    2. Declare two iterators at the beginning of each input array respectively, and an empty output array.
    3. Then, you check which of the arrays has the earliest timestamp. Call it EARLY, and the other LATE.
      • If EARLY is close to LATE (by less than some constant), you apply a merge operation and insert the result at the end of the output array. Increment both iterators and go back to 3.
      • Otherwise, EARLY is far from LATE. You need to handle a missing value in the LATE array, perhaps by repeating the previous value or interpolating it using some function. Decide to insert or not a value in the output array. You only increment the EARLY array iterator in this case and go back to 3.
    4. If you have reached the end of either one of the arrays, the rest of the other array is LATE. You may want to interpret this as missing values and also repeat or interpolate the measurements.
    5. Return the output array.

    ~

    OutputArray merge(InputArray& a, InputArray& b) {
        InputArray::iterator a_it = a.begin();
        InputArray::iterator b_it = b.begin();
        while(a_it != a.end() && b_it != b.end()) {
            InputArray::iterator& early = *a_it.timestamp < *b_it.timestamp ? a_it : b_it;
            InputArray::iterator& late = *a_it.timestamp < *b_it.timestamp ? b_it : a_it;
            if(*late.timestamp - *early.timestamp < TIMESTAMP_CLOSE_ENOUGH) {
                output.timestamp = (*late.timestamp + *early.timestamp) / 2; // mean value
                output.measure1 = *a_it.measure;
                output.measure2 = *b_it.measure;
                outputArray.push_back(output);
                a_it++; b_it++;
            }
            else {
                output.timestamp = *early.timestamp;
                output.measure1 = *a_it.timestamp < *b_it.timestamp ? *a_it.measure : outputArray.back.measure1; // previous value if missing
                output.measure2 = *a_it.timestamp < *b_it.timestamp ? outputArray.back.measure2 : *b_it.measure;
                outputArray.push_back(output);
                early++;
            }
        }
    
        InputArray::iterator& late = a_it != a.end() ? a_it : b_it;
        InputArray::iterator late_end = a_it != a.end() ? a.end() : b.end();
        while(late != late_end) {
                output.timestamp = *late.timestamp;
                output.measure1 = a_it != a.end() ? *a_it.measure : outputArray.back.measure1; // previous value if missing
                output.measure2 = a_it != a.end() ? outputArray.back.measure2 : *b_it.measure;
                outputArray.push_back(output);
                late++;
        }
        return outputArray;
    }