How can I change duplicate elements in a vector of ints so no values are repeated while also maintaining the number of elements and monotonicity?


I have code that generates a distribution of N floating-point values from 0 to 1 based on a parameterized equation. I need them as 8-bit integer values, so I scale them up to 255 and round them to the nearest int. They must also be unique, with no repeated values. It's fairly trivial to test for duplicates and remove them; however, I need to retain the original number of distribution points, N. In some cases I may already have a unique set, in which case no action is needed:

0 3 15 40 78 128 177 215 240 252 255 -> No Op

But sometimes I may end up with something like:

0 0 0 2 21 128 234 253 255 255 255

In that case, what I would like to end up with is a set that looks like this:

0 1 2 3 21 128 234 252 253 254 255

I'm adjusting each repeated value by the minimum needed to make it unique while also maintaining monotonic order and the original number of points.

So, from left to right, what I need to do is increment the first repeat value by 1 and so on. But note that the 4th element is 2 so I also need to account for the possibility of creating a duplicate while incrementing other values.

But then on the right hand side, 255 is my max possible value so I need those to step down by 1 going left.

I'm currently using Eigen as the Vector container but I can use anything in STL.

Another complication is that I can't know the number of original points, N, ahead of time; it can be any positive integer from 2 to 255.

Another possibly relevant detail is that my original distribution set of doubles from 0 to 1 is guaranteed to be unique and monotonically increasing. I don't know how that can be leveraged, but it's perfectly acceptable to account for repeats before scaling to 255 if that gives a better solution.

Here is the code that currently generates the distribution set of doubles and then scales it to ints:

Eigen::VectorXi v_i(NUMBER_OF_POINTS);  // NUMBER_OF_POINTS: int from 2 to 255
Eigen::VectorXd v_d(NUMBER_OF_POINTS);
double d;

// Interior points only; the endpoints are set manually below.
for ( int i = 1; i < v_d.size() - 1; ++i )
{
    d = i / ( v_d.size() - 1.0 );
    v_d( i ) = 1.0 / ( 1.0 + pow( d / ( 1.0 - d ), -SLOPE ) );  // SLOPE: double > 0
}

v_d( 0 ) = 0;  // Manually setting the endpoints to 0 and 1 to avoid divide by zero error 

v_d( v_d.size() - 1 ) = 1.0;

for ( int i = 0; i < v_i.size(); ++i )
{
    v_i(i) = round( v_d( i ) * 255 );
}

std::cout << v_i << std::endl;

Thanks in advance for the help.


Solution

  • The simplest way to approach this is to do two passes over the array, assuming it is sorted in non-decreasing order to begin with:

    • forward pass: when A[n] <= A[n-1], set A[n] = A[n-1] + 1, clamped to 255
    • reverse pass: when A[n] >= A[n+1], set A[n] = A[n+1] - 1, (optionally) clamped to 0

    Provided your array has at most 256 elements (there are only 256 distinct values in 0..255), this is guaranteed to make all elements unique.

    It is not necessarily optimal, nor will it guarantee that adjusted values are as close to their original value as possible, but that doesn't appear to be one of your requirements.

    Anything more clever than this is likely to involve a significant amount of effort.
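
The two passes above could be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the original answer: the function name make_strictly_increasing and its lo/hi parameters are made up here, it works on a std::vector<int> rather than an Eigen vector, and it assumes the input is already sorted with every value inside [lo, hi].

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (name and signature are illustrative only).
// Assumes: `a` is sorted in non-decreasing order, every value lies in
// [lo, hi], and a.size() <= hi - lo + 1 (otherwise uniqueness is impossible).
void make_strictly_increasing(std::vector<int>& a, int lo = 0, int hi = 255)
{
    if (a.size() < 2)
        return;
    assert(static_cast<long long>(a.size()) <= static_cast<long long>(hi) - lo + 1);

    // Forward pass: lift each element that does not exceed its left
    // neighbour to neighbour + 1, clamped to hi.
    for (std::size_t n = 1; n < a.size(); ++n)
        if (a[n] <= a[n - 1])
            a[n] = std::min(a[n - 1] + 1, hi);

    // Reverse pass: values that piled up at hi are pulled down to one
    // below their right neighbour, clamped to lo.
    for (std::size_t n = a.size() - 1; n > 0; --n)
        if (a[n - 1] >= a[n])
            a[n - 1] = std::max(a[n] - 1, lo);
}
```

Calling this on the question's second example, 0 0 0 2 21 128 234 253 255 255 255, gives 0 1 2 3 21 128 234 252 253 254 255, and the first example is left unchanged. The same loops should carry over to an Eigen::VectorXi by indexing with v_i(n) instead of a[n].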