Search code examples
javamatlabmatrixvectorejml

How can I find the stop and start index for a Java vector?


I have a vector that looks like this:

y =

 Columns 1 through 19:

   1   1   1   1   1   1   1   1   1   1   1   1   2   2   2   2   2   2   2

 Columns 20 through 38:

   2   2   2   2   3   3   3   3   3   3   3   3   3   3   3   4   4   4   4

 Columns 39 through 57:

   4   4   4   4   4   4   4   5   5   5   5   5   5   5   5   5   5   5   6

 Columns 58 through 67:

   6   6   6   6   6   6   6   6   6   6

The vector y is always start at 1 and be counted up. You see that there are lots of same numbers. It's the classes for the samples.

Here we have 1 1 1 1 1 1 1 1 1 1 1 1 = 12 samples for class number 1.

We have 2 2 2 2 2 2 2 2 2 2 2 = 11 samples for class number 2.

My problem here is that I want to find start and stop for every class. For example: Class 1 begins always at index 0 and ends, in this case, at index 11.

Class 2 begins directly after class 1 ends.

Question:

I'm using EJML (Effient Java Matrix Library) and I'm planning to use this function:

C = A.extractMatrix(1,4,2,8) 

Which is equal to this MATLAB code:

C = A(2:4,3:8) 

But I need to find the start and stop indexes from this y vector. In what index does e.g class 3 stops and starts? Do you have any smart ideas how to do that?

Sure, I could use a for-loop, to do this, but for-loops in Java is quite slow because I'm going to have a very very large y vector.

Suggestions?

Edit:

Here is an suggestion. Is that good, or could it be done better?

private void startStopIndex(SimpleMatrix y, int c, Integer[] startStop) {
    int column = y.numCols();
    startStop[0] = startStop[1] + 1; // Begin at the next class
    for(int i = startStop[0]; i < column; i++) {
        if(y.get(i) != c) {
            break;
        }else {
            startStop[1] = i;
        }
    }

}

Assuming that we are calling the method from:

        Integer[] startStop = new Integer[2];
        for(int i = 0; i < c; i++) {
            startStopIndex(y, c, startStop);
        }

Solution

  • If you want to do it faster then binary search is your friend. Threw this together really quick and it does things in O(log n) time, where as a linear search does it in O(n). It's pretty basic and assumes your data looks pretty much like you describe it. Feed it weird data and it will break.:

    int[] breakPoints(int[] arr, int low, int high){
        int[] rtrn = new int[high];
        for(int i=low;i<high;i++){
            rtrn[i]=binarySearch(arr, i, 0, arr.length-1);
        }
        return rtrn;
    }
    
    int binarySearch(int[] arr, int k, int start, int end){
        int mid = (start+end)/2;
        if(mid==arr.length){
            return -1;
        }
        if(arr[mid]==k && arr[mid+1]==k+1){
            return mid+1; //or just mid if you want before breakpoint
        }
        if(arr[mid]<=k){
            return binarySearch(arr, k, mid+1, end);
        }
        return binarySearch(arr, k, start, mid-1);
    }
    

    You'd call it like this:

    int[] data = {1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,5,5,6,6,6,6};
    int[] bp = breakPoints(data,1,6);
    //return 0, 3, 8, 13, 16, 18