Search code examples
javaloopspartitioning

Divide dataset into n partition


Suppose I have an array a. I want to divide it in n partition. How can I perform the for function in Java? I tried this code but in some cases it is wrong.

public static void main(String[] args) {
        int[] a = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        int[] part = {0, 1, 2, 3};
        for (int i = 0; i < part.length; i++) {
            for (int j = ((a.length / part.length) * part[i]); j < ((a.length / part.length) * (part[i] + 1)); j++) {
                System.out.print(a[j] + " ");
            }
            System.out.println();
        }
    }
}

Output:

1 2 
3 4 
5 6 
7 8

Data 9 and 10 are missing. I do not need an equal size, but at least all data is well distributed. How to modify the for j function?


Solution

  • Here's one possible solution (very naive approach, not benchmarked or optimised for performance):

    int numOfPartitions = 4; 
    double n = (double)a.length/numOfPartitions;
    int start =0;
    int partitionNum=1;
    while(partitionNum<= numOfPartitions)
      {        
        int end = (int)java.lang.Math.ceil(n*partitionNum);
        for(int k = start;k<end;k++)
          {
            System.out.print(a[k] + " ");
          }
        start = end;
        ++partitionNum;
        System.out.println();
      }
    

    First, see what would be the size of each partition (even if it is not integral). We will use it to slice up the array into sub-arrays.

    Now, start slicing into sub-arrays from the first element of the input array. We use a strategy of taking the largest integer closest to (computed partition size * current partition number) as the upper bound. In this example, we have size = 10/4 = 2.5, so our indices to break into sub-arrays will be 3 (for 2.5), 5, 8 (for 7.5) and 10.

    The first sub-array will take the elements from index 0 of input until index 2, since the first computed index is 3. The next sub-array starts at index 3 and goes till index 5, and so on.

    This way, we end up with the sub-arrays having number of elements differing at most by 1 from each other.

    Note: If we have more partitions than number of elements in input, we can just return the entire array.