Search code examples
javaalgorithmmath

group date ranges into buckets


I have a problem where I have a date range and I want to split the date range into chunks (quantity provided by user). Each chunk must be a continuous range of whole months. The longest chunk must be equal to or one month longer than the shortest.

Date ranges are whole months, as well:

  • start dates always are the first of the month
  • end dates will always be the last day of the month.

You can assume that The input range will be large enough that each chunk can have at least one whole month.

For example, consider the trivial case of a date range 1/1/2000 through 8/31/2000, 8 chunks requested. Then each chunk would get a full month.

An easier way to think of this problem is as follows Consider a list of numbers from 1-15 and we want to split them into 8 chunks possible combinations are

(1),(2),(3),(4),(5),(6),(7),(8,9,10,11,12,13,14,15) -> satisfies only one constraints of using up all the chunks
(1,9),(2,10), (3,11), (4,12), (5,13), (6,14), (7,15), (8) ---> satisfies only 1 constraint of minimizing the difference between maximum number and minimum numbers in a chunk.

(1,2), (3,4), (5,6), (7,8) (9,10), (11,12) (13,14), 15  ---> correct

I have considered joda time as the date library.

This is not a homework problem. I am trying to parallelize a query which takes date ranges as an input. The chunks are meant to be cores, and I want to run the query for subsequent date ranges across a core.


Solution

  • This isn't a hard problem. There's a simple construction to do the job. First of all, note that you don't care about days, merely full months.

    • Compute the number of months in the range. Forget the days, just use month and year. Call those inputs m1/d1/y1 (start) and m2/d2/y2 (end). m_range = 12*(y2-y1) + m2-m1 + 1.
    • Call the input chunk count chunk. If m_range < chunk, you have invalid input.
    • The minimum chunk size is min_size = floor(m_range/chunk) (truncated division). The max size is one more.
    • If the division isn't even, then you'll have some leftover months to allocate. Compute this extra as extra = m_range mod chunk.

    With these parameters, the allocation is simple: the first extra chunks get an extra month each, and will be of size min_size+1. The remaining chunk-extra chunks get min_size months each.

    For instance, let's take a range 1/1/2017 - 5/31/2018, and 4 chunks.

    m_range = 12*(2018-2017) + (5-1) + 1 = 12 + 4 + 1 = 17
    min_size = floor(m_range/chunk) = floor(17/4) = 4
    extra = m_range mod chunk = 1
    

    So, you give the first chunk 5 months; the other 3 chunks get 4 months each.

    The individual date manipulations are left as an exercise for the student. :-)