Search code examples
calgorithmdynamic-programmingtrace

How is mod mysteriously useful in this DP problem, and what is the underlying algorithm?


The DP question is as follows: Given a list of non-negative numbers and a target integer k, write a function to check if the array has a continuous subarray of size at least 2 that sums up to a multiple of k, that is, sums up to n*k where n is also an integer.

Example 1:

Input: [23, 2, 4, 6, 7], k=6 Output: True Explanation: Because [2, 4] is a continuous subarray of size 2 and sums up to 6.

Example 2:

Input: [23, 2, 6, 4, 7], k=6 Output: True Explanation: Because [23, 2, 6, 4, 7] is an continuous subarray of size 5 and sums up to 42.

I've spent quite some time understanding the following solution to this problem:

typedef struct e_s {
    int mod;
    int idx;
    struct e_s *shadow;
} e_t;
#define SZ 1024
e_t *lookup(e_t **set, int mod) {
    e_t *e = set[mod % SZ];
    while (e && e->mod != mod) {
        e = e->shadow;
    }
    return e;
}
void put(e_t **set, e_t *e, int mod, int idx) {
    e->mod = mod;
    e->idx = idx;
    e->shadow = set[mod % SZ];
    set[mod % SZ] = e;
}
bool checkSubarraySum(int* nums, int numsSize, int k) {
    int i, s, found = 0;

    e_t buff[10000];
    int n;

    e_t *set[SZ] = { 0 }, *e;

    put(set, &buff[n ++], 0, -1);

    s = 0;
    for (i = 0; i < numsSize; i ++) {
        s += nums[i];
        if (k) s = s % k;
        e = lookup(set, s);
        if (e) {
            if (i - e->idx >= 2) {
                found = 1;
                break;
            }
        } else {
            put(set, &buff[n ++], s, i);
        }
    }

    return found;
}

I even traced an example here which took quite a while, but I can't think of why mod values are important in this question, or even if I traced correctly.

Example:
Input = 23, 2, 6, 4, 7
k = 6

buff = [e_t, e_t, ...] (size = 10000)
SZ = 1024
set = [NULL, NULL, ...] (size = 1024)

put(set, &buff[0], 0, -1);
Now n = 1

buff = [(0, -1, NULL), e_t, e_t...]
set = [(0, -1, NULL), NULL, NULL...]

i = 0
s = 0
s = 23
s = 5
e = NULL
put(set, &buff[1], 5, 0);
Now n = 2

buff = [(0, -1, NULL), (5, 0, NULL), e_t...]
set = [(0, -1, NULL), NULL, NULL, NULL, NULL, (5, 0, NULL), ...]

i = 1
s = 7
s = 1
e = NULL
put(set, &buff[2], 1, 1);
Now n = 3

buff = [(0, -1, NULL), (5, 0, NULL), (1, 1, NULL), e_t...]
set = [(0, -1, NULL), (1, 1, NULL), NULL, NULL, NULL, (5, 0, NULL), ...]

i = 2
s = 7
s = 1
e = (1, 1, NULL)

i = 3
s = 5
e = (5, 0, NULL)

Since i = 3, and e->idx = 0, (i - e->idx >= 2) evaluates to True. The function returns true. Why though...?

At the end of the day, my question comes down to what kind of algorithm is being used here? What's up with the linked lists? And did I trace this correctly?


Solution

  • The algorithm runs through the array starting at index 0. It maintains a sum modulo k of the array elements seen so far. This sum is s. Once s is updated for the current element, the value s is looked up in a hash-table, in order to get the index at which the sum A[0]+...+A[i] % k is equal to s. If this index is found and the distance to the current position is greater or equal to 2, the algorithm returns. If there was no index the current index is stored in the hash-table for the value s. The idea is that if (A[0]+...+A[i] % k) = (A[0]+...+A[i]+...+A[j] %k) then A[i+1]+...+A[j] % k = 0.

    There are two modulos in the code, one is modulo k and derives from the fact that a multiple of k is equal to 0 % k. The second modulo is a modulo SZ and is used to address the hash-table (which is called a set here). This hash-table uses linked-list (using the shadow field) to resolve conflicts.

    The memory for the hash-table nodes is statically allocated in the buff array.

    The value of SZ was arbitrary chosen by the author to ensure an average conflict rate.

    Given the maximal value of k we could completely get rid of the hash-table and nodes and just use an array of size 10000 to store the index of each mod.