Search code examples
c#arraysoverflowdifferencearray-difference

TapeEquilibrium task from Codility, failing to pass 100% tests, performance and correctness (76% total score)


Sorry if I wrote too much. I tried to be very detailed so I could find my own answer while describing the problem (the solution, and how I got there), and yet ,if I didn't, not leaving doubts to the reader. I don't want the problem to be solved by anyone else, just get a few pointers and where should I be looking to improve my solution, if it is possible. I am open to discard the approach if it isn't the right one, but I am also fancy for making this approach pass 100%, if it is possible, and if it makes sense and is suitable in an effort-quality relationship. If my idea is bad, I don't mind being told and follow another one, but again, please don't show me code or the "right answer", I can google it myself or find it here in SO.
The exercise goes like this:

A non-empty array A consisting of N integers is given. Array A represents numbers on a tape.

Any integer P, such that 0 < P < N, splits this tape into two non-empty parts:

 A[0], A[1], ..., A[P − 1] and A[P], A[P + 1], ..., A[N − 1].

The difference between the two parts is the value of:

|(A[0] + A[1] + ... + A[P − 1]) − (A[P] + A[P + 1] + ... + A[N − 1])|

In other words, it is the absolute difference between the sum of the first part and the sum of the second part.

For example, consider array A such that:

A[0] = 3 A[1] = 1 A[2] = 2 A[3] = 4 A[4] = 3

We can split this tape in four places:

P = 1, difference = |3 − 10| = 7
P = 2, difference = |4 − 9| = 5
P = 3, difference = |6 − 7| = 1
P = 4, difference = |10 − 3| = 7

Write a function:

class Solution { public int solution(int[] A); }

that, given a non-empty array A of N integers, returns the minimal difference that can be achieved.

For example, given:

A[0] = 3 A[1] = 1 A[2] = 2 A[3] = 4 A[4] = 3

the function should return 1, as explained above.

Write an efficient algorithm for the following assumptions:

N is an integer within the range [2..100,000]; each element of array A is an integer within the range [−1,000..1,000].


As stated in the title, my solution is not that great (66% in performance; 85% in correctness; 76% as a total score).
This is the approach I used yesterday:

accumulate from the left and the right side of the array received, and check which side would be smaller after adding the next one to the right of the left side, or the nearest to the left of the right side of the array.
Graphically, the idea was sort of making two trains: first the locomotives (head and tail of the array, respectively) of each side, and then add to each locomotive the train wagons (if any, and if it corresponded to), and make both trains crash. I was sort of "adding wagons to a train, only if the other train was bigger than the one I added a wagon to last".
The code I wrote didn't work that well, because I needed to find the smallest difference I could between the two partitions of the array.


but I got several errors from it: with negative numbers mixed with non-negative ones, two elements arrays, etc, and even worse results, (like 65% total score).
The idea is similar to the one I followed today:

Accumulate from the left and the right side, and check if the difference between the left and the right trains was smaller after adding a wagon to the left, or to the right. In this approach I concentrated in the "difference between the two trains, and not in adding a wagon if the other one was bigger and find an equilibrium".

Anyways, I got errors in:

  • small_random: random small, length = 100: WRONG ANSWER, got 269 expected 39
  • large_ones: large sequence, numbers from -1 to 1, length = ~100,000: WRONG ANSWER, got 228 expected 0; WRONG ANSWER, got 147 expected 1. (possible overflow?)
  • large_random random large, length = ~100,000: WRONG ANSWER, got 202635 expected 1; WRONG ANSWER, got 34394 expected 2

All the other tests were fine. This is the code I wrote:

using System;

class Solution {
    public int solution(int[] A)
    {
        int N = A.Length, P;
        if (N == 0)
        {
             P = 0;
        }
        else
        {
            P = findTapeEquilibrium(A, N);
        }
        return P;
    }

    private static int findTapeEquilibrium(int[] A, int N)
    {
        int I = 0, J = N - 1;
        if( N > 0 )
        {
            int leftP = A[I++], rightP = A[J--];
            bool AIsPair = N - 1 % 2 == 0;
            while((!AIsPair &&  J - I >= 0) || (AIsPair && J - I >= 1))
            {
                if (Math.Abs((leftP+A[I])-rightP) <= Math.Abs(leftP-(rightP+A[J])))
                {
                    leftP += A[I++];
                }
                else
                {
                    rightP += A[J--];
                }
            }
            return Math.Abs(leftP - rightP);
        }
        return 0;
    }
}

Solution

  • So, I ended up looking at the solution of yusheng123, which is in python, but his code is so easy to understand.
    Now I realize I was doing things so badly: I was trying to debug or fix a solution that was poorly thought.
    The solution by this guy, was somewhat similar to the one I followed at last, but with huge differences.
    I was comparing the first and the last element of the array, and then adding numbers to the head, or the tail (accumulators), depending on whether the difference between those two would be smaller, was awful: too complicated, too focused on the structure received, and most importantly, not abstracting myself from the array itself (the partitions!) to find a proper way to solve the problem: the fact that you have to find those partitions, doesn't mean you have to actually kinda create your own two arrays, even though I was accumulating from both ends, I now see it is not decoupled at all from the problem itself, I was not creating the arrays per se, but the procedure to accomplish that, would be the same followed by my moronic approach.
    If, in contrast, I emphasize in the difference there is between the head (being only the first element), and the tail (being the sum of all the other elements of the array), then I'm in a higher abstraction level, and the fact that I can just iterate through the array from beginning to end, and get to the solution, says a lot in juxtaposition.
    So, in the future, when I will see myself doing something similar to what I did, I'll say to me: "How can you follow a simpler plan (a simpler iteration in this particular case) and still content yourself by following the idea you thought of but not pushing it too hard (as I did in this case)? Don't throw your idea to the garbage yet, discard your complicated code though, make it simpler, and try to isolate the, maybe plausible idea from the implementation for now, until you find something that clicks in your mind, since many rules, and questions have to be asked to accomplish what you are trying to." Excuse me if I use this answer to speak to my future self and not talk about the code itself entirely.
    Continuing with the brilliant solution I found online, the strategy (of setting the head as the first element of the array, and the tail as the sum of the rest of the array), would continue by setting as the initial value of the smallest difference, to the one there is between the head and the sum of the rest of the elements, and then you just iterate through the array, and add the current element to the head accumulator, and decrease that value from the rest (tail accumulator), while checking that if the new difference between those two is smaller than the one saved in the beginning, then it should be override by the new value (which is what we wanted to find in the first place, this is the value that the function will return at the end -or at the beginning if there's 0, 1 or 2 elements in the array-).
    There are no wagons adding to locomotives (I see now how stupid I must sounded in your own voices, subvocalizing what I wrote). There is no need for such silly analogies in this simple problem, with such an easy, and elegant solution. Right there: I will be able to spot that I'm doing something wrong in the future in a similar situation.
    There is so much order in this approach. A higher level of abstraction. If there were bugs they would be easy to find, and so on. This is "my" final code.

    public int solution(int[] A)
    {
        int head = A[0], tail = A.Sum() - head;
        int min_diff = Math.Abs(head - tail);
        for (int i = 1; i < A.Length; i++)
        {
            if (Math.Abs(head - tail) < min_diff)
                min_diff = Math.Abs(head - tail);
            head += A[i];
            tail -= A[i];
        }
        return min_diff;
    }
    

    Of course, the final score of this solution is 100%. Just so beautiful and simple.