The following is a programming task.
You are given a sequence of N integers. The task is to find the number of continuous sequences of integers such that their sum is zero.
For example if the sequence is: 2, -2, 6, -6, 8 There are 3 such sequences:
I already have the following program written in PHP that reads the input from STDIN
(first line containing the number of integers that follow.)
<?php
$n = fgets(STDIN) * 1;
$seq = array();
for ($i = 0; $i < $n; $i++) {
$seq[] = fgets( STDIN ) * 1;
}
$count = 0;
for( $i = 0; $i < $n; $i++)
{
$number = 0;
for( $j = $i; $j < $n; $j++)
{
$number += $seq[$j];
if( $number == 0 )
$count++;
}
}
echo 'count: ' . $count . PHP_EOL;
Input example
5
2
-2
6
-6
8
This works well for smaller sequences, but its efficiency is O(n^2).
What algorithm is appropriate - with possibly O(n) efficiency - for a sequence containing 100.000 integers?
Let's assume your data is stored in an array, and let it be arr
.
Create an array sum
, such that:
sum[i] = arr[0] + arr[1] + ... + arr[i]
And, in addition a single entry at the beginning with 0 (to handle a subarray that starts at the beginning and sums to zero)
Now, it is easy to see that for each two indices i,j
such that i<j
and sum[i]=sum[j]
, the continuous sequences arr[i+1]+arr[i+2]+...+arr[j] = 0
.
By creating this array sum
, you only have left to find how many duplicates are there. This cannot be done in O(n)
1 (this is the element distinctness problem), but can be solved in O(nlogn)
using sorting and then iterating and counting, which is still very fast for 100,000 entries.
Note, that if there are for example n
duplicates of the number k
in the array sum
, there are Choose(n,2) = n(n-1)/2
continuous subsequences that are generated for these duplicates.
Example:
arr = [1,2,-2,5,6,-6,-5,8]
sum = [0,1,3,1,6,12,6,1,9]
sorted(sum) = [0,1,1,1,3,6,6,9,12]
There are 3 duplicates of 1 and 2 duplicates of 6, so you have total of:
Choose(3,2) + Choose(2,2) = 3*2/2 + 2/2 = 3+1 = 4
Which indeed match the 4 subsequences:
2,-2
2,-2,5,6,-6,-5
6,-6
5,6,-6,-5
(1) Without hashing, and then you will decay to O(n^2) worst case, but will benefit from O(n)
average case, at the cost of O(n)
extra space.