Search code examples
c#algorithmintegerint64int32

Median Maintenance Algorithm - Same implementation yields different results depending on Int32 or Int64


I found something interesting while doing a HW question.

The howework question asks to code the Median Maintenance algorithm.

The formal statement is as follows:

The goal of this problem is to implement the "Median Maintenance" algorithm (covered in the Week 5 lecture on heap applications). The text file contains a list of the integers from 1 to 10000 in unsorted order; you should treat this as a stream of numbers, arriving one by one. Letting xi denote the ith number of the file, the kth median mk is defined as the median of the numbers x1,…,xk. (So, if k is odd, then mk is ((k+1)/2)th smallest number among x1,…,xk; if k is even, then m1 is the (k/2)th smallest number among x1,…,xk.)

In order to get O(n) running time, this should be implemented using heaps obviously. Anyways, I coded this using Brute Force (deadline was too soon and needed an answer right away) (O(n2)) with the following steps:

  1. Read data in
  2. Sort array
  3. Find Median
  4. Add it to running time

I ran the algorithm through several test cases (with a known answer) and got the correct results, however when I was running the same algorithm on a larger data set I was getting the wrong answer. I was doing all the operations using Int64 ro represent the data. Then I tried switching to Int32 and magically I got the correct answer which makes no sense to me.

The code is below, and it is also found here (the data is in the repo). The algorithm starts to give erroneous results after the 3810 index:

    private static void Main(string[] args)
    {
        MedianMaintenance("Question2.txt");
    }

    private static void MedianMaintenance(string filename)
    {
        var txtData = File.ReadLines(filename).ToArray();
        var inputData32 = new List<Int32>();
        var medians32 = new List<Int32>();
        var sums32 = new List<Int32>();
        var inputData64 = new List<Int64>();
        var medians64 = new List<Int64>();
        var sums64 = new List<Int64>();
        var sum = 0;
        var sum64 = 0f;
        var i = 0;
        foreach (var s in txtData)
        {
            //Add to sorted list
            var intToAdd = Convert.ToInt32(s);

            inputData32.Add(intToAdd);
            inputData64.Add(Convert.ToInt64(s));

            //Compute sum
            var count = inputData32.Count;
            inputData32.Sort();
            inputData64.Sort();
            var index = 0;

            if (count%2 == 0)
            {
                //Even number of elements
                index = count/2 - 1;
            }
            else
            {
                //Number is odd
                index = ((count + 1)/2) - 1;
            }
            var val32 = Convert.ToInt32(inputData32[index]);
            var val64 = Convert.ToInt64(inputData64[index]);
            if (i > 3810)
            {
                var t = sum;
                var t1 = sum + val32;
            }
            medians32.Add(val32);
            medians64.Add(val64);
            //Debug.WriteLine("Median is {0}", val);
            sum += val32;
            sums32.Add(Convert.ToInt32(sum));
            sum64 += val64;
            sums64.Add(Convert.ToInt64(sum64));
            i++;
        }
        Console.WriteLine("Median Maintenance result is {0}", (sum).ToString("N"));
        Console.WriteLine("Median Maintenance result is {0}", (medians32.Sum()).ToString("N"));

        Console.WriteLine("Median Maintenance result is {0} - Int64", (sum64).ToString("N"));
        Console.WriteLine("Median Maintenance result is {0} - Int64", (medians64.Sum()).ToString("N"));
    }

What's more interesting is that the running sum (in the sum64 variable) yields a different result than summing all items in the list with LINQ's Sum() function.

The results (the thirs one is the one that's wrong): Console Application Results

These are the computer details: Computer details

I'll appreciate if someone can give me some insights on why is this happening.

Thanks,


Solution

  • 0f is initializing a 32 bit float variable, you meant 0d or 0.0 to receive a 64 bit floating point.

    As for linq, you'll probably get better results if you use strongly typed lists.

    new List<int>()
    new List<long>()