Extending current code to include both median and mode


I have this code that I used for one assignment, but I can't figure out how to add the median and mode to it so that it runs without error.

def main():
    filename = input('File name: ')
    num=0
    try:

        infile = open(filename, 'r')
        count = 0
        total = 0.0
        average = 0.0
        maximum = 0
        minimum = 0
        range1 = 0

        for line in infile:
            num = int(line)
            count = count + 1
            total = total + num

            if count == 1:
                maximum = num
                minimum = num
            else:
                if num > maximum:
                    maximum = num
                if num < minimum:
                    minimum = num

        infile.close()

        if count > 0:
            average = total / count
            range1 = maximum - minimum

    except OSError as err:
        print(err)

Solution

  • I'll jump right in and show you the code. It's a very simple and quite pythonic solution.

    Solution

    import statistics
    
    
    def open_file(filename):
        try:
            return open(filename, 'r')
        except OSError as e:
            print(e)
            return None
    
    
    def main():
        # Read file. Note that we are trusting the user input here without sanitizing.
        fd = open_file(input('File name: '))
    
        if fd is None:  # open_file() already printed the error
            return
    
        with fd:  # Close the file once it has been read
            data = fd.read()  # Read whole file
        if data.strip() == '':
            print("No data in file")
            return
    
        # Convert each non-blank line to an integer. Skipping blank lines also
        # copes with the trailing newline that most text files end with.
        lines = [int(item) for item in data.splitlines() if item.strip()]
    
        total_lines = len(lines)
        total_sum = sum(lines)
        maximum = max(lines)
        minimum = min(lines)
    
        # Here is the python magic, no need to reinvent the wheel!
        mean = statistics.mean(lines)  # mean == average
        median = statistics.median(lines)
        mode = "No mode!"
        try:
            mode = statistics.mode(lines)
        except statistics.StatisticsError:
            # Before Python 3.8, mode() raises when two or more values tie for
            # most common. From 3.8 on it returns the first such value instead.
            pass
    
        print("Total lines: " + str(total_lines))
        print("Sum: " + str(total_sum))
        print("Max: " + str(maximum))
        print("Min: " + str(minimum))
        print("Mean: " + str(mean))
        print("Median: " + str(median))
        print("Mode: " + str(mode))
    
    
    if __name__ == '__main__':
        main()
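
    If you want to try the same steps without typing a file name each time, here is a minimal sketch. The `summarize` function, the dictionary keys, and the sample numbers are my own illustration, not part of the original assignment. It also includes the range from the question, and it assumes Python 3.8 or newer because it uses `statistics.multimode`, which returns every tied value as a list.

```python
import os
import statistics
import tempfile


def summarize(path):
    """Hypothetical helper: statistics for a file holding one integer per line."""
    with open(path, 'r') as infile:
        numbers = [int(line) for line in infile if line.strip()]
    if not numbers:
        return None
    return {
        'count': len(numbers),
        'sum': sum(numbers),
        'max': max(numbers),
        'min': min(numbers),
        'range': max(numbers) - min(numbers),
        'mean': statistics.mean(numbers),
        'median': statistics.median(numbers),
        'modes': statistics.multimode(numbers),  # Python 3.8+
    }


# Write a small made-up data file so the sketch runs on its own.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('3\n1\n4\n1\n5\n')

result = summarize(tmp.name)
os.remove(tmp.name)
print(result)
```

    Returning a dictionary instead of printing inside the function keeps the calculation separate from the output, which makes it easier to check.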
    

    Explanation

    Generally, in Python, it's safe to assume that if you want to calculate a common value using a well-known algorithm, a function has already been written to do just that. No need to reinvent the wheel!

    These functions usually aren't hard to find online either. For instance, searching for "python calculate the median" turns up suggestions to use the statistics library.

    Although you have the solution, I strongly advise looking through the source code of the statistics library (posted below) and working out for yourself how these functions work. It will help you grow as a developer and mathematician.
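
    Besides searching online, you can also ask the module itself what it offers. A small sketch, assuming only the standard library:

```python
import statistics

# Names the module exports publicly, handy for a quick look in the interpreter.
public_names = sorted(statistics.__all__)
print(public_names)

# help(statistics.median) would print the docstring for a single function.
```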

    statistics.py

    mean

    def mean(data):
        """Return the sample arithmetic mean of data.
    
        >>> mean([1, 2, 3, 4, 4])
        2.8
    
        >>> from fractions import Fraction as F
        >>> mean([F(3, 7), F(1, 21), F(5, 3), F(1, 3)])
        Fraction(13, 21)
    
        >>> from decimal import Decimal as D
        >>> mean([D("0.5"), D("0.75"), D("0.625"), D("0.375")])
        Decimal('0.5625')
    
        If ``data`` is empty, StatisticsError will be raised.
        """
        if iter(data) is data:
            data = list(data)
        n = len(data)
        if n < 1:
            raise StatisticsError('mean requires at least one data point')
        T, total, count = _sum(data)
        assert count == n
        return _convert(total/n, T)
    

    median

    def median(data):
        """Return the median (middle value) of numeric data.
    
        When the number of data points is odd, return the middle data point.
        When the number of data points is even, the median is interpolated by
        taking the average of the two middle values:
    
        >>> median([1, 3, 5])
        3
        >>> median([1, 3, 5, 7])
        4.0
    
        """
        data = sorted(data)
        n = len(data)
        if n == 0:
            raise StatisticsError("no median for empty data")
        if n%2 == 1:
            return data[n//2]
        else:
            i = n//2
            return (data[i - 1] + data[i])/2
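
    Note that for an even number of values the median above is an average, so it may not be one of your data points. If you need an actual value from the data, the same module also provides `median_low` and `median_high`. A short sketch using the docstring's own example list:

```python
import statistics

values = [1, 3, 5, 7]  # same even-length example as the docstring

middle_average = statistics.median(values)     # average of the two middle values
lower_middle = statistics.median_low(values)   # smaller of the two middle values
upper_middle = statistics.median_high(values)  # larger of the two middle values

print(middle_average, lower_middle, upper_middle)
```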
    

    mode

    def mode(data):
        """Return the most common data point from discrete or nominal data.
    
        ``mode`` assumes discrete data, and returns a single value. This is the
        standard treatment of the mode as commonly taught in schools:
    
        >>> mode([1, 1, 2, 3, 3, 3, 3, 4])
        3
    
        This also works with nominal (non-numeric) data:
    
        >>> mode(["red", "blue", "blue", "red", "green", "red", "red"])
        'red'
    
        If there is not exactly one most common value, ``mode`` will raise
        StatisticsError.
        """
        # Generate a table of sorted (value, frequency) pairs.
        table = _counts(data)
        if len(table) == 1:
            return table[0][0]
        elif table:
            raise StatisticsError(
                    'no unique mode; found %d equally common values' % len(table)
                    )
        else:
            raise StatisticsError('no mode for empty data')
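
    The `_counts` helper is not shown in the listing above. As a rough illustration of the idea (my own sketch, not the library's actual implementation), you can count the values with `collections.Counter` and keep every value that shares the highest count. The name `modes_by_count` is hypothetical.

```python
from collections import Counter


def modes_by_count(values):
    """Hypothetical helper: return every value that shares the highest count."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    # Counter keeps first-seen order, so tied values come back in input order.
    return [value for value, count in counts.items() if count == highest]


single = modes_by_count([1, 1, 2, 3, 3, 3, 3, 4])
tied = modes_by_count([1, 1, 2, 2, 3])
empty = modes_by_count([])
print(single, tied, empty)
```

    A result with more than one entry corresponds to the "no unique mode" case in the listing, and an empty result corresponds to the "no mode for empty data" case.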