
Calculating variance accurately in Python


I am calculating the variance of a list A that contains many sublists. However, the output differs from the value I get when I calculate it by hand. Why is that? The current and expected outputs are shown below.

import statistics

A = [[2], [7], [3], [12], [9]]

# Flatten the sublists to get a single list of values
flattened_values = [value for sublist in A for value in sublist]

# Calculate the variance using the statistics module
variance_value = statistics.variance(flattened_values)

print("Variance:", variance_value)

The current output is

Variance: 17.3

The expected output is

13.84

Solution

  • statistics.variance() computes the sample variance: it divides the sum of squared deviations from the mean by n - 1, which gives 69.2 / 4 = 17.3 for this data. To get the variance of an entire population, which divides by n instead (69.2 / 5 = 13.84), use statistics.pvariance():

    import statistics
    
    A = [[2], [7], [3], [12], [9]]
    
    # Flatten the sublists to get a single list of values
    flattened_values = [value for sublist in A for value in sublist]
    
    # Calculate the population variance (divides by n rather than n - 1)
    variance_value = statistics.pvariance(flattened_values)
    
    print("Variance:", variance_value)  # Variance: 13.84
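
To see where the two numbers come from, here is a minimal sketch that reproduces both results by hand and checks them against the statistics module. The variable names (values, squared_deviations and so on) are illustrative and do not come from the question; the data is the flattened list from above.

```python
import math
import statistics

values = [2, 7, 3, 12, 9]  # the flattened data from the question

n = len(values)
mean = sum(values) / n

# Sum of squared deviations from the mean (about 69.2 for this data)
squared_deviations = sum((x - mean) ** 2 for x in values)

sample_variance = squared_deviations / (n - 1)  # divides by n - 1
population_variance = squared_deviations / n    # divides by n

print("Sample variance:", round(sample_variance, 2))
print("Population variance:", round(population_variance, 2))

# The hand-computed values agree with the statistics module
assert math.isclose(sample_variance, statistics.variance(values))
assert math.isclose(population_variance, statistics.pvariance(values))
```

This prints 17.3 for the sample variance and 13.84 for the population variance. The only difference between the two is the divisor, so which one is "correct" depends on whether the list is a sample drawn from a larger population or the whole population itself.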