Why is the statistics module returning only the name of the column, and not the median value?


import pandas as pd
import statistics as st

def median_1(table):
    print(table.median())
    
def median_2(table):
    print(st.median(table))

# Read the Excel file and sort the rows by the X column
file=pd.read_excel("C:\\Users\\hp\\Desktop\\alcohol.xls").sort_values("X")

# Build a new 1-based index using a list comprehension
index_row=[i+1 for i in range(len(file))]

# Wrap the list in a pandas Index
index_new=pd.Index(index_row)

# Extract the column named X and convert it into a DataFrame
column_df=pd.DataFrame(file.loc[:,"X"])

# Set the new index
new=column_df.set_index(index_new)


median_1(new)
median_2(new)

median_1 prints the column name together with the median value, but it should print only the median value.

median_2 does not print the median value at all; it prints only the name of the column.

Output:
runfile('C:/Users/hp/Desktop/eg.py', wdir='C:/Users/hp/Desktop')
X    562.5
dtype: float64
X

Solution

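  • st.median() expects a sequence of numbers, not a DataFrame. Iterating over a DataFrame yields its column labels rather than its values, so st.median(new) receives the single item 'X' and returns it as the "median". Pass the column itself when you call the function: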

    median_2(new['X']) 
    # this will give you the median value without the column name
    562.5
    

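    The same applies to the pandas median() used in your median_1 function: called on a single column (a Series) it returns just the value, whereas called on the DataFrame it returns a Series labelled with the column name.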

    median_1(new['X'])
    # this will also give you the median value without the column name
    562.5
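
A minimal sketch of why this happens, assuming a small made-up DataFrame in place of alcohol.xls (the values below are hypothetical and are not taken from the original file):

```python
import statistics as st

import pandas as pd

# Hypothetical stand-in data; the original alcohol.xls is not used here.
df = pd.DataFrame({"X": [400, 725, 300, 650]})

# Iterating over a DataFrame yields its column labels, not its values.
labels = list(df)

# st.median sorts whatever it iterates over, so here it only sees ['X']
# and returns that single item.
median_of_frame = st.median(df)

# A Series iterates over its values, so the real median is computed.
median_of_column = st.median(df["X"])

# pandas' own median() returns a plain scalar when called on a Series.
pandas_median = df["X"].median()

print(labels)            # column labels only
print(median_of_frame)   # the column name, not a number
print(median_of_column)  # median of the values
print(pandas_median)     # same median via pandas
```

If you need to keep working with the DataFrame, selecting the entry from the result, for example df.median()["X"], should also give the bare value.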