I have a dataframe that is similar to:
I would like to calculate the median age for each city but given that it is a frequency table I'm finding it somewhat tricky. Is there any function in pandas or other that would help me achieve this?
Maybe this works for you:
import numpy as np
import pandas as pd
# create dataframe
df = pd.DataFrame(
[
['Alabama', 34, 67, 89, 89, 67, 545, 4546, 3, 23],
['Georgia', 345, 65, 67, 32, 23, 567, 87, 647, 68]
],
columns=['City', 0, 1, 2, 3, 4, 5, 6, 7, 8]
).set_index('City')
print(df)
# calculate median for freq table
m = list() # median list
for index, row in df.iterrows():
v = list() # value list
z = zip(row.index, row.values)
for item in z:
for f in range(item[1]):
v.append(item[0])
m.append(np.median(v))
df_m = pd.DataFrame({'City': df.index, 'Median': m})
print(df_m)
Input:
0 1 2 3 4 5 6 7 8
City
Alabama 34 67 89 89 67 545 4546 3 23
Georgia 345 65 67 32 23 567 87 647 68
Output:
City Median
0 Alabama 6.0
1 Georgia 5.0