In my dataset data1, I have a column Region with 3 categories: Asia, Europe, and North America. I'm trying to fit a Kaplan-Meier (KM) model for survival analysis of machine parts belonging to these 3 regions. The variable used is the number of hours of operation before the machines break down. The following code runs fine:
import matplotlib.pyplot as plt
from lifelines import KaplanMeierFitter

T = data1['op_hours']
Region_Asia = (data1['Region'] == 'ASIA')
Region_EUROPE = (data1['Region'] == 'EUROPE')
Region_NORTH = (data1['Region'] == 'NORTH AMERICA')
kmf = KaplanMeierFitter()
ax = plt.subplot(111)
kmf.fit(T[Region_Asia], label="Asia")
kmf.plot(ax=ax, ci_force_lines=False)
kmf.fit(T[Region_EUROPE], label="Europe")
kmf.plot(ax=ax, ci_force_lines=False)
kmf.fit(T[Region_NORTH], label="North America")
kmf.plot(ax=ax, ci_force_lines=False)
plt.ylim(0, 1)
plt.title("Lifespans of different machines")
Now I'm trying to write a function so that I don't have to write separate lines of code for each category to obtain the KM fit. I've tried this:
def Kaplan(c):
    a=[]
    u=[]
    u=c.unique()
    T=data1['op_hours']
    from lifelines import KaplanMeierFitter
    kmf = KaplanMeierFitter()
    ax = plt.subplot(111)
    for i in range(len(u)):
        a=u[i]
        kmf.fit(T[a])
        kmf.plot(ax=ax,ci_force_lines=False)
    plt.ylim(0, 1)
    plt.title("Lifespans of different machines")

Kaplan(data1.Region)
I'm getting: KeyError: 'ASIA'
Can somebody help me out here? I'm still new to coding. Thanks a lot.
Based on the code at the beginning of your question, you can do this:
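import matplotlib.pyplot as plt
from lifelines import KaplanMeierFitter

def Kaplan(dt, time, regions):
    # Select the durations for the rows that belong to one region.
    tobefit = lambda region: dt[time][dt['Region'] == region]
    ax = plt.subplot(111)
    kmf = KaplanMeierFitter()
    for region in regions:
        kmf.fit(tobefit(region), label=region)
        kmf.plot(ax=ax, ci_force_lines=False)
    plt.ylim(0, 1)
    plt.title("Lifespans of different machines")

# The region names must match the values stored in the column exactly
# (upper case, judging by your first snippet and the KeyError).
Kaplan(data1, "op_hours", ["ASIA", "EUROPE", "NORTH AMERICA"])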
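As for why the attempt in the question raised KeyError: T[a] with a = 'ASIA' asks pandas for the row whose index label is 'ASIA', not for the rows whose Region equals 'ASIA'. The small sketch below shows the difference. The toy values are made up for illustration and only stand in for your real data1.

```python
import pandas as pd

# Toy stand-in for data1; the numbers are invented for illustration.
data1 = pd.DataFrame({
    "Region": ["ASIA", "EUROPE", "ASIA", "NORTH AMERICA"],
    "op_hours": [120, 340, 95, 410],
})
T = data1["op_hours"]

# T["ASIA"] is a lookup by index label. The index here is 0..3,
# so there is no label "ASIA" and pandas raises KeyError.
try:
    T["ASIA"]
    raised = False
except KeyError:
    raised = True

# A boolean mask selects the rows where Region equals "ASIA".
asia_hours = T[data1["Region"] == "ASIA"]

print(raised)               # True
print(asia_hours.tolist())  # [120, 95]
```

The mask version is what the lambda in the function above does for each region.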
Update
If the parameters are fixed and you do not want to type them every time you call the function, you can define it with default parameters:
def Kaplan(dt, time="op_hours", regions=("ASIA", "EUROPE", "NORTH AMERICA")):
    # A tuple is used for the default so it cannot be mutated between calls.
    tobefit = lambda region: dt[time][dt['Region'] == region]
    ax = plt.subplot(111)
    kmf = KaplanMeierFitter()
    for region in regions:
        kmf.fit(tobefit(region), label=region)
        kmf.plot(ax=ax, ci_force_lines=False)
    plt.ylim(0, 1)
    plt.title("Lifespans of different machines")

# Then you can call your Kaplan function without specifying time and regions
Kaplan(data1)
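If you would rather not list the regions at all, which is what the c.unique() call in the question was aiming at, one option is to let pandas split the durations by region with groupby. This is only a sketch: durations_by_group is a hypothetical helper name, and the toy data again stands in for your data1.

```python
import pandas as pd

def durations_by_group(dt, time="op_hours", group="Region"):
    """Return {group value: Series of durations} for every distinct group."""
    # groupby yields (group value, sub-DataFrame) pairs, sorted by group value.
    return {name: rows[time] for name, rows in dt.groupby(group)}

# Toy stand-in for data1; the numbers are invented for illustration.
data1 = pd.DataFrame({
    "Region": ["ASIA", "EUROPE", "ASIA", "NORTH AMERICA"],
    "op_hours": [120, 340, 95, 410],
})

groups = durations_by_group(data1)
for name, hours in groups.items():
    print(name, hours.tolist())
```

Inside Kaplan, the loop could then iterate over durations_by_group(dt).items() and call kmf.fit(hours, label=name) followed by kmf.plot(ax=ax) for each pair, so new regions in the data are picked up automatically.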