python, function, loops, survival-analysis

Function to run a Kaplan-Meier model (Python)


In my dataset data1, I have a column Region with 3 categories: Asia, Europe, and North America. I'm trying to fit a Kaplan-Meier (KM) model for a survival analysis of machine parts belonging to these 3 regions. The variable used is the number of hours of operation before a machine breaks down. I've used the following code, which runs fine:

T=data1['op_hours']
Region_Asia=(data1['Region'] == 'ASIA')
Region_EUROPE=(data1['Region'] == 'EUROPE')
Region_NORTH=(data1['Region'] == 'NORTH AMERICA')
from lifelines import KaplanMeierFitter
kmf = KaplanMeierFitter()
ax = plt.subplot(111)
kmf.fit(T[Region_Asia], label="Asia")
kmf.plot(ax=ax,ci_force_lines=False)
kmf.fit(T[Region_EUROPE], label="Europe")
kmf.plot(ax=ax, ci_force_lines=False)
kmf.fit(T[Region_NORTH], label="North America")
kmf.plot(ax=ax, ci_force_lines=False)
plt.ylim(0, 1);
plt.title("Lifespans of different machines")

I get the following plot (image not included).

Now I'm trying to write a function so that I don't have to repeat the same lines of code for each category to obtain the KM fit. I've tried this:

def Kaplan(c):
    a=[]
    u=[]
    u=c.unique()
    T=data1['op_hours']
    from lifelines import KaplanMeierFitter
    kmf = KaplanMeierFitter()
    ax = plt.subplot(111)
    for i in range(len(u)):
        a=u[i]
        kmf.fit(T[a])
        kmf.plot(ax=ax,ci_force_lines=False)
        plt.ylim(0, 1);
        plt.title("Lifespans of different machines")

Kaplan(data1.Region)

I'm getting: KeyError: 'ASIA'. Can somebody help me out here? I'm still a newbie at coding. Thanks a lot.


Solution

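  • The KeyError comes from T[a]: a is a region name such as 'ASIA', so pandas looks it up as an index label of T instead of filtering rows. Build a boolean mask from the Region column, as in your first snippet, and index with that. Based on the code at the beginning of your question, you can do this: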

    import matplotlib.pyplot as plt
    from lifelines import KaplanMeierFitter
    
    def Kaplan(dt, time, regions):
        # Select the durations of the rows whose Region equals the given value
        tobefit = lambda region: dt[time][dt['Region'] == region]
        ax = plt.subplot(111)
        kmf = KaplanMeierFitter()
        for region in regions:
            kmf.fit(tobefit(region), label=region)
            kmf.plot(ax=ax, ci_force_lines=False)
        plt.ylim(0, 1)
        plt.title("Lifespans of different machines")
    
    # The names must match the values stored in the Region column exactly
    Kaplan(data1, "op_hours", ["ASIA", "EUROPE", "NORTH AMERICA"])
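To see the failing lookup and the working mask side by side, here is a minimal sketch on a toy frame. The values in it are made up for illustration and only stand in for the real data1; the column names are the ones from the question.

```python
import pandas as pd

# Made-up stand-in for data1 (hypothetical values, same column names as the question)
data1 = pd.DataFrame({
    "Region": ["ASIA", "EUROPE", "ASIA", "NORTH AMERICA"],
    "op_hours": [120, 200, 340, 90],
})
T = data1["op_hours"]

# T["ASIA"] searches the index of T (0, 1, 2, 3) for the label "ASIA" and fails
try:
    T["ASIA"]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# A boolean mask built from the Region column selects the matching rows instead
asia_hours = T[data1["Region"] == "ASIA"]

print(lookup_failed)        # True
print(asia_hours.tolist())  # [120, 340]
```

The mask is a Series of True/False values aligned with T, which is why indexing with it filters rows rather than looking up a label.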
    

    Update

    If the parameters are fixed and you do not want to type them every time you call the function, you can define it with default parameters:

    # A tuple is used for the default to avoid a mutable default argument
    def Kaplan(dt, time="op_hours", regions=("ASIA", "EUROPE", "NORTH AMERICA")):
        tobefit = lambda region: dt[time][dt['Region'] == region]
        ax = plt.subplot(111)
        kmf = KaplanMeierFitter()
        for region in regions:
            kmf.fit(tobefit(region), label=region)
            kmf.plot(ax=ax, ci_force_lines=False)
        plt.ylim(0, 1)
        plt.title("Lifespans of different machines")
    
    # Then you can call your Kaplan function without specifying time and regions
    Kaplan(data1)