python pandas dataframe lifelines

Getting survival function estimates grouped by attribute level in lifelines


I'm having trouble using lifelines for Kaplan-Meier (KM) estimates. My data has a column, Emp_Status, that holds the worker type (Full Time, Part Time, etc.). I would like to fit a KM estimate for each worker type and write all of them to a single CSV file. Here's a snippet:

worker_types = df['Emp_Status'].unique()

for i, worker_type in enumerate(worker_types):
    ix = df['Emp_Status'] == worker_type
    kmf.fit(T[ix], C[ix])
    kmf.survival_function_['worker'] = worker_type
    #print kmf.survival_function_
    kmf.survival_function_.to_csv('C:\Users\Downloads\test.csv')

When I print inside the loop, I see the KM estimate for each worker_type. The exported CSV file, however, contains only the estimate for the last worker type.

I've read the lifelines docs and seen the examples that plot different levels, but I'm not sure how to bridge that to exporting to CSV.


Solution

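  • Your loop calls to_csv with the same path on every iteration, so each write replaces the previous one. Instead, open the file once in append mode before the loop and write each group's estimate to that handle, emitting the header only on the first write, e.g.: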

    worker_types = df['Emp_Status'].unique()
    # 'a' appends: re-running adds to an existing file (use 'w' to start fresh)
    with open('C:/Users/Downloads/test.csv', 'a') as fou:
        for i, worker_type in enumerate(worker_types):
            # boolean mask selecting the rows for this worker type
            ix = df['Emp_Status'] == worker_type
            # fit on this group's durations (T) and event flags (C)
            kmf.fit(T[ix], C[ix])
            kmf.survival_function_['worker'] = worker_type
            if i == 0:
                kmf.survival_function_.to_csv(fou) # write header on first iteration
            else:
                kmf.survival_function_.to_csv(fou, header=False)
    

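    Side note: Avoid unescaped backslashes in Windows paths inside Python string literals. In 'C:\Users\Downloads\test.csv', the \t in \test.csv is read as a tab character, and in Python 3 the \U starts a Unicode escape, which makes the literal a syntax error. Use forward slashes, or a raw string such as r'C:\Users\Downloads\test.csv'.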