python cox-regression

Extract the data summary from cox.print_summary()


I want the first 6 lines of print_summary output. How do I do that?

I can print the entire summary with cph.print_summary(). cph.summary gives the coefficient details as a DataFrame, but indexing it does not give the dataset and censoring summary.

cph = CoxPHFitter()
cph.fit(self.data_train, duration_col='time', event_col='dead')
cph.print_summary()
'''<lifelines.CoxPHFitter: fitted with 6373 observations, 1974 censored>
      duration col = 'time'
         event col = 'dead'
number of subjects = 6373
  number of events = 4399
    log-likelihood = -34779.52
  time fit was run = 2019-05-09 06:28:06 UTC

---
                    coef  exp(coef)  se(coef)     z      p  -log2(p)  lower 0.95  upper 0.95
dzgroupCHF          0.49       1.64      0.06  8.19 <0.005     51.79        0.37        0.61
dzgroupCirrhosis    0.55       1.73      0.08  6.71 <0.005     35.63        0.39        0.71

and so on

results = self.cph.summary
print(results.head())

This gives the per-variable details as a DataFrame, but I want:

'''<lifelines.CoxPHFitter: fitted with 6373 observations, 1974 censored>
      duration col = 'time'
         event col = 'dead'
number of subjects = 6373
  number of events = 4399
    log-likelihood = -34779.52
  time fit was run = 2019-05-09 06:28:06 UTC

Indexing gives the error:

cph.print_summary()[0:9]

TypeError: 'NoneType' object is not subscriptable
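
The TypeError arises because print_summary() writes its text to standard output and returns None, so there is nothing to slice. One possible workaround is to capture the printed text and slice that instead. The sketch below assumes print_summary emits plain text through print, as in the output shown above; capture_printed_lines and fake_print_summary are hypothetical names, and fake_print_summary is only a stand-in so the example runs without a fitted model.

```python
import io
from contextlib import redirect_stdout


def capture_printed_lines(print_func, n_lines):
    """Call print_func, capture what it writes to stdout, return the first n_lines lines."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        print_func()
    return buffer.getvalue().splitlines()[:n_lines]


def fake_print_summary():
    # Hypothetical stand-in for cph.print_summary: it prints text and returns None.
    print("<lifelines.CoxPHFitter: fitted with 6373 observations, 1974 censored>")
    print("      duration col = 'time'")
    print("         event col = 'dead'")
    print("number of subjects = 6373")
    print("  number of events = 4399")
    print("    log-likelihood = -34779.52")
    print("  time fit was run = 2019-05-09 06:28:06 UTC")
    print()
    print("---")


# Keep only the header block (7 lines in the output shown in the question).
header = capture_printed_lines(fake_print_summary, 7)
print("\n".join(header))
```

With a real fitted model the call would presumably be capture_printed_lines(cph.print_summary, 7). The line count depends on which optional rows (strata, penalizer, and so on) the model prints, so it may need adjusting.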


Solution

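  • print_summary() only prints and returns None, which is why slicing its result raises the TypeError. Most of the header values are attributes of the fitted model and can be accessed directly. In the lifelines source, print_summary looks like this: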

            print(self)
            print("{} = '{}'".format(justify("duration col"), self.duration_col))
    
            if self.event_col:
                print("{} = '{}'".format(justify("event col"), self.event_col))
            if self.weights_col:
                print("{} = '{}'".format(justify("weights col"), self.weights_col))
    
            if self.cluster_col:
                print("{} = '{}'".format(justify("cluster col"), self.cluster_col))
    
            if self.robust or self.cluster_col:
                print("{} = {}".format(justify("robust variance"), True))
    
            if self.strata:
                print("{} = {}".format(justify("strata"), self.strata))
    
            if self.penalizer > 0:
                print("{} = {}".format(justify("penalizer"), self.penalizer))
    
            print("{} = {}".format(justify("number of subjects"), self._n_examples))
            print("{} = {}".format(justify("number of events"), self.event_observed.sum()))
            print("{} = {:.{prec}f}".format(justify("partial log-likelihood"), self._log_likelihood, prec=decimals))
            print("{} = {}".format(justify("time fit was run"), self._time_fit_was_called))
    

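    So the desired values can be read from the fitted model directly, for example cph._log_likelihood or cph._n_examples. The leading underscore marks these as private attributes, so their names may change between lifelines versions.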

    There's some future work being done that may make extracting this data easier: https://github.com/CamDavidsonPilon/lifelines/issues/721#issuecomment-497180538
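
As a rough sketch of the attribute-based approach, the header values could be collected into a dict. The attribute names duration_col, event_col, _n_examples, event_observed and _log_likelihood come from the source excerpt above; the fallback to a public log_likelihood_ is an assumption about newer lifelines releases, and fit_header and fake_cph are hypothetical names. fake_cph is a stand-in carrying the numbers from the question so the example runs without lifelines installed.

```python
from types import SimpleNamespace


def fit_header(model):
    """Collect the header values of a fitted model into a dict."""
    # Assumption: newer lifelines versions expose log_likelihood_;
    # otherwise fall back to the private name shown in the source excerpt.
    log_lik = getattr(model, "log_likelihood_", None)
    if log_lik is None:
        log_lik = getattr(model, "_log_likelihood", None)

    n_subjects = model._n_examples
    # Count observed events; works for a list or a pandas Series of booleans.
    n_events = sum(1 for observed in model.event_observed if observed)

    return {
        "duration col": model.duration_col,
        "event col": model.event_col,
        "number of subjects": n_subjects,
        "number of events": n_events,
        "number censored": n_subjects - n_events,
        "log-likelihood": log_lik,
    }


# Hypothetical stand-in for a fitted CoxPHFitter, using the values from the question.
fake_cph = SimpleNamespace(
    duration_col="time",
    event_col="dead",
    _n_examples=6373,
    event_observed=[True] * 4399 + [False] * 1974,
    _log_likelihood=-34779.52,
)

header = fit_header(fake_cph)
print(header["number censored"])  # 1974
```

With a real model, fit_header(cph) should return the same keys, and the censored count is simply the number of subjects minus the number of events.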