Search code examples
pythonpandastimedelta

How to manage missing values with timedelta date calculations


The following two columns in a dataframe called 'question' are copied from a lager dataframe

{'Breddat': {0: Timestamp('2010-05-04 00:00:00'),
  1: Timestamp('2011-02-02 00:00:00'),
  2: Timestamp('2010-05-04 00:00:00'),
  3: Timestamp('2011-04-27 00:00:00'),
  4: Timestamp('2012-01-24 00:00:00'),
  5: NaT,
  6: Timestamp('2015-02-06 00:00:00'),
  7: Timestamp('2016-02-04 00:00:00')},
 'Result': {0: 176.0,
  1: 97.0,
  2: 162.0,
  3: 112.0,
  4: 81.0,
  5: nan,
  6: 87.0,
  7: 97.0}}

If I delete the row with the missing 'Result' value then the following code to calculate 'Cdat' works

question['Cdat'] = (question['Breddat']) - question['Result'].map(dt.timedelta)

I cannot delete all the rows in the main dataframe that have missing values (they are required for other things). I have unsuccessfully tried different approaches to exclude rows with missing values from the calculation without success. I am new to python and pandas and appear to be missing something basic.

Appreciate any help calculating conception date (Cdat) when there are rows with missing values.


Solution

  • Use pd.to_timedelta instead of datetime.timestamp to deal with NaN values.

    question['Cdat'] = question['Breddat'] - pd.to_timedelta(question['Result'], unit='D')
    
    >>> question
         Breddat  Result       Cdat
    0 2010-05-04   176.0 2009-11-09
    1 2011-02-02    97.0 2010-10-28
    2 2010-05-04   162.0 2009-11-23
    3 2011-04-27   112.0 2011-01-05
    4 2012-01-24    81.0 2011-11-04
    5        NaT     NaN        NaT
    6 2015-02-06    87.0 2014-11-11
    7 2016-02-04    97.0 2015-10-30