c# dataframe deedle

How to deal with null (missing) values in a deedle series in C#?


How should I deal with missing values in a deedle series?

For example, I have a data frame with columns Name and BirthDate, where BirthDate is initially DateTime? and I need to convert BirthDate to a string.

var newDOB = df.GetColumn<DateTime?>("DOB").Select(x => x.Value.Value != null ? x.Value.Value.ToString("dd/MM/yyyy") : " ");
df.ReplaceColumn("DOB", newDOB);

This is what I tried, and it does not work. What is the best way to convert a missing DateTime? value to a string? And what is the best way in general to deal with missing values in Deedle series and Deedle data frames in C#?


Solution

  • When you are creating a Deedle series, Deedle detects invalid values and treats them as missing automatically - so when you create a series with NaN or null, those are automatically turned into missing values (and this also works for nullables).

    Furthermore, the Select method skips over all missing values. For example, consider this series:

    Series<int, DateTime?> ds = Enumerable.Range(0, 100).Select(i =>
      new KeyValuePair<int, DateTime?>(i, i % 5 == 0 ? (DateTime?)null : DateTime.Now.AddHours(i))
    ).ToSeries();
    ds.Print();
    

    Here, Deedle recognizes that every fifth value is missing. When you call Select, it applies the operation only to valid values and every fifth value remains as a missing value:

    ds.Select(kvp => kvp.Value.Value.ToString("D")).Print();
    

    If you want to do something with the missing values, you can use FillMissing (to fill them with a specified value, such as a string, or to copy the value from the previous item in the series) or DropMissing to discard them from the series. You can also use SelectOptional, which calls your function with an OptionalValue<V>, so you can implement your own custom logic for missing values.
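
    As a rough sketch of these options side by side (assuming the Deedle NuGet package is referenced and C# top-level statements are available; the fixed start date, the shorter range, the "unknown" placeholder and the invariant culture are choices made for this illustration only):

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Globalization;
    using System.Linq;
    using Deedle;

    // Same shape as the series above, but shorter and with a fixed start date
    // (an assumption for this sketch) so that the result is reproducible.
    var start = new DateTime(2020, 1, 1);
    Series<int, DateTime?> ds = Enumerable.Range(0, 10).Select(i =>
      new KeyValuePair<int, DateTime?>(i, i % 5 == 0 ? (DateTime?)null : start.AddHours(i))
    ).ToSeries();

    // Select is only called for present values; keys 0 and 5 stay missing.
    Series<int, string> formatted = ds.Select(kvp =>
      kvp.Value.Value.ToString("dd/MM/yyyy", CultureInfo.InvariantCulture));

    // Option 1: replace every missing value with a constant.
    Series<int, string> filled = formatted.FillMissing(" ");

    // Option 2: copy the previous value forward. Key 0 has no predecessor,
    // so it stays missing.
    Series<int, string> forward = formatted.FillMissing(Direction.Forward);

    // Option 3: remove the keys whose values are missing.
    Series<int, string> dropped = formatted.DropMissing();

    // Option 4: handle present and missing values in one function.
    // kvp.Value is an OptionalValue<DateTime?> here.
    Series<int, string> custom = ds.SelectOptional(kvp =>
      kvp.Value.HasValue
        ? new OptionalValue<string>(
            kvp.Value.Value.Value.ToString("dd/MM/yyyy", CultureInfo.InvariantCulture))
        : new OptionalValue<string>("unknown"));

    filled.Print();
    custom.Print();
    ```

    Chaining Select with FillMissing is usually the shortest route; SelectOptional is more useful when the replacement depends on the key or on other state.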

    This also means that a Series<K, DateTime?> is not very useful: Deedle already represents the null values as missing, so you can turn it into a Series<K, DateTime> using Select(kvp => kvp.Value.Value) and let Deedle handle the missing values for you.
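
    Applied to the data frame from the question, the conversion might look like the sketch below. The sample names and dates are hypothetical, and building the frame with Frame.CreateEmpty and AddColumn is just one way to get a self-contained example; in real code the frame would already exist.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Globalization;
    using System.Linq;
    using Deedle;

    // Hypothetical sample data: the second person has no date of birth.
    var names = new[] { "Ann", "Bob", "Cyd" };
    var dates = new DateTime?[] { new DateTime(1990, 4, 23), null, new DateTime(1985, 11, 2) };

    Series<int, string> nameColumn = names
      .Select((name, i) => new KeyValuePair<int, string>(i, name)).ToSeries();
    Series<int, DateTime?> dobColumn = dates
      .Select((dob, i) => new KeyValuePair<int, DateTime?>(i, dob)).ToSeries();

    var df = Frame.CreateEmpty<int, string>();
    df.AddColumn("Name", nameColumn);
    df.AddColumn("DOB", dobColumn);

    // Select formats only the present dates; FillMissing then supplies
    // the placeholder string for rows that had no date.
    Series<int, string> newDOB = df.GetColumn<DateTime?>("DOB")
      .Select(kvp => kvp.Value.Value.ToString("dd/MM/yyyy", CultureInfo.InvariantCulture))
      .FillMissing(" ");
    df.ReplaceColumn("DOB", newDOB);
    df.Print();
    ```

    Note that no null check is needed inside Select, because the function is never called for a missing value. Passing CultureInfo.InvariantCulture keeps the "/" separators fixed regardless of the machine's regional settings.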