Search code examples
mongodbaggregation-frameworkmissing-datalinear-interpolation

How to fill missing field values with linear interpolation?


I have a collection of documents representing values at specific dates.

Some of those dates don't have a value (the field can be missing or set to null).

I'd like to fill in those missing or undefined values by linearly interpolating with the closest previous and following documents ordered by date.

Such that:

{ date: ISODate("2021-03-07"), value: 10 }
{ date: ISODate("2021-03-08") }
{ date: ISODate("2021-03-09"), value: 15 }
{ date: ISODate("2021-03-10"), value: null }
{ date: ISODate("2021-03-11") }
{ date: ISODate("2021-03-12"), value: 3 }

becomes:

{ date: ISODate("2021-03-07"), value: 10   }
{ date: ISODate("2021-03-08"), value: 12.5 } <=
{ date: ISODate("2021-03-09"), value: 15   }
{ date: ISODate("2021-03-10"), value: 11   } <=
{ date: ISODate("2021-03-11"), value: 7    } <=
{ date: ISODate("2021-03-12"), value: 3    }

Solution

  • Starting in Mongo 5.3, it's a nice use case for the new $fill aggregation operator:

    // { date: ISODate("2021-03-07"), value: 10 }
    // { date: ISODate("2021-03-08") }
    // { date: ISODate("2021-03-09"), value: 15 }
    // { date: ISODate("2021-03-10"), value: null }
    // { date: ISODate("2021-03-11") }
    // { date: ISODate("2021-03-12"), value: 3 }
    db.aggregate(
      { $fill: {
          sortBy: { date: 1 },
          output: { value: { method: "linear" } }
      }}
    )
    // { date: ISODate("2021-03-07"), value: 10   }
    // { date: ISODate("2021-03-08"), value: 12.5 } <=
    // { date: ISODate("2021-03-09"), value: 15   }
    // { date: ISODate("2021-03-10"), value: 11   } <=
    // { date: ISODate("2021-03-11"), value: 7    } <=
    // { date: ISODate("2021-03-12"), value: 3    }
    

    Documents are chronologically ordered by date (sortBy: { date: 1 }), such that missing values are populated using a linear interpolation (value: { method: "linear" }) between previous and following values.


    Note that this really is a linear interpolation based on the chosen sortBy field, such that if you had a missing date, you'd correctly get:

    // { date: ISODate("2021-03-07"), value: 10 }
    // { date: ISODate("2021-03-08") }
    // { date: ISODate("2021-03-09"), value: 15 }
    // { date: ISODate("2021-03-10"), value: null }
    // { date: ISODate("2021-03-11") }
    // => date gap <=
    // { date: ISODate("2021-03-13"), value: 3 }
    db.aggregate(
      { $fill: {
          sortBy: { date: 1 },
          output: { value: { method: "linear" } }
      }}
    )
    // { date: ISODate("2021-03-07"), value: 10   }
    // { date: ISODate("2021-03-08"), value: 12.5 } <=
    // { date: ISODate("2021-03-09"), value: 15   }
    // { date: ISODate("2021-03-10"), value: 12   } <=
    // { date: ISODate("2021-03-11"), value: 9    } <=
    // => date gap <=
    // { date: ISODate("2021-03-12"), value: 3    }
    

    Also note that if you have gaps as mentioned just above and and also want to insert documents in place of those gaps, you can use a $densify stage to get the ultimate missing data filler!:

    // { date: ISODate("2021-03-07"), value: 10 }
    // { date: ISODate("2021-03-08") }
    // { date: ISODate("2021-03-09"), value: 15 }
    // { date: ISODate("2021-03-10"), value: null }
    // { date: ISODate("2021-03-11") }
    // => date gap <=
    // { date: ISODate("2021-03-13"), value: 3 }
    db.aggregate(
      { $densify: {
          field: "date",
          range: { step: 1, unit: "day", bounds: "full" }
      }},
      { $fill: {
          sortBy: { date: 1 },
          output: { value: { method: "linear" } }
      }}
    )
    // { date: ISODate("2021-03-07"), value: 10   }
    // { date: ISODate("2021-03-08"), value: 12.5 } <=
    // { date: ISODate("2021-03-09"), value: 15   }
    // { date: ISODate("2021-03-10"), value: 12   } <=
    // { date: ISODate("2021-03-11"), value: 9    } <=
    // { date: ISODate("2021-03-12"), value: 6    } <= <=
    // { date: ISODate("2021-03-12"), value: 3    }