I have a collection of documents representing values at specific dates.
Some of those dates don't have a value (the field can be missing or set to null
).
I'd like to fill in those missing or undefined values by linearly interpolating with the closest previous and following documents ordered by date.
Such that:
{ date: ISODate("2021-03-07"), value: 10 }
{ date: ISODate("2021-03-08") }
{ date: ISODate("2021-03-09"), value: 15 }
{ date: ISODate("2021-03-10"), value: null }
{ date: ISODate("2021-03-11") }
{ date: ISODate("2021-03-12"), value: 3 }
becomes:
{ date: ISODate("2021-03-07"), value: 10 }
{ date: ISODate("2021-03-08"), value: 12.5 } <=
{ date: ISODate("2021-03-09"), value: 15 }
{ date: ISODate("2021-03-10"), value: 11 } <=
{ date: ISODate("2021-03-11"), value: 7 } <=
{ date: ISODate("2021-03-12"), value: 3 }
Starting in Mongo 5.3
, it's a nice use case for the new $fill
aggregation operator:
// { date: ISODate("2021-03-07"), value: 10 }
// { date: ISODate("2021-03-08") }
// { date: ISODate("2021-03-09"), value: 15 }
// { date: ISODate("2021-03-10"), value: null }
// { date: ISODate("2021-03-11") }
// { date: ISODate("2021-03-12"), value: 3 }
db.aggregate(
{ $fill: {
sortBy: { date: 1 },
output: { value: { method: "linear" } }
}}
)
// { date: ISODate("2021-03-07"), value: 10 }
// { date: ISODate("2021-03-08"), value: 12.5 } <=
// { date: ISODate("2021-03-09"), value: 15 }
// { date: ISODate("2021-03-10"), value: 11 } <=
// { date: ISODate("2021-03-11"), value: 7 } <=
// { date: ISODate("2021-03-12"), value: 3 }
Documents are chronologically ordered by date
(sortBy: { date: 1 }
), such that missing value
s are populated using a linear
interpolation (value: { method: "linear" }
) between previous and following values.
Note that this really is a linear interpolation based on the chosen sortBy
field, such that if you had a missing date, you'd correctly get:
// { date: ISODate("2021-03-07"), value: 10 }
// { date: ISODate("2021-03-08") }
// { date: ISODate("2021-03-09"), value: 15 }
// { date: ISODate("2021-03-10"), value: null }
// { date: ISODate("2021-03-11") }
// => date gap <=
// { date: ISODate("2021-03-13"), value: 3 }
db.aggregate(
{ $fill: {
sortBy: { date: 1 },
output: { value: { method: "linear" } }
}}
)
// { date: ISODate("2021-03-07"), value: 10 }
// { date: ISODate("2021-03-08"), value: 12.5 } <=
// { date: ISODate("2021-03-09"), value: 15 }
// { date: ISODate("2021-03-10"), value: 12 } <=
// { date: ISODate("2021-03-11"), value: 9 } <=
// => date gap <=
// { date: ISODate("2021-03-12"), value: 3 }
Also note that if you have gaps as mentioned just above and and also want to insert documents in place of those gaps, you can use a $densify
stage to get the ultimate missing data filler!:
// { date: ISODate("2021-03-07"), value: 10 }
// { date: ISODate("2021-03-08") }
// { date: ISODate("2021-03-09"), value: 15 }
// { date: ISODate("2021-03-10"), value: null }
// { date: ISODate("2021-03-11") }
// => date gap <=
// { date: ISODate("2021-03-13"), value: 3 }
db.aggregate(
{ $densify: {
field: "date",
range: { step: 1, unit: "day", bounds: "full" }
}},
{ $fill: {
sortBy: { date: 1 },
output: { value: { method: "linear" } }
}}
)
// { date: ISODate("2021-03-07"), value: 10 }
// { date: ISODate("2021-03-08"), value: 12.5 } <=
// { date: ISODate("2021-03-09"), value: 15 }
// { date: ISODate("2021-03-10"), value: 12 } <=
// { date: ISODate("2021-03-11"), value: 9 } <=
// { date: ISODate("2021-03-12"), value: 6 } <= <=
// { date: ISODate("2021-03-12"), value: 3 }