Search code examples
mongodbmongodb-queryaggregation-frameworkretentionredash

First Week Retention with MongoDB


One table with session start events:

start data sheet

Registered - unixtime - when the user was registered (first time opened the app)

DateTime - unixtime - time when the event was sent (when player installs the app and opens it for the first time: Registered = DateTime)

PlayerId - unique ID for player (same ID - always same Registered, but same registered - may be more than one PlayerId for it)

I need to get the table like this:

Nice-looking Retention sheet

Made this query (MongoDB for redash) so far:

{
    "collection": "dance",
    "aggregate": [
        {
            "$match": {
                "$and": [
                    {
//---filter for the range of Day0 dates
//---need to build Ret_Day1 - Ret_Day7 for each
                        "Registered": {
                            "$lt": "ISODate(\"{{Finish date}}\")"
                        }
                    },
                    {
                        "Registered": {
                            "$gt": "ISODate(\"{{Start date}}\")"
                        }
                    },
                    {
                        "EventType": "Session Start"
                    }
                ]
            }
        },
        {
            "$group": {
                "_id": {
                    "DayZero": {
                        "$dateToString": {
                            "format": "%Y-%m-%d",
                            "date": "$Registered"
                        }
                    },
                    "DayActive": {
                        "$dateToString": {
                            "format": "%Y-%m-%d",
                            "date": "$DateTime"
                        }
                    },
                    "PlayerId": "$PlayerId"
                }
            }
        },
        {
            "$group": {
                "_id": {
                    "DayZero": "$_id.DayZero",
                    "DayActive": "$_id.DayActive"
                },
                "Ret": {
                    "$sum": 1
                }
            }
        },
        {
            "$project": {
                "_id": "1",
                "DayZero": {
                    "$dateFromString": {
                        "dateString": "$_id.DayZero"
                    }
                },
                "DayActive": {
                    "$dateFromString": {
                        "dateString": "$_id.DayActive"
                    }
                },
                "Ret": 1
            }
        },
        {
            "$project": {
                "Days": {
                    "$divide": [
                        {
                            "$subtract": [
                                "$DayActive",
                                "$DayZero"
                            ]
                        },
                        86400000
                    ]
                },
                "DayZero": {
                    "$dateToParts": {
                        "date": "$DayZero",
                        "timezone": "+02:00"
                    }
                },
                "Ret": 1
            }
        },
        {
            "$project": {
                "Ret": 1,
                "Days": 1,
                "DayZero": {
                    "$concat": [
                        {
                            "$toString": "$DayZero.day"
                        },
                        ".",
                        {
                            "$toString": "$DayZero.month"
                        },
                        ".",
                        {
                            "$toString": "$DayZero.year"
                        }
                    ]
                }
            }
        }
    ]
}

Result:

raw data

In redash I can build such visualization:

pivot table

But it's not good enough - so I want to fill the new fields (R_Day1 - R_Day7) with the number of users and percent of users. Which method will be the easiest for calculating the percents for each day?


Solution

  • This is a little tricky as it requires date math combined with data structure manipulation.

    Here's a quick aggregation that will give you satisfying output you can work with, The stucture returned will be in this form:

    {
        "day": string
        "newUsers" : number,
        "DateTimes" : Array<{day: number, month: number, year: number, users: number, percentage: number}> 
    }
    

    With the DateTime array already sorted, so for R_1 you would use DateTime[0].percentage for the percentage of users for that day.

    db.collection.aggregate([
        {
            $group: {
                _id: {
                    day: {$dayOfMonth: "$Registered"},
                    month: {$month: "$Registered"},
                    year: {$year: "$Registered"},
                    dateTimeDay: {$dayOfMonth: "$DateTime"},
                    dateTimeMonth: {$month: "$DateTime"},
                    dateTimeYear: {$year: "$DateTime"}
                },
                users: {$addToSet: "$PlayerId"}
            }
        },
        {
            $group: {
                _id: {day: "$_id.day", month: "$_id.month", year: "$_id.year"},
                totalUsers: {$addToSet: "$users"},
                DateTimes: {
                    $push: {
                        day: "$_id.dateTimeDay",
                        month: "$_id.dateTimeMonth",
                        year: "$_id.dateTimeYear",
                        users: {$size: "$users"}
                    }
                }
            }
        },
        {
            $addFields: {
                R1_to_R7: {
                    $map: {
                        input: [1, 2, 3, 4, 5, 6, 7],
                        as: "input",
                        in: {
                            $cond: [
                                {$setIsSubset: [["$_id.month"], [1, 3, 5, 7, 8, 10, 12]]},
                                {
                                    $cond: [
                                        {
                                            $and: [
                                                {$lt: [{$mod: [{$sum: ["$_id.day", "$$input"]}, 32]}, 7]},
                                                {$gt: ["$_id.day", 20]}
                                            ]
                                        },
                                        {
                                            day: {$sum: [{$mod: [{$sum: ["$_id.day", "$$input"]}, 32]}, 1]},
                                            month: {
                                                $cond: [
                                                    {$eq: [{$mod: [{$sum: ["$_id.month", 1]}, 13]}, 0]},
                                                    1,
                                                    {$sum: ["$_id.month", 1]},
                                                ]
                                            },
                                            year: {
                                                $cond: [
                                                    {$eq: [{$mod: [{$sum: ["$_id.month", 1]}, 13]}, 0]},
                                                    {$sum: ["$_id.year", 1]},
                                                    "$_id.year"
                                                ]
                                            }
                                        },
                                        {
                                            day: {$mod: [{$sum: ["$_id.day", "$$input"]}, 32]},
                                            month: "$_id.month",
                                            year: "$_id.year"
                                        }
                                    ]
                                },
                                {
                                    $cond: [
                                        {
                                            $eq: ["$_id.month", 2]
                                        },
                                        {
                                            $cond: [
                                                {
                                                    $and: [
                                                        {$lt: [{$mod: [{$sum: ["$_id.day", "$$input"]}, 29]}, 7]},
                                                        {$gt: ["$_id.day", 20]}
                                                    ]
                                                },
                                                {
                                                    day: {$sum: [{$mod: [{$sum: ["$_id.day", "$$input"]}, 29]}, 1]},
                                                    month: 3,
                                                    year: "$_id.year",
                                                },
                                                {
                                                    day: {$mod: [{$sum: ["$_id.day", "$$input"]}, 29]},
                                                    month: "$_id.month",
                                                    year: "$_id.year"
                                                }
                                            ]
                                        },
                                        {
                                            $cond: [
                                                {
                                                    $and: [
                                                        {$lt: [{$mod: [{$sum: ["$_id.day", "$$input"]}, 31]}, 7]},
                                                        {$gt: ["$_id.day", 20]}
                                                    ]
                                                },
                                                {
                                                    day: {$sum: [{$mod: [{$sum: ["$_id.day", "$$input"]}, 31]}, 1]},
                                                    month: {$sum: ["$_id.month", 1]},
                                                    year: "$_id.year",
                                                },
                                                {
                                                    day: {$mod: [{$sum: ["$_id.day", "$$input"]}, 31]},
                                                    month: "$_id,month",
                                                    year: "$_id.year"
                                                }
                                            ]
                                        }
                                    ]
                                }
                            ]
                        }
                    }
                }
            }
        },
        {
            $project: {
                _id: 0,
                 day: {$concat: [{$toString: "$_id.day"}, "/", {$toString: "$_id.month"}, "/", {$toString: "$_id.year"}]},
                newUsers: {
                    $size: {
                        $reduce: {
                            input: "$totalUsers",
                            initialValue: [],
                            in: {$setUnion: ["$$value", "$$this"]}
                        }
                    }
                },
                DateTimes: {
                    $map: {
                        input: "$R1_to_R7",
                        as: "next_day",
                        in: {
                            $cond: [
                                {
                                    $gt: [
                                        {
                                            $size: {
                                                $filter: {
                                                    input: "$DateTimes",
                                                    as: "dateTime",
                                                    cond: {
                                                        $eq: [{
                                                            day: "$$dateTime.day",
                                                            month: "$$dateTime.month",
                                                            year: "$$dateTime.year"
                                                        }, "$$next_day"]
                                                    }
                                                }
                                            }
                                        },
                                        0
                                    ]
                                },
                                {
                                    $arrayElemAt: [
                                        {
    
                                            $filter: {
                                                input: "$DateTimes",
                                                as: "dateTime",
                                                cond: {
                                                    $eq: [{
                                                        day: "$$dateTime.day",
                                                        month: "$$dateTime.month",
                                                        year: "$$dateTime.year"
                                                    }, "$$next_day"]
                                                }
                                            }
                                        },
                                        0
                                    ]
                                },
                                {
                                    $mergeObjects: ["$$next_day", {users: 0}]
                                }
                            ]
                        }
                    }
                }
            }
        },
        {
            $project: {
                day: 1,
                newUsers: 1,
                DateTimes: {
                    $map: {
                        input: "$DateTimes",
                        as: "datetime",
                        in: {
                            $mergeObjects: [
                                "$$datetime",
                                {percentage: {$multiply: [100, {$divide: ["$$datetime.users", "$newUsers"]}]}}
                            ]
                        }
                    }
                }
            }
        }
    ])
    

    With that out of the way 2 more things to consider:

    1. The first $addFields stage is used to "add" the next 7 days. this is quite hard for Mongo to do in this context. if you can do it in code it would be better as a lot of redundancies exist.
    2. As you can see I had to change the day, month and year value for the next 7 days in case the registration was on let's say the 31st. BUT I did not account gap years. if you want to keep it this way and for it to be stable you should add a gap year check ($mod should work fine as it's every 4 years.) to the condition with the 2 month (February).