
Are there any benefits to including EVENT_TYPE values in Personalize that will be ignored later on?


Imagine you are building a recommender system for a VOD service. You create an INTERACTIONS dataset that contains two EVENT_TYPE values: "clicked" and "watched". You then set the eventType parameter to "watched" when creating your solution.
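For concreteness, here is a minimal sketch of that setup, assuming boto3 is used. The Avro schema follows the format Amazon Personalize expects for `create_schema`, and `name`, `datasetGroupArn`, `recipeArn` and `eventType` are real `create_solution` parameters. The dataset group ARN, the resource names and the helper `solution_request` are hypothetical placeholders, and the AWS calls are left as comments so the snippet runs without credentials.

```python
import json

# Avro schema for an INTERACTIONS dataset that carries an EVENT_TYPE column,
# in the format Amazon Personalize accepts for create_schema(schema=...).
INTERACTIONS_SCHEMA = {
    "type": "record",
    "name": "Interactions",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {"name": "USER_ID", "type": "string"},
        {"name": "ITEM_ID", "type": "string"},
        {"name": "EVENT_TYPE", "type": "string"},  # "clicked" or "watched"
        {"name": "TIMESTAMP", "type": "long"},
    ],
    "version": "1.0",
}


def solution_request(name, dataset_group_arn, recipe_arn, event_type=None):
    """Hypothetical helper: build keyword arguments for create_solution().

    With event_type set, Personalize trains only on interactions of that
    EVENT_TYPE; without it, all interactions are used with equal weight.
    """
    request = {
        "name": name,
        "datasetGroupArn": dataset_group_arn,
        "recipeArn": recipe_arn,
    }
    if event_type is not None:
        request["eventType"] = event_type
    return request


# Hypothetical placeholder ARN for the dataset group.
DATASET_GROUP_ARN = "arn:aws:personalize:us-east-1:123456789012:dataset-group/vod-demo"

watched_only = solution_request(
    "vod-watched-only",
    DATASET_GROUP_ARN,
    "arn:aws:personalize:::recipe/aws-user-personalization",
    event_type="watched",
)
schema_json = json.dumps(INTERACTIONS_SCHEMA)

# With AWS credentials configured, these would be sent as:
#   import boto3
#   personalize = boto3.client("personalize")
#   personalize.create_schema(name="vod-interactions", schema=schema_json)
#   personalize.create_solution(**watched_only)
```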

Are there any benefits in including the "clicked" events or is this basically the same as building an INTERACTIONS dataset without them?


Solution

  • If you're only creating one solution with an eventType of "watched" in the solution config, the "clicked" events in your interactions dataset add no value, because that solution trains only on "watched" interactions. In fact, you will be paying to ingest these events for no reason.

    However, if you intend to create other solutions in the same dataset group that target additional personalization use cases, you may want to train those on both "clicked" and "watched" events (i.e., do not specify an eventType when creating those solutions).

    Take a look at the required event types for the recommender use cases in the VOD domain dataset group. You will see that some require (and train on) "Watch" events, while others train on both "Watch" and "Click" events. This shows how these use-case-optimized recipes for VOD work. You can either use these recipes or create your own custom solutions, where you have full control over dataset schemas and event types. Just note that if you use the VOD recipes, your event types must be "Watch" and "Click", and your schemas must conform to the VOD domain's required fields/columns.

    Which event types to train on often comes down to the events you have available in your data. If you have good coverage across items and users with just "watched" events, you may be in good shape. Often, however, lower-fidelity, lower-intent event types like "clicked" are also needed to produce a performant model. You can measure the difference on your own data by creating one solution that trains only on "watched" events and another that trains on all events, then comparing the offline metrics of the resulting solution versions.
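The coverage check and the comparison described above can be sketched in plain Python. The assumptions are that interactions are loaded as dicts keyed by the dataset's column names, and that each solution version's metrics come from boto3's `describe_solution_metrics(solutionVersionArn=...)["metrics"]`, a mapping of metric name to value. `event_coverage` and `metric_deltas` are hypothetical helpers, not Personalize APIs, and the sample rows are made up for illustration.

```python
def event_coverage(rows, event_type=None):
    """Return (user_coverage, item_coverage): the fraction of all users and
    items in `rows` that have at least one interaction of `event_type`.
    event_type=None counts every interaction."""
    all_users = {r["USER_ID"] for r in rows}
    all_items = {r["ITEM_ID"] for r in rows}
    kept = [r for r in rows if event_type is None or r["EVENT_TYPE"] == event_type]
    users = {r["USER_ID"] for r in kept}
    items = {r["ITEM_ID"] for r in kept}
    return len(users) / len(all_users), len(items) / len(all_items)


def metric_deltas(watched_only_metrics, all_events_metrics):
    """Per-metric difference (all events minus watched only) for every
    offline metric both solution versions report. A positive value means
    the all-events solution version scored higher on that metric."""
    shared = watched_only_metrics.keys() & all_events_metrics.keys()
    return {
        name: all_events_metrics[name] - watched_only_metrics[name]
        for name in sorted(shared)
    }


# Hypothetical sample interactions (TIMESTAMP omitted for brevity).
rows = [
    {"USER_ID": "u1", "ITEM_ID": "m1", "EVENT_TYPE": "watched"},
    {"USER_ID": "u1", "ITEM_ID": "m2", "EVENT_TYPE": "clicked"},
    {"USER_ID": "u2", "ITEM_ID": "m2", "EVENT_TYPE": "clicked"},
    {"USER_ID": "u3", "ITEM_ID": "m3", "EVENT_TYPE": "watched"},
]
watched_cov = event_coverage(rows, "watched")  # u2 and m2 have no "watched" events
all_cov = event_coverage(rows)

# With credentials and two trained solution versions, the metrics would come from:
#   import boto3
#   personalize = boto3.client("personalize")
#   m_watched = personalize.describe_solution_metrics(solutionVersionArn=arn_watched)["metrics"]
#   m_all = personalize.describe_solution_metrics(solutionVersionArn=arn_all)["metrics"]
#   print(metric_deltas(m_watched, m_all))
```

Low "watched" coverage relative to all events is a hint that the "clicked" events may be worth keeping. The metric deltas are only a rough guide, since offline metrics do not always track online behavior.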