I've got a content-based recommender that works... fine. I was fairly certain it was the right approach to take for this problem (matching established "users" with "items" that are virtually always new, but contain known features similar to existing items).
As I was researching, I found that virtually all examples of content-based filtering use articles/movies as an example and look exclusively at encoded tf-idf features from blocks of text. That wasn't exactly what I was dealing with, but most of my features were boolean, so building a similar vector and using cosine distance was not particularly difficult. I also had one continuous feature, which I scaled and included in the vector. As I said, it seemed to work, but it was pretty iffy, and I think I know part of the reason why...
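For concreteness, my setup looks roughly like this (the features, values, and the mean-of-liked-items user profile are all invented for illustration):

```python
import numpy as np

# Hypothetical item matrix: four boolean features per item plus one
# raw deliciousness score in the last column.
items = np.array([
    [1, 0, 1, 1, 3.8],
    [0, 1, 1, 0, 1.2],
    [1, 1, 0, 1, 4.9],
])

# Min-max scale the continuous column onto [0, 1] so it is roughly
# comparable to the boolean columns.
d = items[:, -1]
items[:, -1] = (d - d.min()) / (d.max() - d.min())

def cosine_sim(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# One way to build a user profile: the mean of the items they rated highly.
user = items[[0, 2]].mean(axis=0)
print([round(float(cosine_sim(user, it)), 3) for it in items])
```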
The continuous feature that I'm using is a rating (let's call this "deliciousness"), where, in virtually all cases, a better score would indicate an item more favorable for the user. It's continuous, but it also has a clear "direction" (not sure if this is the correct terminology). Error in one direction is not the same as error in another.
I have cases where some users have given high ratings to items with mediocre "deliciousness" scores, but logically they would still prefer something more delicious. Such a user's vector might have an average deliciousness of 2.3. My understanding of cosine distance is that, in my model, if that user encountered two new items that were exactly the same except that one had a deliciousness of 1.0 and the other a deliciousness of 4.5, it would actually favor the former because the distance between the vectors is shorter.
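A toy check of this (vectors invented; three matching boolean features plus deliciousness in the last slot). With these numbers, plain Euclidean distance does favor the less delicious item; cosine similarity happens to rank the 4.5 item higher here, but it still peaks when the item's deliciousness equals the user's average of 2.3, so either metric would prefer a 2.3 item over the 4.5 one:

```python
import numpy as np

def cosine_sim(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Three matching boolean features, deliciousness in the last slot.
user   = np.array([1.0, 1.0, 0.0, 2.3])  # average deliciousness 2.3
item_a = np.array([1.0, 1.0, 0.0, 1.0])  # less delicious
item_b = np.array([1.0, 1.0, 0.0, 4.5])  # more delicious

for name, item in [("A (1.0)", item_a), ("B (4.5)", item_b)]:
    print(name,
          "euclidean:", round(float(np.linalg.norm(user - item)), 2),
          "cosine:", round(float(cosine_sim(user, item)), 4))
```

Either way, both metrics treat 4.5 as an *error* relative to 2.3, which is the core problem.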
How do I modify or incorporate some other kind of distance measure here that takes into account that deliciousness error/distance in one direction is not the same as error/distance in the other direction?
(As a secondary question, how do I decide how to best scale this continuous feature next to my boolean features?)
There are two basic approaches to solving this:
(1) Write your own distance function. The obvious approach is to remove the deliciousness element from each vector and evaluate that difference independently. Use cosine similarity on the rest of the vector, then combine that figure with the deliciousness differential as desired.
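A sketch of option (1), assuming deliciousness is the last element. The one-sided penalty and the blending weight `alpha` are my own illustrative choices (you would tune both): only a *shortfall* in deliciousness is penalized, so being more delicious than the user's average costs nothing.

```python
import numpy as np

def hybrid_score(user_vec, item_vec, alpha=0.5):
    """Cosine similarity on everything but the last (deliciousness)
    element, minus a one-sided penalty on the deliciousness gap."""
    u_rest, i_rest = user_vec[:-1], item_vec[:-1]
    cos = u_rest @ i_rest / (np.linalg.norm(u_rest) * np.linalg.norm(i_rest))
    # One-sided: penalize only items less delicious than the user's average.
    shortfall = max(0.0, user_vec[-1] - item_vec[-1])
    return alpha * cos - (1 - alpha) * shortfall

user = np.array([1.0, 1.0, 0.0, 2.3])
print(hybrid_score(user, np.array([1.0, 1.0, 0.0, 1.0])))  # penalized
print(hybrid_score(user, np.array([1.0, 1.0, 0.0, 4.5])))  # not penalized
```

With identical boolean features, the 4.5 item now outscores the 1.0 item instead of the other way around.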
(2) Transform your deliciousness data so that equal distances in the transformed feature mean equal differences in preference, whichever direction the error falls. This will allow a "normal" distance metric to do its job as expected.
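One possible transform for option (2), offered as an assumption rather than a prescription: since more delicious is always better, replace each item's deliciousness with its shortfall from the top of the scale (the scale maximum of 5.0 is invented here), and give the user profile a shortfall of zero, i.e. the ideal item. Euclidean distance then decreases monotonically as item deliciousness rises.

```python
import numpy as np

SCALE_MAX = 5.0  # assumed top of the deliciousness scale

def transform_item(vec):
    """Replace the last element (deliciousness) with its shortfall
    from the best possible score."""
    out = vec.astype(float).copy()
    out[-1] = SCALE_MAX - out[-1]
    return out

def transform_user(vec):
    """The user's ideal item has zero shortfall, regardless of the
    average deliciousness of the items they happened to rate."""
    out = vec.astype(float).copy()
    out[-1] = 0.0
    return out

user   = transform_user(np.array([1.0, 1.0, 0.0, 2.3]))
item_a = transform_item(np.array([1.0, 1.0, 0.0, 1.0]))
item_b = transform_item(np.array([1.0, 1.0, 0.0, 4.5]))

# The more delicious item (B) is now the nearer one.
print(np.linalg.norm(user - item_a), np.linalg.norm(user - item_b))
```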