For the last few days I have been trying to train a contextual bandit algorithm with Vowpal Wabbit, so I built a toy model to help me understand how the algorithm works.
I imagined a setting with 4 possible actions and trained my model on two different contexts. Each context has exactly one optimal action among the 4.
Here is how I did it:
from vowpalwabbit import pyvw
vw = pyvw.vw("--cb_explore 4 -q UA --epsilon 0.1")
vw.learn('1:-2:0.5 | 5')
vw.learn('3:2:0.5 | 5')
vw.learn('1:2:0.5 | 15')
vw.learn('3:-2:0.5 | 15')
vw.learn('4:2:0.5 | 5')
vw.learn('4:2:0.5 | 15')
vw.learn('2:2:0.5 | 5')
vw.learn('2:2:0.5 | 15')
In my example, the optimal action for the context whose feature equals 5 is action 1 (the only action with a negative cost there), and for the other context it is action 3.
When I predict on those two contexts there is no problem, since the algorithm has already seen each of them and received feedback that conditions its choice.
But when a new context arrives, I expect the algorithm to give me the most relevant action, for example by taking into account the similarity of the context features.
For example, if I give a feature equal to 29, I expect to get action 3, since 29 is closer to 15 than to 5.
Those are my questions for now.
Thanks!
The problem is in the way you've structured the feature. The input format for a feature is defined as name[:value], and if value is not supplied the default value is 1.0. So what you've supplied is a feature whose name is 5, or 15. Feature names are hashed and used to determine the index of the feature. So in your case feature 5 and feature 15 both have a value of 1.0 and are distinct features with different indices.
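To make the name[:value] rule concrete, here is a minimal pure-Python sketch of how a single feature token splits into a name and a value. It is only an illustration of the rule described above, not VW's actual parser, and parse_feature is a hypothetical helper name.

```python
def parse_feature(token):
    """Split one 'name[:value]' token into (name, value).

    Illustrative only: mimics the documented rule that a missing
    value defaults to 1.0. This is not VW's real parsing code.
    """
    name, sep, value = token.partition(":")
    return name, (float(value) if sep else 1.0)


# The original examples: two different names, both with value 1.0.
print(parse_feature("5"))                    # ('5', 1.0)
print(parse_feature("15"))                   # ('15', 1.0)

# With an explicit name: one shared feature, two different values.
print(parse_feature("my_feature_name:5"))    # ('my_feature_name', 5.0)
print(parse_feature("my_feature_name:15"))   # ('my_feature_name', 15.0)
```

In the first pair the numbers end up in the name, so nothing ties 5 and 15 together. In the second pair they are values of the same feature, so they share one weight.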
Therefore, to fix your problem you just need to give your features a name.
vw.learn('1:-2:0.5 | my_feature_name:5')
vw.learn('1:2:0.5 | my_feature_name:15')
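As a sketch of how all eight training examples from the question could be rewritten in this form, the snippet below builds the example strings with a small helper. The helper name cb_example and the tuple layout are assumptions made for illustration. The label fields follow VW's contextual bandit convention of action:cost:probability.

```python
def cb_example(action, cost, probability, value, feature_name="my_feature_name"):
    """Format one contextual bandit example with a named numeric feature.

    Hypothetical helper: it only builds the text line, it does not call VW.
    """
    return f"{action}:{cost}:{probability} | {feature_name}:{value}"


# (action, cost, probability, context value), copied from the question.
training = [
    (1, -2, 0.5, 5),
    (3, 2, 0.5, 5),
    (1, 2, 0.5, 15),
    (3, -2, 0.5, 15),
    (4, 2, 0.5, 5),
    (4, 2, 0.5, 15),
    (2, 2, 0.5, 5),
    (2, 2, 0.5, 15),
]

lines = [cb_example(*row) for row in training]
for line in lines:
    print(line)  # each line could then be passed to vw.learn(line)
```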
You can read more about the input format here.
Also, I'd like to point out that -q UA is not doing anything in your example, as you do not have namespaces. Namespaces can be specified by placing them next to the bar. The following example has two namespaces, A and B. (Note: if more than one character is used for a namespace, only the first character is used with -q.)
1:-2:0.5 |A my_feature_name:5 |B yet_another_feature:4
In this case, if we supplied -q AB, then VW would create a new feature for each pair of features in A and B at runtime. This allows you to express more complicated interactions in the representation VW learns.
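A rough pure-Python sketch of what pairing features across two namespaces means is shown below. It is a conceptual illustration, not VW's implementation: VW combines feature hashes internally, whereas this sketch joins the names with "^" for readability, and quadratic_features is a hypothetical function name. Each crossed feature's value is the product of the two original values.

```python
from itertools import product


def quadratic_features(ns_a, ns_b):
    """Cross every feature in ns_a with every feature in ns_b.

    Conceptual sketch only: the "^"-joined names are illustrative,
    and each crossed value is the product of the two inputs.
    """
    return {
        f"{name_a}^{name_b}": value_a * value_b
        for (name_a, value_a), (name_b, value_b) in product(ns_a.items(), ns_b.items())
    }


# The two namespaces from the example line above.
namespace_a = {"my_feature_name": 5.0}
namespace_b = {"yet_another_feature": 4.0}

crossed = quadratic_features(namespace_a, namespace_b)
print(crossed)  # {'my_feature_name^yet_another_feature': 20.0}
```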