We're writing a forecasting application that uses Vowpal Wabbit and are looking to automate as much of our model validation process as we can. Anyone know whether vw has a native utility to output the target values in a test file along with the predictions from a vw model? These values are printed to the terminal output during prediction. Is there an argument to the regular vw call, or perhaps a tool in the utl folder, that prints targets and forecasts together on a row-wise basis?
Here's what the code I'm using now for prediction looks like:
vw -d /path/to/data/test.vw -t -i lg.vw --link=logistic -p predictions.txt
My goal is to produce from within Vowpal an output file that looks like this:
Predicted Target
0.78 1
0.23 0
0.49 1
...
UPDATE
@arielf's code worked like a charm. I've only made one minor addition to print the streaming results to a validation.txt file:
vw -d test.vw -t -i lg.vw --link=logistic -P 1 2>&1 | \
perl -ane 'print "$F[5]\t$F[4]\n" if (/^\d/)' > validation.txt
Try this:
vw -d test.vw -t -i lg.vw --link=logistic -P 1 2>&1 | \
perl -ane 'print "$F[5]\t$F[4]\n" if (/^\d/)'
Explanation:
-P 1 # Add option: set vw progress report to apply to every example
Note: -P is a capital P (alias for --progress), and 1 is the progress printing interval.
Note that you don't need to add predictions with -p ... since that is redundant in this case (predictions are already included in vw progress lines).
A progress report line, with its headers, looks like this:
average  since         example        example  current  current  current
loss     last          counter         weight    label  predict features
0.000494 0.000494            1            1.0  -0.0222   0.0000       14
Since the progress report goes to stderr, we need to redirect stderr to stdout (2>&1).
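To see why the redirection matters, here is a minimal sketch. It does not call vw at all: fake_vw is a hypothetical stand-in that writes one progress-style line to stderr, the same stream vw uses for its progress report.

```shell
# fake_vw is a hypothetical stand-in for vw (not a real command): it writes
# one progress-style line to stderr and nothing to stdout.
fake_vw() {
    echo "0.000494 0.000494 1 1.0 -0.0222 0.0000 14" >&2
}

# Without 2>&1 the pipe only carries stdout, so wc counts no lines.
# (stderr is sent to /dev/null here just to keep the terminal quiet.)
without=$(fake_vw 2>/dev/null | wc -l | tr -d ' ')

# With 2>&1 stderr is merged into stdout before the pipe, so wc sees the line.
with=$(fake_vw 2>&1 | wc -l | tr -d ' ')

echo "without redirect: $without line(s), with redirect: $with line(s)"
```

The first count is 0 and the second is 1, which is why the perl filter would receive nothing if 2>&1 were left out.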
Now we pipe the vw progress output into perl for simple post-processing. The perl command loops over each line of input without printing by default (-n), auto-splits into fields on white-space (-a), and applies the expression (-e), printing $F[5] and $F[4] (zero-based indices, i.e. the 6th and 5th columns: the predicted value and the observed label), separated by a TAB and terminated by a newline, if the line starts with a digit (in order to skip whatever isn't a progress line, e.g. headers, preambles and summary lines). I reversed the field order because vw progress lines have the observed value before the predicted value and you asked for the opposite order.
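If perl isn't handy, the same filter can be sketched in awk. This is only an illustrative equivalent, run here against a pasted sample of progress output rather than a real vw run; the sample variable and the Predicted/Target header row are my additions, modeled on the output format requested in the question. Note that awk fields are 1-based, so the prediction is $6 and the label is $5.

```shell
# Sample vw-style output: two header lines, one progress line, a blank line
# and a summary line. Only the progress line starts with a digit.
sample='average  since         example        example  current  current  current
loss     last          counter         weight    label  predict features
0.000494 0.000494            1            1.0  -0.0222   0.0000       14

finished run'

# Print a header row, then "predict<TAB>label" for every progress line.
# awk fields are 1-based: $6 is "current predict", $5 is "current label".
result=$( { printf 'Predicted\tTarget\n'
            printf '%s\n' "$sample" |
            awk '/^[0-9]/ { print $6 "\t" $5 }'; } )

printf '%s\n' "$result"
```

With a real model you would replace the printf of the sample with the vw command above (including -P 1 and 2>&1) and redirect the result to a file.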
UPDATE
Aaron published a working example using this solution in Google Drive: https://drive.google.com/open?id=0BzKSYsAMaJLjZzJlWFA2N3NnZGc