
Possible to output Vowpal Wabbit predictions to .txt along with observed target values?


We're writing a forecasting application that uses Vowpal Wabbit and want to automate as much of our model validation process as we can. Does anyone know whether vw has a native utility that writes the target values from a test file alongside the predictions of a vw model? Both values are printed to the terminal during prediction. Is there an argument to the regular vw call, or perhaps a tool in the utl folder, that prints targets and forecasts together, one row per example?

Here is the command I currently use for prediction:

vw -d /path/to/data/test.vw -t -i lg.vw --link=logistic -p predictions.txt

My goal is to produce, from within Vowpal Wabbit, an output file that looks like this:

Predicted  Target
0.78       1
0.23       0
0.49       1

...

UPDATE

@arielf's code worked like a charm. I've only made one minor addition to print the streaming results to a validation.txt file:

vw -d test.vw -t -i lg.vw --link=logistic -P 1 2>&1 | \
     perl -ane 'print "$F[5]\t$F[4]\n" if (/^\d/)' > validation.txt

Solution

  • Try this:

    vw -d test.vw -t -i lg.vw --link=logistic -P 1 2>&1 | \
        perl -ane 'print "$F[5]\t$F[4]\n" if (/^\d/)'
    

    Explanation:

    -P 1     # Add option: set vw progress report to apply to every example
    

    Note: -P is a capital P (alias for --progress); 1 is the progress printing interval.

    Note that you don't need to write predictions with -p ..., since that is redundant in this case: the predictions are already included in the vw progress lines.

    A progress report line, with its headers, looks like this:

    average   since     example    example   current  current   current
    loss      last      counter     weight     label  predict  features
    0.000494  0.000494        1        1.0   -0.0222   0.0000        14
    

    Since the progress report goes to stderr, we need to redirect stderr to stdout (2>&1).

    Now we pipe the vw progress output into perl for simple post-processing. The perl command loops over each line of input without printing by default (-n), auto-splits each line on whitespace into the array @F (-a), and applies the expression given with -e. That expression prints $F[5] and $F[4], separated by a TAB and terminated by a newline, but only if the line starts with a digit. The indices are zero-based, so these are the 6th and 5th columns of the progress line: current predict and current label. The digit test skips whatever isn't a progress line, e.g. headers, preambles, and summary lines. I reversed the field order because vw progress lines have the observed value before the predicted value and you asked for the opposite order.

    UPDATE

    Aaron published a working example using this solution in Google Drive: https://drive.google.com/open?id=0BzKSYsAMaJLjZzJlWFA2N3NnZGc
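
For anyone who also wants the "Predicted  Target" header row from the question, the filter described in the answer can be wrapped in a small shell function. This is only a sketch under a few assumptions: the function name vw_progress_to_table is made up for illustration, it uses awk instead of perl (awk fields are 1-based, so $6 and $5 correspond to perl's $F[5] and $F[4]), and it assumes the progress line layout shown in the answer, with the label in column 5 and the prediction in column 6.

```shell
# Hypothetical helper (not part of vw): reads vw progress output on stdin
# and writes a tab-separated table with the prediction first, then the label.
vw_progress_to_table() {
    awk 'BEGIN { print "Predicted\tTarget" }      # header row, printed once
         /^[0-9]/ { print $6 "\t" $5 }'           # keep only lines starting with a digit
}

# Intended use, assuming the same vw call as in the answer:
#   vw -d test.vw -t -i lg.vw --link=logistic -P 1 2>&1 | vw_progress_to_table > validation.txt
```

As in the perl version, the leading-digit test is what drops the header, preamble, and summary lines; if your vw version formats progress lines differently, the column numbers would need adjusting.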