
Using a precompiled version of Vowpal Wabbit - Downsides?


Due to the difficulty of compiling VW on a RHEL machine, I am opting to use a precompiled version of VW provided by Ariel Faigon (thank you!) here. I'm calling VW from Python, so I am planning on using Python's subprocess module (I couldn't get the Python package to compile either). I am wondering whether there are any downsides to this approach. Would I see any performance lag?

Thank you so much for your help!


Solution

  • Feeding a live vowpal wabbit process via Python's subprocess is OK (fast), as long as you don't start a new process per example and you avoid excessive context switches. In my experience, in this setup you can expect a throughput of ~500k features per second on typical dual-core hardware. This is not as fast as the ~5M features/sec (about 10x faster) that vw typically processes when not interacting with any other software (reading from a file or cache), but it is good enough for most practical purposes. Note that the bottleneck in this setting would most likely be the processing done by the additional process, not vowpal-wabbit itself.

    It is recommended to feed vowpal-wabbit in batches (N examples at a time, instead of one at a time), both on input (feeding vw) and on output (reading vw responses); a sketch of batched feeding follows the example below. If you're using subprocess.Popen to connect to the process, make sure to pass a large bufsize; otherwise, by default, the Popen streams would be line-buffered (one example at a time), which might result in a per-example context switch between the producer of examples and the consumer (vowpal wabbit).

    Assuming your vw command line is in vw_cmd, it would be something like:

    import subprocess

    # stdin must be a pipe too, so we can feed examples to the live process;
    # a large bufsize avoids line buffering (one context switch per example).
    vw_proc = subprocess.Popen(vw_cmd, stdin=subprocess.PIPE,
                               stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                               bufsize=1048576)
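
    For illustration, here is a minimal sketch of the batched feeding described above, continuing from vw_proc. It assumes vw_cmd includes --quiet and -p /dev/stdout (so vw writes exactly one prediction line per example to stdout) and that examples is a list of newline-terminated vw-format strings; these names and flags are illustrative assumptions, not part of the original setup:

    # Feed vw in batches of 1000 examples, then read 1000 predictions back.
    # Modest batch sizes keep the OS pipe buffers from filling up while we
    # are still writing (which could otherwise deadlock the two processes).
    for i in range(0, len(examples), 1000):
        batch = examples[i:i + 1000]
        # One write + flush per batch, instead of per example, minimizes
        # context switches between the producer and vw.
        vw_proc.stdin.write(''.join(batch).encode())
        vw_proc.stdin.flush()
        # One prediction line comes back per example fed; process as needed.
        preds = [vw_proc.stdout.readline().decode().strip() for _ in batch]

    vw_proc.stdin.close()
    vw_proc.wait()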
    

    Generally, slowness can come from:

    • Too many context switches (generating and processing one example at a time)
    • Too much processing outside vw (e.g. generating the examples in the first place, feature transformation)
    • Startup overhead (e.g. reading the model) per example.

    So avoiding all of the above pitfalls should give you the fastest throughput possible given the constraint of having to interact with an additional process.