tensorflow machine-learning kubernetes distributed-computing google-cloud-ml

Simplest way to distribute TensorFlow training on premises?


What is the simplest way to train TensorFlow models (using the Estimator API) distributed across machines on a home network? It doesn't look like ml-engine local train lets you specify IPs.


Solution

  • Your best bet is to use something like Kubernetes. The tensorflow/k8s project is a work in progress, but I believe it supports distributed training as well: https://github.com/tensorflow/k8s.

    Alternatively, a couple of more low-tech automation options come to mind:

    1. You could write a launcher script that uses SSH to start the training script on each machine remotely.
    2. You could have the individual workers poll a shared location for a file, and treat its appearance as the signal to download and execute a script.
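
As a rough sketch of option 1: as far as I know, the Estimator API (via tf.estimator.train_and_evaluate) learns about the cluster from a TF_CONFIG environment variable holding a JSON description of every node plus the current node's role. The launcher below builds one TF_CONFIG per machine and the matching ssh command. The IP addresses, the port, the train.py script name, and the choice of "first host is chief, second is parameter server" are all assumptions for illustration, not something from the question.

```python
import json
import shlex

# Hypothetical LAN addresses and port; replace with your own machines.
HOSTS = ["192.168.1.10", "192.168.1.11", "192.168.1.12"]
PORT = 2222
# Hypothetical training script that calls tf.estimator.train_and_evaluate.
TRAIN_CMD = "python3 train.py"


def build_cluster(hosts, port):
    """Assumed layout: first host is chief, second is ps, the rest are workers."""
    addrs = ["{}:{}".format(host, port) for host in hosts]
    cluster = {"chief": addrs[:1], "ps": addrs[1:2]}
    if addrs[2:]:
        cluster["worker"] = addrs[2:]
    return cluster


def tf_config_for(cluster, task_type, index):
    """Return the TF_CONFIG JSON string for one task in the cluster."""
    config = {"cluster": cluster, "task": {"type": task_type, "index": index}}
    return json.dumps(config, sort_keys=True)


def ssh_commands(hosts, port, train_cmd):
    """Build one ssh argument list per machine, each with its own TF_CONFIG."""
    cluster = build_cluster(hosts, port)
    commands = []
    for task_type in ("chief", "ps", "worker"):
        for index, addr in enumerate(cluster.get(task_type, [])):
            host = addr.rsplit(":", 1)[0]
            tf_config = shlex.quote(tf_config_for(cluster, task_type, index))
            remote = "TF_CONFIG={} {}".format(tf_config, train_cmd)
            commands.append(["ssh", host, remote])
    return commands


if __name__ == "__main__":
    # Print the commands for inspection rather than running them.
    for cmd in ssh_commands(HOSTS, PORT, TRAIN_CMD):
        print(" ".join(shlex.quote(part) for part in cmd))
```

Each argument list could be handed to subprocess.Popen to start the nodes in parallel. This assumes passwordless SSH keys and that the same training script and data path exist on every machine.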
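
Option 2 could look roughly like the following on each worker. This is a minimal sketch that assumes the "shared location" is a directory mounted on every machine (for example over NFS or SMB), so the script is read straight from the share instead of being downloaded separately. The function names and paths are hypothetical.

```python
import os
import subprocess
import sys
import time


def wait_for_signal(path, interval=5.0, timeout=None):
    """Poll until `path` exists. Return True if it appeared, False on timeout."""
    deadline = None if timeout is None else time.monotonic() + timeout
    while not os.path.exists(path):
        if deadline is not None and time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True


def run_when_signalled(signal_path, script_path, interval=5.0, timeout=None):
    """Wait for the signal file, then run the script with this interpreter.

    Returns the script's exit code, or None if the signal never appeared.
    """
    if not wait_for_signal(signal_path, interval, timeout):
        return None
    return subprocess.run([sys.executable, script_path]).returncode
```

A worker would then call something like run_when_signalled("/mnt/shared/START", "/mnt/shared/train.py"), and creating the START file on the share kicks off every machine at roughly the same time. Each worker would still need its own TF_CONFIG (or equivalent) set before the training script runs.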