In the code here: https://www.kaggle.com/ryanholbrook/detecting-the-higgs-boson-with-tpus
Before the model is compiled, it is built using this code:
```python
with strategy.scope():
    # Wide Network
    wide = keras.experimental.LinearModel()

    # Deep Network
    inputs = keras.Input(shape=[28])
    x = dense_block(UNITS, ACTIVATION, DROPOUT)(inputs)
    x = dense_block(UNITS, ACTIVATION, DROPOUT)(x)
    x = dense_block(UNITS, ACTIVATION, DROPOUT)(x)
    x = dense_block(UNITS, ACTIVATION, DROPOUT)(x)
    x = dense_block(UNITS, ACTIVATION, DROPOUT)(x)
    outputs = layers.Dense(1)(x)
    deep = keras.Model(inputs=inputs, outputs=outputs)

    # Wide and Deep Network
    wide_and_deep = keras.experimental.WideDeepModel(
        linear_model=wide,
        dnn_model=deep,
        activation='sigmoid',
    )
```
I don't understand what `with strategy.scope():` does here, or whether it affects the model in any way. What exactly does it do?
In the future, how could I work out what something like this does on my own? What resources would I need to look at?
Distribution strategies were introduced as part of TF2 to help distribute training across multiple GPUs, multiple machines, or TPUs with minimal code changes. I'd recommend this guide to distributed training as a starting point.
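To see what the scope does in practice, here is a minimal sketch (not taken from the notebook) using `MirroredStrategy` for multiple GPUs; the same pattern applies to `TPUStrategy`. Anything that creates variables (layers, optimizer, metrics) goes inside the scope, while `fit`/`evaluate` are called as usual:

```python
import tensorflow as tf
from tensorflow import keras

# Pick a strategy; MirroredStrategy replicates the model across local GPUs.
strategy = tf.distribute.MirroredStrategy()
print("Replicas:", strategy.num_replicas_in_sync)

# Variables created inside the scope become distributed (mirrored) variables.
with strategy.scope():
    model = keras.Sequential([
        keras.layers.Dense(32, activation="relu", input_shape=[28]),
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")

# Training happens outside the scope; Keras splits each batch across the
# replicas and aggregates (all-reduces) the gradients automatically.
# model.fit(train_ds, validation_data=valid_ds, epochs=10)
```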
Specifically, creating the model under `TPUStrategy` will place a replica of the model (with the same weights) on each TPU core, and will keep the replica weights in sync by adding the appropriate collective communication ops (all-reducing the gradients). For more information, check the API doc for `TPUStrategy` as well as this intro to TPUs in TF2 Colab notebook.
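For context, the TPU setup in notebooks like that one typically looks roughly like the sketch below; the details (e.g. the `TPUClusterResolver` arguments, or `tf.distribute.experimental.TPUStrategy` on older TF 2.x releases) vary by environment:

```python
import tensorflow as tf

# Detect the TPU, connect to it, and initialize it, then build the strategy.
# (On older TF 2.x versions this class was tf.distribute.experimental.TPUStrategy.)
tpu = tf.distribute.cluster_resolver.TPUClusterResolver()
tf.config.experimental_connect_to_cluster(tpu)
tf.tpu.experimental.initialize_tpu_system(tpu)
strategy = tf.distribute.TPUStrategy(tpu)

# One replica per TPU core, typically 8 on the TPUs available in Kaggle/Colab.
print("Number of replicas:", strategy.num_replicas_in_sync)

# The `with strategy.scope():` block from the question then builds the
# wide-and-deep model so that its weights are created as replicated
# (per-core) variables kept in sync during training.
```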