I am using Hypothesis, specifically the numpy extension, to write tests while upgrading a TensorFlow model.
This involves generating a number of tensors that share dimensions, such as the batch size. For example, here is what I would like to do:
batch_size = integers(min_value=1, max_value=512)
hidden_state_size = integers(min_value=1, max_value=10_000)
@given(
arrays(dtype=float32, shape=(batch_size, integers(min_value=1, max_value=10_000))),
arrays(dtype=float32, shape=(batch_size, hidden_state_size)),
arrays(dtype=float32, shape=(batch_size, hidden_state_size, integers(min_value=1, max_value=10_000))),
)
def test_code(input_array, initial_state, encoder_state):
...
but obviously this doesn't work, because shape requires ints, not integers strategies.
I could use a @composite-decorated function to generate all the necessary tensors and unpack them within the test, but this requires a lot of boilerplate that is difficult to read and slow to develop with.
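A minimal sketch of that composite approach (the names, bounds, and which dimensions are shared are illustrative, not exactly my real model) looks something like:
from hypothesis import given
from hypothesis.strategies import composite, integers
from hypothesis.extra.numpy import arrays
from numpy import float32

@composite
def model_tensors(draw):
    # Draw the shared dimensions once, then build every tensor from them.
    batch_size = draw(integers(min_value=1, max_value=512))
    hidden_state_size = draw(integers(min_value=1, max_value=10_000))
    input_size = draw(integers(min_value=1, max_value=10_000))
    encoder_size = draw(integers(min_value=1, max_value=10_000))
    input_array = draw(arrays(dtype=float32, shape=(batch_size, input_size)))
    initial_state = draw(arrays(dtype=float32, shape=(batch_size, hidden_state_size)))
    encoder_state = draw(arrays(dtype=float32, shape=(batch_size, hidden_state_size, encoder_size)))
    return input_array, initial_state, encoder_state

@given(model_tensors())
def test_code(tensors):
    input_array, initial_state, encoder_state = tensors
    ...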
I've also looked at the shared strategy, but couldn't get that working.
Any suggestions would be appreciated because I think this would be a great tool for hardening NN code.
The trick is to use shared and to define your shapes with the tuples strategy: a tuple of strategies is not a valid shape argument, but a strategy for tuples-of-ints is. That looks like:
from hypothesis import given
from hypothesis.strategies import integers, shared, tuples
from hypothesis.extra.numpy import arrays
from numpy import float32

# Each shared(...) strategy draws one value per test case and reuses it everywhere it appears.
batch_size = shared(integers(min_value=1, max_value=512))
hidden_state_size = shared(integers(min_value=1, max_value=10_000))

@given(
    arrays(dtype=float32, shape=tuples(batch_size, integers(min_value=1, max_value=10_000))),
    arrays(dtype=float32, shape=tuples(batch_size, hidden_state_size)),
    arrays(dtype=float32, shape=tuples(batch_size, hidden_state_size, integers(min_value=1, max_value=10_000))),
)
def test_code(input_array, initial_state, encoder_state):
    ...
Separately, I would also suggest reducing the maximum sizes considerably - running (many) more tests on smaller arrays is likely to catch more bugs in the same length of time. But check --hypothesis-show-statistics and profile before blindly applying performance advice!
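As a rough illustration only (the bounds and example count below are placeholders, not recommendations), that kind of tuning might look like:
from hypothesis import given, settings
from hypothesis.strategies import integers, shared, tuples
from hypothesis.extra.numpy import arrays
from numpy import float32

# Illustrative bounds only: small dimensions, more examples per run.
batch_size = shared(integers(min_value=1, max_value=8))
hidden_state_size = shared(integers(min_value=1, max_value=64))

@settings(max_examples=500)
@given(arrays(dtype=float32, shape=tuples(batch_size, hidden_state_size)))
def test_code(initial_state):
    ...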