Tags: machine-learning, neural-network, computer-vision, conv-neural-network, convergence

Different random weight initialization leading to different performances


I'm training a 3D U-Net on an EM dataset of a brain. The objective is to segment the neurons in it. During my experiments, I've noticed that different random initializations of the network lead to different performance. I evaluate performance as mean Intersection over Union (mIoU), and I observe differences as large as 5%.

I use Xavier initialization with a uniform distribution and a constant learning rate of 1e-4.
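For reference, a minimal sketch of that setup, assuming PyTorch; the small 3D conv stack is just a placeholder standing in for the actual U-Net, and the choice of Adam is an assumption:

```python
import torch
import torch.nn as nn

# Placeholder model: a tiny 3D conv stack standing in for the 3D U-Net.
model = nn.Sequential(
    nn.Conv3d(1, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv3d(16, 1, kernel_size=3, padding=1),
)

def init_weights(m):
    # Xavier (Glorot) initialization with a uniform distribution,
    # as described in the question.
    if isinstance(m, nn.Conv3d):
        nn.init.xavier_uniform_(m.weight)
        nn.init.zeros_(m.bias)

model.apply(init_weights)

# Constant learning rate of 1e-4 (optimizer choice is an assumption).
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
```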

Edit: I'm asking how I can avoid behavior like this.


Solution

  • As Roberto said, different random initializations always lead to different resulting weights. This is to be expected, as the initial state constrains the possible trajectories of the optimization.

    If you read the paper introducing Xavier init, you will see that it is well known that the random init has a large influence on the resulting performance (there is a paper showing that hundreds of training runs with random init all end up in different local minima, but I can't find it right now). This is the very reason we use heuristics like Xavier init: they tend to lead to better local minima than other forms of random initialization.

    Typically, one performs multiple training runs (e.g. 10) and keeps the best-performing model; in articles, the mean across runs is also sometimes reported.
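
A minimal sketch of that practice, assuming a hypothetical `train_and_evaluate(seed)` helper that trains the network once and returns its validation mIoU:

```python
import statistics

def train_and_evaluate(seed: int) -> float:
    """Hypothetical helper: seeds the RNGs, trains the network once,
    and returns the resulting mean IoU on the validation set."""
    # Placeholder return; replace with the real training + evaluation loop.
    return 0.0

# Repeat training with different random initializations (e.g. 10 runs).
scores = [train_and_evaluate(seed) for seed in range(10)]

# Keep the best run; report the mean (and spread) across runs, as papers often do.
print(f"best mIoU: {max(scores):.4f}")
print(f"mean mIoU: {statistics.mean(scores):.4f} +/- {statistics.stdev(scores):.4f}")
```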