
PyTorch: How to get around the RuntimeError: in-place operations can be only used on variables that don't share storage with any other variables


With PyTorch I'm having a problem doing an operation with two Variables:

sub_patch  : [torch.FloatTensor of size 9x9x32]

pred_patch : [torch.FloatTensor of size 5x5x32]

sub_patch is a Variable created with torch.zeros. pred_patch is a Variable whose 25 entries I index with a nested for-loop; each entry is multiplied by its corresponding unique filter (sub_filt_patch) of size [5,5,32], and the result is added to its respective place in sub_patch.

This is a piece of my code:

for i in range(filter_sz):
    for j in range(filter_sz):

        # index correct filter from filter tensor
        sub_filt_col = (patch_col + j) * filter_sz
        sub_filt_row = (patch_row + i) * filter_sz

        sub_filt_patch = sub_filt[sub_filt_row:(sub_filt_row + filter_sz), sub_filt_col:(sub_filt_col+filter_sz), :]

        # multiply filter and pred_patch and sum onto sub patch
        sub_patch[i:(i + filter_sz), j:(j + filter_sz), :] += (sub_filt_patch * pred_patch[i,j]).sum(dim=3)

The error I get from the bottom line of the piece of code here is

RuntimeError: in-place operations can be only used on variables that don't share storage with any other variables, but detected that there are 2 objects sharing it

I understand why it happens, since both sub_patch and pred_patch are Variables, but how can I get around this error? Any help would be greatly appreciated!

Thank you!


Solution

  • I've found the problem to be in

            sub_patch[i:(i + filter_sz), j:(j + filter_sz), :] += (sub_filt_patch * pred_patch[i,j]).sum(dim=3)
    

    When I separated this line into:

        sub_patch[i:(i + filter_sz), j:(j + filter_sz), :] = sub_patch[i:(i + filter_sz), j:(j + filter_sz), :] + (sub_filt_patch * pred_patch[i,j]).sum(dim=3)

    it worked!

    The difference between a += b and a = a + b is that in the first case, b is added to a in place (the contents of a are modified to now contain a + b). In the second case, a brand-new tensor containing a + b is created, and that new tensor is then bound to the name a. To compute gradients, autograd sometimes needs the original value of a, so in-place operations are forbidden in those cases; otherwise the gradients could not be computed.
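
    As a self-contained illustration of the same idea, here is a minimal sketch (the shapes and the factor 2.0 are made up for the example, and it is written against the newer PyTorch API, where requires_grad on a Tensor replaces the old Variable wrapper):

        import torch

        # Leaf tensor that requires gradients -- plays the role of the filters.
        w = torch.randn(5, 5, 32, requires_grad=True)

        # Accumulator created from zeros -- plays the role of sub_patch.
        sub_patch = torch.zeros(9, 9, 32)

        # In-place accumulation into a slice. With the old Variable API this
        # is the kind of line that raised the RuntimeError above:
        # sub_patch[0:5, 0:5, :] += w * 2.0

        # Out-of-place version: compute the sum as a new tensor, then assign
        # it to the slice, so autograd keeps the values it needs.
        sub_patch[0:5, 0:5, :] = sub_patch[0:5, 0:5, :] + w * 2.0

        loss = sub_patch.sum()
        loss.backward()
        print(w.grad.shape)  # torch.Size([5, 5, 32])

    Running this prints the gradient shape, confirming that the out-of-place add keeps the autograd graph intact.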