I'm working on a program in which I initialise leaf node tensors (which require gradients) with the same values as existing tensors. As a simple example of such a program, I create 4 tensors as follows:
import torch
a = torch.tensor(4, dtype=torch.float32)
b = a.detach().clone().requires_grad_(True)
c = b*b
d = torch.empty_like(c).copy_(c).requires_grad_(True)
print(b.is_leaf) # True
print(d.is_leaf) # False
d is not a leaf node. If I define b as b = a.detach().clone() (i.e. don't call requires_grad_(True) for b), then both b and d are leaf nodes:
a = torch.tensor(4, dtype=torch.float32)
b = a.detach().clone()
c = b*b
d = torch.empty_like(c).copy_(c).requires_grad_(True)
print(b.is_leaf) # True
print(d.is_leaf) # True
Why does the way in which b is defined determine whether or not d is a leaf node?
Note that if I define d as d = c.detach().clone().requires_grad_(True) (using detach().clone() instead of empty_like().copy_()), then d is a leaf node regardless of whether or not I call requires_grad_(True) for b.
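For what it's worth, here is a minimal sketch (same setup as above, just looping over both choices for b) that confirms the detach().clone() route produces a leaf either way:
import torch

for b_requires_grad in (False, True):
    a = torch.tensor(4, dtype=torch.float32)
    b = a.detach().clone()
    if b_requires_grad:
        b.requires_grad_(True)
    c = b * b
    # detach() cuts c out of the graph, so the clone() is never a recorded operation
    d = c.detach().clone().requires_grad_(True)
    print(b_requires_grad, d.is_leaf)  # d.is_leaf is True in both cases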
A similar question: How to understand creating leaf tensors in PyTorch? You could check that out; I think the behaviour should be understandable from that answer. The official PyTorch documentation on autograd is also useful.
In your first example, you create a tensor:
a = torch.tensor(4, dtype=torch.float32)
which is, by convention, a leaf. Your b is also a leaf; you created it with requires_grad_(True). Then you perform an operation on this tensor (which has requires_grad=True!) via b * b, making c not a leaf tensor anymore: it is the output of a recorded operation. The same goes for d, because copy_(c) copies from a tensor that requires grad, so that copy is recorded as well and d carries a history.
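You can see that recorded history directly; a minimal check, assuming the same first example:
import torch

a = torch.tensor(4, dtype=torch.float32)
b = a.detach().clone().requires_grad_(True)
c = b * b
d = torch.empty_like(c).copy_(c).requires_grad_(True)
print(c.is_leaf, c.grad_fn is not None)  # False True: b * b was recorded
print(d.is_leaf, d.grad_fn is not None)  # False True: copy_(c) was recorded too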
However, in your second example, you do not perform any operations on tensors that have requires_grad=True, so no operations are recorded. All of these tensors are, by convention, leaf tensors. The last line of your second example just makes a copy of your c tensor (still nothing recorded) and then flags that new tensor with requires_grad=True. In this case it is the first tensor in the chain that requires gradients, and no recorded operation produced it, so it is still a leaf node.
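And the same kind of check for the second example (again just a sketch):
import torch

a = torch.tensor(4, dtype=torch.float32)
b = a.detach().clone()                 # requires_grad=False, nothing gets recorded
c = b * b                              # still requires_grad=False, no grad_fn
d = torch.empty_like(c).copy_(c).requires_grad_(True)
print(c.grad_fn)   # None: the multiplication was not recorded
print(d.grad_fn)   # None: the copy was not recorded either
print(d.is_leaf)   # True: requires_grad=True with no recorded history, hence a leaf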