How to set requires_grad_ to False (freeze) on PyTorch lazy layers?


PyTorch offers lazy layers, e.g. torch.nn.LazyLinear. I want to freeze some of these layers in my network, i.e. ensure they take no gradient steps. When I try .requires_grad_(False), I receive the error:

ValueError: Attempted to use an uninitialized parameter in <method 'requires_grad_' of 'torch._C._TensorBase' objects>. This error happens when you are using a `LazyModule` or explicitly manipulating `torch.nn.parameter.UninitializedParameter` objects. When using LazyModules Call `forward` with a dummy batch to initialize the parameters before calling torch functions
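
For reference, here is a minimal sketch that reproduces the error (the architecture is assumed purely for illustration):

    import torch
    import torch.nn as nn

    # Hypothetical model; any module containing a lazy layer behaves the same.
    model = nn.Sequential(
        nn.LazyLinear(64),  # in_features is inferred on the first forward pass
        nn.ReLU(),
        nn.LazyLinear(10),
    )

    # Raises the ValueError above: the weights are still UninitializedParameter objects.
    model[0].requires_grad_(False)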

How do I freeze lazy layers?


Solution

  • As the error message suggests, you need to run a dummy forward pass through a lazy module in order to fully initialize all of its parameters.

    In other words, assuming shape holds the correct input shape for your model:

    >>> model(torch.rand(shape))
    

    And only then can you freeze the desired layer with requires_grad_(False), as in the end-to-end sketch below.
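
    Putting it together, a minimal end-to-end sketch (the architecture, batch size, and input dimension are assumed for illustration):

        import torch
        import torch.nn as nn

        model = nn.Sequential(
            nn.LazyLinear(64),   # in_features is inferred from the first input
            nn.ReLU(),
            nn.LazyLinear(10),
        )

        # Dummy forward pass: materializes every UninitializedParameter.
        # The shape here is (batch_size, in_features); both values are arbitrary.
        model(torch.rand(2, 128))

        # The parameters are now regular tensors and can be frozen.
        model[0].requires_grad_(False)

        # Sanity check: only the second linear layer should remain trainable.
        for name, p in model.named_parameters():
            print(name, p.requires_grad)

    When you later build the optimizer, it is common to pass it only the still-trainable parameters, e.g. filter(lambda p: p.requires_grad, model.parameters()), so the frozen layer is skipped entirely.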