
Why should I call a BERT module instance rather than the forward method?


I'm trying to extract vector representations of text using BERT in the transformers library, and have stumbled on the following part of the documentation for the BertModel class:

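[Screenshot of the BertModel documentation: a note saying to call the Module instance instead of forward, because the instance call takes care of running the pre and post processing steps.]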

Can anybody explain this in more detail? A forward pass makes intuitive sense to me (I am trying to get the final hidden states, after all), and I can't find any additional information on what "pre and post processing" means in this context.
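For reference, here is a minimal sketch of the extraction described above. It assumes PyTorch and transformers v4 or later, where model outputs expose last_hidden_state. The tiny BertConfig values and the token ids are hypothetical, chosen only so the snippet runs without downloading pretrained weights. Note that it calls the model instance rather than model.forward().

```python
import torch
from transformers import BertConfig, BertModel

# Hypothetical tiny configuration so the sketch runs without downloading weights.
config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
)
model = BertModel(config)
model.eval()  # disable dropout for inference

# Made-up token ids: a batch of 1 sequence with 5 tokens.
input_ids = torch.tensor([[1, 5, 7, 9, 2]])
attention_mask = torch.ones_like(input_ids)

with torch.no_grad():
    # Call the instance (model(...)), not model.forward(...).
    outputs = model(input_ids=input_ids, attention_mask=attention_mask)

# Final hidden states: (batch, sequence_length, hidden_size)
hidden = outputs.last_hidden_state
print(hidden.shape)  # torch.Size([1, 5, 32])

# One common (assumed) choice for a single vector per text: mean over the tokens.
sentence_vector = hidden.mean(dim=1)
print(sentence_vector.shape)  # torch.Size([1, 32])
```

With real text you would load weights via BertModel.from_pretrained(...) and build input_ids with the matching tokenizer. Mean pooling is only one of several ways to turn per-token hidden states into a single vector.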

Thanks in advance!


Solution

  • I think this is just general advice about working with PyTorch Modules. The transformers models (BertModel included) are nn.Modules, and each of them defines a forward method. However, one should not call model.forward() manually but instead call model(). The reason is that calling the Module instance goes through nn.Module.__call__, which does more than run forward: it also runs any registered hooks (forward pre-hooks, forward hooks and backward hooks) around it. You can see this in the source code:

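    # Excerpt of nn.Module.__call__ from an older PyTorch release
    # (torch/nn/modules/module.py); that file imports functools and torch.
    def __call__(self, *input, **kwargs):
        # 1. Forward pre-hooks run first and may replace the inputs.
        for hook in self._forward_pre_hooks.values():
            result = hook(self, input)
            if result is not None:
                if not isinstance(result, tuple):
                    result = (result,)
                input = result
        # 2. The actual computation: forward() is only one step of __call__.
        #    Under torch.jit tracing, _slow_forward wraps it with extra bookkeeping.
        if torch._C._get_tracing_state():
            result = self._slow_forward(*input, **kwargs)
        else:
            result = self.forward(*input, **kwargs)
        # 3. Forward hooks run after forward and may replace the output.
        for hook in self._forward_hooks.values():
            hook_result = hook(self, input, result)
            if hook_result is not None:
                result = hook_result
        # 4. Backward hooks are attached to the grad_fn of the first tensor
        #    found in the output (descending into dicts and sequences).
        if len(self._backward_hooks) > 0:
            var = result
            while not isinstance(var, torch.Tensor):
                if isinstance(var, dict):
                    var = next((v for v in var.values() if isinstance(v, torch.Tensor)))
                else:
                    var = var[0]
            grad_fn = var.grad_fn
            if grad_fn is not None:
                for hook in self._backward_hooks.values():
                    wrapper = functools.partial(hook, self)
                    functools.update_wrapper(wrapper, hook)
                    grad_fn.register_hook(wrapper)
        return result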
    

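    As you can see, forward is just one step inside __call__: pre-hooks run before it, forward hooks run after it, and backward hooks are registered on the output's grad_fn. Calling model.forward() directly silently skips all of these, which is what the documentation means by the pre and post processing steps.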
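To see the difference in practice, here is a minimal sketch assuming a recent PyTorch install. nn.Linear stands in for BertModel (both are nn.Module subclasses), and the hook functions double_input and log_forward are hypothetical, made up for illustration: a pre-hook that rescales the input and a forward hook that counts calls.

```python
import torch
from torch import nn

torch.manual_seed(0)  # fixed seed so the random weights are reproducible

# nn.Linear stands in for any nn.Module, including BertModel.
layer = nn.Linear(4, 2)
calls = []  # records each time the forward hook fires


def double_input(module, args):
    # Forward pre-hook (hypothetical): returning a tuple replaces the positional inputs.
    return (args[0] * 2,)


def log_forward(module, inputs, output):
    # Forward hook (hypothetical): receives the module, its inputs and its output.
    calls.append(tuple(output.shape))


pre_handle = layer.register_forward_pre_hook(double_input)
post_handle = layer.register_forward_hook(log_forward)

x = torch.randn(3, 4)

y_call = layer(x)             # goes through nn.Module.__call__: both hooks run
y_forward = layer.forward(x)  # bypasses __call__: no hook runs

print(len(calls))                         # 1
print(torch.allclose(y_call, y_forward))  # False: only layer(x) saw the doubled input

# Detach the hooks once they are no longer needed.
pre_handle.remove()
post_handle.remove()
```

Only the instance call fired the hooks, and because the pre-hook altered the input, the two results differ. Any code that relies on hooks (for example, tools that inspect intermediate activations) is bypassed when forward is called directly.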