import torch.nn as nn

class BERTPooler(nn.Module):
    def __init__(self, config):
        super(BERTPooler, self).__init__()
        self.dense = nn.Linear(config.hidden_size, config.hidden_size)
        self.activation = nn.Tanh()

    def forward(self, hidden_states):
        # We "pool" the model by simply taking the hidden state corresponding
        # to the first token.
        first_token_tensor = hidden_states[:, 0]
        pooled_output = self.dense(first_token_tensor)
        pooled_output = self.activation(pooled_output)
        return pooled_output
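For context, here is a quick usage sketch (the SimpleNamespace config is just a stand-in for the real BertConfig): the pooler takes the encoder output of shape (batch_size, seq_len, hidden_size) and returns one vector per sequence, built from the first ([CLS]) token's hidden state.

import torch
from types import SimpleNamespace

config = SimpleNamespace(hidden_size=768)  # stand-in for the real BertConfig
pooler = BERTPooler(config)

# Fake encoder output: batch of 2 sequences, 128 tokens, 768-dim hidden states.
hidden_states = torch.randn(2, 128, config.hidden_size)

pooled = pooler(hidden_states)
print(pooled.shape)  # torch.Size([2, 768]) -- one vector per sequence, from the [CLS] position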
The author of the original BERT paper answered this (kind of) in a comment on GitHub:

"The tanh() thing was done early to try to make it more interpretable but it probably doesn't matter either way."
I agree this doesn't fully answer whether tanh is preferable, but from the looks of it, the pooler will probably work with any reasonable activation.
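To illustrate that last point, here is a sketch of the same pooler with the activation injected instead of hard-coded (the activation argument is my own addition, not part of the original BERT code); I haven't benchmarked the alternatives, but something like GELU should behave similarly in practice.

import torch
import torch.nn as nn

class ConfigurablePooler(nn.Module):
    """Same idea as BERTPooler, but the activation is passed in rather than fixed to tanh."""
    def __init__(self, hidden_size, activation=None):
        super().__init__()
        self.dense = nn.Linear(hidden_size, hidden_size)
        self.activation = activation if activation is not None else nn.Tanh()

    def forward(self, hidden_states):
        # Still pools by taking the hidden state of the first ([CLS]) token.
        first_token_tensor = hidden_states[:, 0]
        return self.activation(self.dense(first_token_tensor))

# Swapping in GELU instead of tanh:
pooler = ConfigurablePooler(hidden_size=768, activation=nn.GELU())
hidden_states = torch.randn(2, 128, 768)
print(pooler(hidden_states).shape)  # torch.Size([2, 768])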