tensorflow · nlp · huggingface-transformers · bert-language-model · word-embedding

What do the embedding elements stand for in the Hugging Face BERT model?


Before passing my tokens through the encoder in the BERT model, I would like to perform some processing on their embeddings. I extracted the embedding weights using:

from transformers import TFBertModel

# Load a pre-trained BERT model
model = TFBertModel.from_pretrained('bert-base-uncased')

# Get the embedding layer of the model
embedding_layer = model.get_layer('bert').get_input_embeddings()

# Extract the embedding weights
embedding_weights = embedding_layer.get_weights()

I found that it contains 5 elements, as shown in the attached figure.

In my understanding, the first three elements are the word embedding weights, token type embedding weights, and positional embedding weights. My question is: what do the last two elements stand for?

I dove into the source code of the BERT model, but I cannot figure out the meaning of the last two elements.
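For reference, a quick check that makes it easier to tell the arrays apart is to print the shape of each weight (assuming the object returned by get_input_embeddings() is an ordinary Keras layer):

from transformers import TFBertModel

model = TFBertModel.from_pretrained('bert-base-uncased')
embedding_layer = model.get_layer('bert').get_input_embeddings()

# Print the name and shape of every weight in the embedding layer.
# For bert-base-uncased, three of them are the (30522, 768) word,
# (2, 768) token type, and (512, 768) position embedding matrices;
# the remaining two are 1-D arrays of length 768.
for weight in embedding_layer.weights:
    print(weight.name, weight.shape)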


Solution

  • In the BERT model, the embedding tensor is post-processed with layer normalization followed by dropout; see https://github.com/google-research/bert/blob/eedf5716ce1268e56f0a50264a88cafad334ac61/modeling.py#L362

    I think those two arrays are the gamma and beta of the normalization layer (https://www.tensorflow.org/api_docs/python/tf/keras/layers/LayerNormalization). They are learned parameters and span the axes of the inputs given by the "axis" parameter, which defaults to -1 (corresponding to the hidden size of 768 in the embedding tensor).
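To make this concrete, here is a minimal sketch (not the actual Hugging Face implementation) that unpacks the five arrays, assuming they come out in the order described in the question, and reproduces the post-processing by hand: sum the three embedding lookups, layer-normalize over the last axis, then scale and shift with gamma and beta. Dropout is skipped since it is only active during training.

import numpy as np
from transformers import BertTokenizer, TFBertModel

model = TFBertModel.from_pretrained('bert-base-uncased')
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

embedding_layer = model.get_layer('bert').get_input_embeddings()
weights = embedding_layer.get_weights()

# Assumed order: word, token type, and position embeddings, then LayerNorm gamma and beta
word_emb, token_type_emb, pos_emb, ln_gamma, ln_beta = weights
print(ln_gamma.shape, ln_beta.shape)  # both should be (768,)

# Embed a toy sentence (all tokens in segment 0)
input_ids = tokenizer('hello world', return_tensors='np')['input_ids'][0]
seq_len = len(input_ids)
summed = word_emb[input_ids] + token_type_emb[0] + pos_emb[:seq_len]

# Layer normalization over the last axis (hidden size 768),
# then scale by gamma and shift by beta (BERT uses epsilon = 1e-12)
mean = summed.mean(axis=-1, keepdims=True)
var = summed.var(axis=-1, keepdims=True)
post_processed = ln_gamma * (summed - mean) / np.sqrt(var + 1e-12) + ln_beta
print(post_processed.shape)  # (sequence length, 768)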