
Why is there a "rope.freqs" variable in the llama-2-7b weights?


I noticed a weight called "rope.freqs" in the checkpoints of the LLaMA 2 models (e.g. llama-2-7b or llama-2-7b-chat). What is the function of this weight, and which part of the model does it correspond to?

In [14]: checkpoint = torch.load('llama-2-7b-chat/consolidated.00.pth', map_location='cpu')
In [15]: checkpoint['rope.freqs']
Out[15]:
tensor([1.0000e+00, 8.6719e-01, 7.5000e-01, 6.4844e-01, 5.6250e-01, 4.8633e-01,
        4.2188e-01, 3.6523e-01, 3.1641e-01, 2.7344e-01, 2.3730e-01, 2.0508e-01,
        1.7773e-01, 1.5430e-01, 1.3379e-01, 1.1523e-01, 1.0010e-01, 8.6426e-02,
        7.5195e-02, 6.4941e-02, 5.6152e-02, 4.8584e-02, 4.2236e-02, 3.6621e-02,
        3.1738e-02, 2.7344e-02, 2.3682e-02, 2.0508e-02, 1.7822e-02, 1.5381e-02,
        1.3306e-02, 1.1536e-02, 1.0010e-02, 8.6670e-03, 7.5073e-03, 6.5002e-03,
        5.6152e-03, 4.8828e-03, 4.2114e-03, 3.6469e-03, 3.1586e-03, 2.7313e-03,
        2.3651e-03, 2.0599e-03, 1.7776e-03, 1.5411e-03, 1.3351e-03, 1.1520e-03,
        9.9945e-04, 8.6594e-04, 7.5150e-04, 6.4850e-04, 5.6076e-04, 4.8637e-04,
        4.2152e-04, 3.6430e-04, 3.1662e-04, 2.7466e-04, 2.3746e-04, 2.0504e-04,
        1.7738e-04, 1.5354e-04, 1.3351e-04, 1.1539e-04], device='cpu',
       dtype=torch.bfloat16)

In [16]: checkpoint['rope.freqs'].shape
Out[16]: torch.Size([64])


Solution

  • It is a parameter of RoPE (rotary position embedding); see the paper "RoFormer: Enhanced Transformer with Rotary Position Embedding" for the details. The tensor holds the inverse frequencies θ_i used to rotate the query and key vectors in every attention layer. For llama-2-7b the head dimension is 128, and RoPE assigns one frequency per pair of dimensions, which is why there are 64 values. The LLaMA inference code does not actually need this stored tensor: it recomputes the frequencies itself and builds the cosine/sine table from them, so the weight is effectively redundant at load time.
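
    You can verify this by recomputing the standard RoPE inverse frequencies and comparing them to the checkpoint tensor. The sketch below assumes the usual llama-2-7b settings (head_dim = 4096 / 32 = 128, RoPE base = 10000); it is an illustration, not the official loading code.

    import torch

    head_dim = 128      # 4096 hidden size / 32 attention heads
    base = 10000.0      # RoPE base used by LLaMA

    # theta_i = base^(-2i / head_dim) for i = 0 .. head_dim/2 - 1  -> 64 values
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))

    print(inv_freq.shape)    # torch.Size([64])
    print(inv_freq[:4])      # ~ tensor([1.0000, 0.8660, 0.7499, 0.6494])

    # Cast to bfloat16 before comparing with the stored weight:
    # torch.allclose(inv_freq.bfloat16(), checkpoint['rope.freqs'])

    The first few recomputed values (1.0000, 0.8660, 0.7500, 0.6494, ...) match the bfloat16 values printed from checkpoint['rope.freqs'] above, up to bfloat16 rounding. At inference time these frequencies are multiplied by the token positions and turned into cosines and sines (a complex rotation per dimension pair), which is the precomputation the answer refers to.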