Tags: python, deep-learning, pytorch, lstm

Query regarding Pytorch LSTM code snippet


In the Stack Overflow thread How can i add a Bi-LSTM layer on top of bert model?, there is a line of code:

hidden = torch.cat((lstm_output[:,-1, :256],lstm_output[:,0, 256:]),dim=-1)

Can someone explain why the outputs at the last and first positions are concatenated, and not any others? What do these two positions contain that makes them the ones chosen?


Solution

  • In bidirectional models, the hidden states of the forward and backward directions are concatenated at every time step. The line therefore concatenates the first 256 units (:256) of the output at the last time step (-1), which is the forward direction's final hidden state, with the last 256 units (256:) of the output at the first time step (0), which is the backward direction's final hidden state. Each of those positions is where its direction has processed the entire sequence, so together they hold the most "interesting" summary of the input; the sketch after this answer verifies the layout.

    I've written a longer, more detailed answer on how hidden states are constructed in PyTorch's recurrent modules.
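
    As a concrete check of this layout, here is a minimal sketch (the shapes are illustrative; hidden_size=256 matches the slice width in the snippet, and the BERT encoder from the original thread is omitted). It shows that the two slices are exactly the final hidden states of the forward and backward directions, as also returned in h_n:

    import torch
    import torch.nn as nn

    # Illustrative shapes; 256 matches the slice width used in the snippet.
    batch_size, seq_len, input_size, hidden_size = 4, 10, 32, 256

    lstm = nn.LSTM(input_size, hidden_size, batch_first=True, bidirectional=True)
    x = torch.randn(batch_size, seq_len, input_size)

    # lstm_output: (batch, seq_len, 2 * hidden_size); forward and backward
    # hidden states are concatenated along the last dimension at every step.
    # h_n: (num_directions, batch, hidden_size) holds each direction's final state.
    lstm_output, (h_n, c_n) = lstm(x)

    # Forward direction finishes at the last time step (-1), in units [:256];
    # backward direction finishes at the first time step (0), in units [256:].
    hidden = torch.cat((lstm_output[:, -1, :256], lstm_output[:, 0, 256:]), dim=-1)

    # Both slices match the final hidden states returned in h_n.
    assert torch.allclose(lstm_output[:, -1, :256], h_n[0])  # forward final state
    assert torch.allclose(lstm_output[:, 0, 256:], h_n[1])   # backward final state
    print(hidden.shape)  # torch.Size([4, 512])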