Can anyone tell me why an LSTM is called both a Long and a Short type of memory? I know that an LSTM stores some amount of data from its previous state. But if that data is stored for a short time, why is it called Long-Term Memory, and if it is stored for a long time, why is it called Short-Term Memory? It's confusing!
Long Short-Term Memory means storing Short-Term data over Long periods of time.
Think of a piece of text, for example: "Barnie is a big red dog, with little ears and a long black tail. He is 12 years old." If your task were to figure out what "He" in the second sentence refers to, you would feed this text into an LSTM network, and it would analyze each word individually. The calculations for a single word are the Short-Term Memory. However, the result of those calculations (the hidden state) is, as you say, passed on and included when analyzing the next word. LSTM networks improve on standard RNNs by being able to preserve this data across many states, therefore storing the Short-Term data (the calculations for an individual word) over Long periods of time (passing the hidden state on to the next word).
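Here is a minimal sketch of that hidden-state passing, in plain Python. Everything is scalar and the weights and "word encodings" are made up purely for illustration; a real LSTM uses learned weight matrices and vector states, but the loop structure is the same:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, w):
    """One LSTM step on scalar inputs (toy version; real LSTMs use vectors)."""
    f = sigmoid(w["wf"] * x + w["uf"] * h_prev + w["bf"])    # forget gate
    i = sigmoid(w["wi"] * x + w["ui"] * h_prev + w["bi"])    # input gate
    o = sigmoid(w["wo"] * x + w["uo"] * h_prev + w["bo"])    # output gate
    g = math.tanh(w["wg"] * x + w["ug"] * h_prev + w["bg"])  # candidate value
    c = f * c_prev + i * g   # cell state: the channel that carries data long-term
    h = o * math.tanh(c)     # hidden state: the per-word "short-term" output
    return h, c

# Hypothetical numeric encoding of a 4-word sentence, and dummy weights.
words = [0.5, -1.0, 0.25, 2.0]
weights = {k: 0.5 for k in
           ("wf", "uf", "bf", "wi", "ui", "bi", "wo", "uo", "bo", "wg", "ug", "bg")}

h, c = 0.0, 0.0
for x in words:
    h, c = lstm_step(x, h, c, weights)  # hidden state is carried to the next word
print(h, c)
```

The point is the loop: each word's calculation (`h`) is short-lived, but the cell state `c` gives the network a path to carry information across many such steps.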
A normal RNN could probably handle the example above, but if you instead fed it a text of 100 words, a normal RNN would not be able to retain all that information, because its gradients vanish or explode during training over long sequences. So RNNs are able to store Short-Term data, just like LSTMs; it is just that LSTMs can do it over a much longer period (usually of time).
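The vanishing/exploding problem can be illustrated with a toy calculation: backpropagating through a plain RNN multiplies roughly one Jacobian-like factor per time step, so over 100 words the gradient scales like that factor to the 100th power (the 0.9 and 1.1 below are made-up per-step factors, just to show the effect):

```python
# Toy illustration: repeated multiplication over 100 time steps.
shrink = 0.9 ** 100  # per-step factor < 1: gradient vanishes, ≈ 2.66e-05
grow = 1.1 ** 100    # per-step factor > 1: gradient explodes, ≈ 1.38e+04
print(f"vanishing: {shrink:.2e}, exploding: {grow:.2e}")
```

The LSTM's gated cell state is designed to keep this per-step factor close to 1, which is what lets it carry the short-term calculations across long spans.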