Unlike traditional neural networks, which mostly handle independent data points, Recurrent Neural Networks (RNNs) take data dependence into account. They are networks with loops in them, allowing information to persist, which makes them a natural neural network architecture for sequential data.
Long Short-Term Memory networks (LSTMs) are a special kind of recurrent neural network that works, for many tasks, much better than the standard version. Almost all exciting results based on recurrent neural networks are achieved with them.
The main idea of LSTMs is to consider both short- and long-term dependencies. Traditional neural networks have no problem learning short-term dependencies, so why do they struggle with long-term ones? Bengio, et al. (1994) explored this in depth and found some pretty fundamental reasons: the gradient signal that carries information across time steps tends to shrink (or blow up) exponentially as it is propagated backward through many steps.
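To make the problem concrete, here is a minimal sketch (the weights, dimensions, and step count are illustrative assumptions, not from the text): it pushes a gradient backward through repeated tanh RNN steps with small random weights, and the gradient norm collapses toward zero.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_dim, steps = 4, 50
# Small random recurrent weights (illustrative choice).
W = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))

h = np.zeros(hidden_dim)
grad = np.ones(hidden_dim)  # gradient arriving at the final time step
norms = []
for _ in range(steps):
    h = np.tanh(W @ h + rng.normal(size=hidden_dim))
    # Backprop through one tanh step multiplies by diag(1 - h^2) @ W.T;
    # repeating this product many times shrinks the gradient exponentially.
    grad = (1.0 - h**2) * (W.T @ grad)
    norms.append(np.linalg.norm(grad))

print(norms[0], norms[-1])  # the norm decays by many orders of magnitude
```

Each step multiplies the gradient by a Jacobian whose norm is well below 1 here, so after 50 steps almost nothing of the original signal survives; with large weights the same product would instead explode.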
Variants of LSTMs. There are many, including Grid LSTMs. Greff, et al. (2015) do a nice comparison of popular variants, finding that they're all about the same. Jozefowicz, et al. (2015) tested more than ten thousand RNN architectures, finding some that worked better than LSTMs on certain tasks.
LSTMs are explicitly designed to avoid the long-term dependency problem. Remembering information for long periods of time is practically their default behavior, not something they struggle to learn!
All recurrent neural networks have the form of a chain of repeating modules of neural network. In standard RNNs, this repeating module will have a very simple structure, such as a single tanh layer.
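As a rough sketch (the names and dimensions below are illustrative assumptions), that single-tanh repeating module is just one function applied at every time step, with the hidden state threading the loop:

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    # The entire repeating module of a standard RNN: a single tanh layer
    # combining the current input with the previous hidden state.
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

rng = np.random.default_rng(0)
input_dim, hidden_dim, seq_len = 3, 4, 5
W_xh = rng.normal(scale=0.1, size=(input_dim, hidden_dim))
W_hh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
b_h = np.zeros(hidden_dim)

h = np.zeros(hidden_dim)
for x_t in rng.normal(size=(seq_len, input_dim)):
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)  # information persists via h
```

Unrolling the loop makes the chain structure explicit: the same module, with the same weights, repeats once per element of the sequence.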
[Figure: the repeating module in a standard Recurrent Neural Network]
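For contrast, here is a hedged sketch of an LSTM's repeating module (the packed weight layout and names are one common convention, assumed for illustration). The key difference from the tanh module is the cell state `c`, which is carried forward almost unchanged whenever the forget gate is near 1; that is what makes remembering the default behavior:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    # W maps [x_t; h_prev] to all four gate pre-activations at once.
    z = np.concatenate([x_t, h_prev]) @ W + b
    f, i, o, g = np.split(z, 4)
    f, i, o = sigmoid(f), sigmoid(i), sigmoid(o)  # forget, input, output gates
    g = np.tanh(g)                                # candidate cell update
    c = f * c_prev + i * g  # cell state: mostly copied forward when f is near 1
    h = o * np.tanh(c)      # hidden state exposed to the rest of the network
    return h, c

rng = np.random.default_rng(0)
input_dim, hidden_dim = 3, 4
W = rng.normal(scale=0.1, size=(input_dim + hidden_dim, 4 * hidden_dim))
b = np.zeros(4 * hidden_dim)

h = c = np.zeros(hidden_dim)
for x_t in rng.normal(size=(5, input_dim)):
    h, c = lstm_step(x_t, h, c, W, b)
```

Because the cell-state update is additive rather than a repeated matrix-and-tanh squashing, gradients flowing along `c` are not forced to shrink at every step.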
In addition to the original authors, a lot of people contributed to the modern LSTM. A non-comprehensive list is: Felix Gers, Fred Cummins, Santiago Fernandez, Justin Bayer, Daan Wierstra, Julian Togelius, Faustino Gomez, Matteo Gagliolo, and Alex Graves. ↩