Transformers Introduction

Encoder-Decoder Framework

Before the advent of Transformer models, Recurrent Neural Networks (RNNs) were the leading performers in natural language processing (NLP)
Figure (Transformers_NLP1): An RNN unfolded along the time axis
- These recurrent architectures are particularly effective for modeling sequential data such as text
In this setup, the network receives an input, processes it, and feeds part of its output back into itself, creating a feedback loop
This mechanism allows RNNs to use a portion of the previously generated output in the subsequent step
- Additionally, RNNs pass state information from one step to the next
This carried-over state keeps track of information from previous steps, which is crucial for making accurate predictions on sequential data (a minimal sketch of this recurrence is shown below)
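The following is a minimal sketch of that recurrence, assuming PyTorch; the tensor sizes and variable names are illustrative choices, not taken from the original material:

```python
# Minimal sketch of the RNN recurrence described above (illustrative only).
# Assumes PyTorch; all dimensions and names are arbitrary examples.
import torch
import torch.nn as nn

input_size, hidden_size, seq_len = 8, 16, 5
rnn_cell = nn.RNNCell(input_size, hidden_size)

inputs = torch.randn(seq_len, 1, input_size)   # a toy sequence of 5 steps (batch size 1)
hidden = torch.zeros(1, hidden_size)           # initial hidden state

for x_t in inputs:
    # The previous hidden state is fed back in at every step:
    # this is the feedback loop that carries information across time
    hidden = rnn_cell(x_t, hidden)

print(hidden.shape)  # torch.Size([1, 16]) -- the final hidden state
```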
- Recurrent Neural Networks (RNNs) have been instrumental in advancing the field of machine translation, which involves mapping text in one language to another
Figure (Transformers_NLP2): An Encoder-Decoder structure composed of a pair of Recurrent Neural Networks (RNNs)
(Note that in actual applications, this structure typically involves many more recurrent layers than are depicted here)
- This translation process is typically carried out with an Encoder-Decoder (or Sequence-to-Sequence) architecture, especially when the input and output sequences have variable lengths
Encoder:
Responsible for encoding the input sequence into a numerical representation which is captured in the final hidden state
Decoder:
Takes this final hidden state and generates the output sequence
Working together in this way, the Encoder and Decoder enable effective translation between languages; a minimal sketch of such a pair of RNNs is shown below
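Here is a minimal sketch of an Encoder-Decoder pair built from two RNNs, assuming PyTorch; the GRU layers, vocabulary sizes, start-token id, and greedy decoding loop are simplifying assumptions made for brevity and are not taken from the original material:

```python
# Minimal Encoder-Decoder sketch with two RNNs (illustrative, not a full
# translation model). Assumes PyTorch; sizes, names, and the greedy loop
# are arbitrary examples.
import torch
import torch.nn as nn

emb_dim, hidden_size, src_vocab, tgt_vocab = 32, 64, 100, 120

src_embed = nn.Embedding(src_vocab, emb_dim)
tgt_embed = nn.Embedding(tgt_vocab, emb_dim)
encoder = nn.GRU(emb_dim, hidden_size, batch_first=True)
decoder = nn.GRU(emb_dim, hidden_size, batch_first=True)
generator = nn.Linear(hidden_size, tgt_vocab)  # maps decoder states to target tokens

src = torch.randint(0, src_vocab, (1, 7))      # toy source sentence of 7 token ids
_, final_hidden = encoder(src_embed(src))      # final hidden state summarizes the input

# The decoder starts from the encoder's final hidden state and emits tokens
# one at a time, feeding each prediction back in as the next input.
token = torch.zeros(1, 1, dtype=torch.long)    # assumed start-of-sequence id 0
hidden = final_hidden
outputs = []
for _ in range(10):                            # generate up to 10 target tokens
    dec_out, hidden = decoder(tgt_embed(token), hidden)
    token = generator(dec_out).argmax(dim=-1)  # greedy choice of the next token
    outputs.append(token.item())

print(outputs)
```

In this sketch, the only information the decoder receives about the source sentence is the encoder's final hidden state, which is exactly the role of the final hidden state described above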

Reference:
https://seungseop.tistory.com/21