What Does the Transformer Architecture Tell Us? | by Stephanie Shen | Jul, 2024


Photo by narciso1 from Pixabay

The stellar performance of large language models (LLMs) such as ChatGPT has astonished the world. The breakthrough was made by the invention of the Transformer architecture, which is surprisingly simple and scalable. It is still built on deep learning neural networks. The main addition is the so-called "attention" mechanism that contextualizes each word token. Moreover, its unprecedented parallelism endows LLMs with massive scalability and, therefore, impressive accuracy after training over billions of parameters.
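To make the attention idea concrete, below is a minimal NumPy sketch of scaled dot-product attention, the core operation behind that mechanism. The variable names, dimensions, and random weights are illustrative assumptions, not details from the article.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max before exponentiating for numerical stability.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Contextualize each token: mix the value vectors V using
    similarity scores between queries Q and keys K."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # (seq_len, seq_len) similarities
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V                   # weighted mix of value vectors

# Toy example: 4 tokens, embedding dimension 8 (both illustrative).
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))              # token embeddings
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = scaled_dot_product_attention(X @ Wq, X @ Wk, X @ Wv)
print(out.shape)                         # (4, 8): one contextualized vector per token
```

Each output row is a weighted average of the value vectors, which is exactly how a token's representation comes to reflect its neighbors. Because every row is computed independently, the whole operation is one batch of matrix multiplications, which is the source of the parallelism mentioned above.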

The simplicity that the Transformer architecture demonstrates is, in fact, comparable to that of the Turing machine. The difference is that the Turing machine controls what the machine can do at each step. The Transformer, however, is like a magic black box, learning from massive input data through parameter optimization. Researchers and scientists are still intensely interested in discovering its potential and any theoretical implications it holds for studying the human mind.

In this article, we will first discuss the four main features of the Transformer architecture: word embedding (sketched below), the attention mechanism, single-word prediction, and generalization capabilities such as multi-modal extension and transfer learning. The intent is to address why the architecture is so effective instead of how to build it (for which readers can find many…
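As a quick illustration of the first of these features, word embedding amounts to a learned lookup table that maps each token id to a vector. The toy vocabulary and dimension below are hypothetical; real models use subword tokenizers with vocabularies of tens of thousands of entries and learn the table during training.

```python
import numpy as np

# Hypothetical toy vocabulary; ids index rows of the embedding table.
vocab = {"the": 0, "cat": 1, "sat": 2, "mat": 3}
d_model = 8  # embedding dimension (illustrative)

rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(len(vocab), d_model))  # learned in practice

def embed(tokens):
    # Word embedding is just a lookup: each token id selects one row vector.
    ids = [vocab[t] for t in tokens]
    return embedding_table[ids]

print(embed(["the", "cat", "sat"]).shape)  # (3, 8): one vector per token
```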

Stephanie Shen
2024-07-25 16:45:13
Source link: https://towardsdatascience.com/what-does-the-transformer-architecture-tell-us-cd3a4fd6a59d?source=rss—-7f60cf5620c9—4
