TechRxiv

Transformers: Statistical Interpretation, Architectures and Applications

Preprint posted on 2023-12-07, 03:30, authored by Fanfei Meng, Yuxin Wang

Transformers have been widely recognized as powerful tools for a broad range of tasks, such as Natural Language Processing (NLP), Computer Vision (CV), and Speech Recognition (SR), owing to their state-of-the-art multi-head attention mechanisms. Inspired by their versatile designs and strong capability for analyzing input data, we start from the various architectures, proceed to an investigation of their statistical mechanisms and inference, and then introduce their applications to dominant tasks. The underlying statistical mechanisms motivate us to examine them at a deeper level; this survey therefore focuses on the mathematical foundations of Transformers and uses these principles to analyze the reasons for their excellent performance in many recognition scenarios.

History

Email Address of Submitting Author

fanfeimeng2023@u.northwestern.edu

Submitting Author's Institution

Northwestern University

Submitting Author's Country

United States of America
