Relations (1)

related 0.30 — supporting 3 facts

Large Language Models are closely tied to the Transformer architecture: prefilling is defined specifically for transformer-based models [1], and alternative architectures such as the Retentive Network are designed explicitly as successors to the Transformer for this purpose [2]. The evolution of these models is further contextualized by comparisons between current Transformer-based large language models and earlier sequence-to-sequence systems [3].

Facts (3)

Sources
Hallucination Causes: Why Language Models Fabricate Facts · M. Brenndoerfer · mbrenndoerfer.com (1 fact)
Claim: Exposure bias is not unique to large language models; it arises in any sequence-to-sequence system trained with teacher forcing, including neural machine translation systems from the pre-transformer era.
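The mechanism behind this claim can be illustrated with a minimal toy sketch. Here, a hypothetical next-token "model" (a plain lookup table, not a real network) makes one prediction error; under teacher forcing the ground-truth token is fed back at each step, so the error stays isolated, while in free-running generation the model conditions on its own mistaken output and the error compounds. All names and the toy vocabulary are invented for illustration.

```python
# Toy next-token "model": maps the previous token to a prediction.
# It makes one mistake: after "quick" it predicts "red" instead of "brown".
TOY_MODEL = {
    "<s>": "the",
    "the": "quick",
    "quick": "red",   # model error
    "brown": "fox",
    "red": "car",     # only reachable when the error is fed back in
}

def decode(target, teacher_forcing):
    """Generate len(target) tokens, conditioning each step on either the
    ground-truth previous token (teacher forcing) or the model's own output."""
    prev = "<s>"
    out = []
    for gold in target:
        pred = TOY_MODEL.get(prev, "<unk>")
        out.append(pred)
        prev = gold if teacher_forcing else pred  # the crucial difference
    return out

target = ["the", "quick", "brown", "fox"]
print(decode(target, teacher_forcing=True))   # one isolated error
print(decode(target, teacher_forcing=False))  # the error compounds
```

With teacher forcing the output contains a single wrong token ("red"); free-running decoding then derails on every subsequent step, which is exactly the train/inference mismatch the claim describes.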
A Survey on the Theory and Mechanism of Large Language Models · arXiv · arxiv.org (1 fact)
Claim: The research paper 'Retentive network: a successor to transformer for large language models' (arXiv:2307.08621) proposes the Retentive Network as an alternative architecture to the Transformer for large language models.
Track: Poster Session 3 · AISTATS 2026 · Samuel Tesfazgi, Leonhard Sprandl, Sandra Hirche · virtual.aistats.org (1 fact)
Reference: Siyan Zhao, Daniel Israel, Guy Van den Broeck, and Aditya Grover define prefilling in transformer-based large language models as the computation of the key-value (KV) cache for input tokens in the prompt prior to autoregressive generation.
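The prefill/decode split in that definition can be sketched as follows. This is a minimal illustration, not the cited authors' implementation: the "projections" are hypothetical stand-ins for a layer's learned key/value weight matrices, and the cache is a plain Python list of per-token (key, value) pairs.

```python
def project_kv(token_embedding):
    # Stand-in for one attention layer's learned key/value projections.
    key = [2.0 * x for x in token_embedding]
    value = [x + 1.0 for x in token_embedding]
    return key, value

def prefill(prompt_embeddings):
    """Prefill: compute the KV cache for every prompt token in one pass,
    before any autoregressive generation begins."""
    return [project_kv(e) for e in prompt_embeddings]

def decode_step(kv_cache, new_token_embedding):
    """Decode: each generated token appends exactly one new KV entry;
    earlier entries are reused rather than recomputed."""
    kv_cache.append(project_kv(new_token_embedding))
    return kv_cache

prompt = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
cache = prefill(prompt)                  # 3 entries, one per prompt token
cache = decode_step(cache, [1.0, 1.0])   # 4 entries after one decode step
```

The point of the definition is visible in the shapes: prefill fills the cache for the whole prompt at once, after which each autoregressive step only extends it by one entry.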