Relations (1)
related (score 13.00), strongly supported by 13 facts
Pre-training is the foundational stage of the Large Language Model development process [1], during which models acquire the core knowledge, algorithmic capabilities, and semantic priors necessary for subsequent tasks {fact:4, fact:5, fact:7, fact:13}.
Facts (13)
Sources
A Survey on the Theory and Mechanism of Large Language Models arxiv.org 6 facts
perspective: The 'Representation Camp' posits that Large Language Models (LLMs) store memories about various topics during pretraining, and that in-context learning retrieves contextually relevant topics during inference based on demonstrations.
claim: Wei et al. (2023) observed that smaller LLMs rely primarily on semantic priors from pretraining during in-context learning (ICL), often disregarding label flips in the context, whereas larger models can override these priors when faced with label flips.
reference: Qian et al. (2024) found that concepts related to trustworthiness become linearly separable early in the pre-training phase of LLMs, as revealed by applying linear probing to intermediate checkpoints.
claim: Large language models do not learn new tasks during in-context learning (ICL); instead, they use demonstration information to locate tasks or topics, while the ability to perform those tasks is acquired during pretraining.
reference: The paper 'Towards tracing trustworthiness dynamics: revisiting pre-training period of large language models' (arXiv:2402.19465) investigates the trustworthiness dynamics of LLMs during their pre-training phase.
perspective: The 'Algorithmic Camp' posits that LLMs learn to execute algorithms during pre-training and subsequently execute those algorithms for different tasks during in-context learning inference, as argued by Li et al. (2023a), Zhang et al. (2023), and Bai et al. (2023b).
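The linear-probing result cited above (Qian et al., 2024) can be illustrated with a toy sketch: fit a logistic-regression probe to hidden-state vectors and check whether a concept's two classes are linearly separable. This is a minimal stand-in, not the paper's setup: the 2-D synthetic "activations" replace real checkpoint activations, and all names here are illustrative assumptions.

```python
import math
import random

random.seed(0)

def make_activations(n, offset, label):
    # Two well-separated Gaussian clusters stand in for hidden states
    # labeled with a concept (e.g. trustworthy = 1, untrustworthy = 0).
    return [([random.gauss(offset, 1.0), random.gauss(-offset, 1.0)], label)
            for _ in range(n)]

data = make_activations(200, 2.0, 1) + make_activations(200, -2.0, 0)
random.shuffle(data)

def train_probe(data, lr=0.1, epochs=50):
    # Plain logistic regression trained by stochastic gradient descent:
    # the "probe" is just a linear decision boundary over the activations.
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in data:
            z = w[0] * x[0] + w[1] * x[1] + b
            z = max(-30.0, min(30.0, z))      # guard against exp overflow
            p = 1.0 / (1.0 + math.exp(-z))    # sigmoid
            g = p - y                         # gradient of the logistic loss
            w[0] -= lr * g * x[0]
            w[1] -= lr * g * x[1]
            b -= lr * g
    return w, b

w, b = train_probe(data)
acc = sum((1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0) == y
          for x, y in data) / len(data)
print(f"probe accuracy: {acc:.2f}")
```

High probe accuracy is read as evidence that the concept is linearly encoded in the representation; in the checkpoint-probing setting, the same probe is refit at successive pre-training checkpoints to see when separability emerges.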
Do LLMs Build World Representations? Probing Through ... neurips.cc 1 fact
claim: Fine-tuning and advanced pre-training strengthen the tendency of large language models to maintain goal-oriented abstractions during decoding, prioritizing task completion over recovery of the world's state and dynamics.
A Survey of Incorporating Psychological Theories in LLMs - arXiv arxiv.org 1 fact
claim: Post-training in Large Language Models (LLMs) refines models from general proficiency to task-specific, goal-oriented behavior after the foundational knowledge is acquired during pre-training.
The Synergy of Symbolic and Connectionist AI in LLM-Empowered ... arxiv.org 1 fact
procedure: The training process for Large Language Models (LLMs) generally consists of two stages: pre-training and fine-tuning.
The construction and refined extraction techniques of knowledge ... nature.com 1 fact
claim: Pre-trained Large Language Models (LLMs) such as GPT-4 and LLaMA-3 achieve cross-task generalization through large-scale pretraining combined with task-specific fine-tuning.
Medical Hallucination in Foundation Models and Their ... medrxiv.org 1 fact
claim: Targeted knowledge integration during pretraining can reduce blind spots in Large Language Models (LLMs), though maintaining up-to-date domain coverage remains a challenge (Feng et al., 2024).
Practices, opportunities and challenges in the fusion of knowledge ... frontiersin.org 1 fact
reference: The integration of Knowledge Graphs into Large Language Models can be categorized into three types based on the effect of the enhancement: pre-training, reasoning methods (including supervised fine-tuning and alignment fine-tuning), and model interpretability.
Hallucination Causes: Why Language Models Fabricate Facts mbrenndoerfer.com 1 fact
claim: Fine-tuning large language models modifies the model's response style regarding expressed confidence, but the underlying knowledge gaps and exposure-bias patterns from pretraining remain encoded in the base model.