supervised fine-tuning
Also known as: SFT, Supervised finetuning
Facts (20)
Sources
Hallucination Causes: Why Language Models Fabricate Facts (mbrenndoerfer.com, Mar 15, 2026) · 5 facts
Claim: Supervised finetuning datasets suffer from a selection bias: annotators write detailed, authoritative responses for topics they know well and shorter, hedged responses for topics where they are less confident, which reinforces model overconfidence on well-known topics.
Claim: Instruction-following datasets used for supervised finetuning often have thin coverage of rare query types, so models receive little practice on exactly the queries where they are most likely to hallucinate.
Claim: Supervised finetuning (SFT) datasets, which are written by human annotators, can introduce factual errors into large language models: annotators make mistakes, have knowledge gaps, and may produce authoritative-sounding text on topics outside their expertise.
Claim: Large language models learn the style of confident, well-structured prose from supervised finetuning data because human annotators tend to produce such responses when demonstrating ideal answers.
Claim: Inconsistent annotator knowledge in long-running annotation projects leads to conflicting "correct" responses within Supervised Fine-Tuning (SFT) datasets, which degrades model calibration and causes inconsistent outputs at inference time.
A Survey on the Theory and Mechanism of Large Language Models (arxiv.org, Mar 12, 2026) · 5 facts
Reference: Shao et al. (2024) propose a unified paradigm that encompasses Supervised Fine-Tuning (SFT), Rejection Sampling Fine-Tuning (RFT), Direct Preference Optimization (DPO), and Proximal Policy Optimization (PPO), leading to the proposal of Group Relative Policy Optimization (GRPO).
Claim: Zhu et al. (2025c) proved that Reinforcement Learning updates occur in low-curvature subspaces orthogonal to the principal components updated by Supervised Fine-Tuning (SFT), suggesting that RL operates in a distinct optimization regime, fine-tuning behavior without significantly altering primary feature representations.
Claim: Chu et al. (2025) provided empirical evidence that Supervised Fine-Tuning (SFT) tends to memorize training data, leading to poor performance on out-of-distribution (OOD) tasks, whereas Reinforcement Learning (RL) generalizes better.
Procedure: The LLM training process consists of two primary stages: (1) Pre-Training, a massive-scale, self-supervised process in which the model optimizes a next-token prediction objective to acquire linguistic knowledge and reasoning abilities; and (2) Supervised Fine-Tuning (SFT), in which the pre-trained model is trained on a smaller, high-quality dataset of labeled instruction-response pairs to adapt it to human intent.
Reference: Ren and Sutherland (2025) introduced a framework to analyze learning dynamics during both the Supervised Fine-Tuning and alignment phases, suggesting that the alignment process is governed by specific dynamic laws that persist across different algorithms.
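The two-stage procedure described above can be sketched concretely: the SFT objective is the same next-token cross-entropy used in pre-training, but computed only on the response tokens of each instruction-response pair (prompt tokens are masked out of the loss). The sketch below is a minimal illustration under that assumption; the toy vocabulary and the uniform stand-in "model" are hypothetical, not from any of the cited papers.

```python
import math

# Toy vocabulary and one instruction-response pair (illustrative only).
vocab = {"<bos>": 0, "what": 1, "is": 2, "sft": 3, "?": 4,
         "supervised": 5, "finetuning": 6}

prompt = ["<bos>", "what", "is", "sft", "?"]
response = ["supervised", "finetuning"]
tokens = [vocab[t] for t in prompt + response]

# Loss mask: 0 for prompt positions, 1 for response positions.
mask = [0] * len(prompt) + [1] * len(response)

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def sft_loss(tokens, mask, logits_fn):
    """Mean next-token cross-entropy over unmasked (response) positions."""
    total, count = 0.0, 0
    for i in range(len(tokens) - 1):
        if mask[i + 1] == 0:
            continue  # skip prompt tokens: SFT trains only on the response
        probs = softmax(logits_fn(tokens[: i + 1]))
        total += -math.log(probs[tokens[i + 1]])
        count += 1
    return total / count

# Stand-in "model": uniform logits, i.e. an untrained predictor.
uniform = lambda context: [0.0] * len(vocab)
print(round(sft_loss(tokens, mask, uniform), 4))  # ln(7) ≈ 1.9459
```

With a uniform predictor over a 7-token vocabulary, each response token costs ln(7) nats; training drives this average down on the demonstration data, which is exactly the memorization pressure the Chu et al. (2025) fact above contrasts with RL.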
Detecting and Evaluating Medical Hallucinations in Large Vision ... (arxiv.org, Jun 14, 2024) · 4 facts
Claim: Sequentially using different task data for Supervised Fine-Tuning (SFT) across multiple stages is unnecessary; mixing the different task data in a single SFT phase instead maximizes the performance gain of MediHallDetector.
Procedure: The authors conducted ablation studies on MediHallDetector's Supervised Fine-Tuning (SFT) methods, comparing accuracy and recall against human preferences across hallucination categories.
Claim: Performing Supervised Fine-Tuning (SFT) in a single phase with data from three different tasks lets each type of training data contribute effectively, yielding incremental improvements in MediHallDetector's performance.
Procedure: The MediHallDetector model underwent supervised fine-tuning (SFT) using data from traditional medical visual-language tasks, Med-HallMark data, and hallucination-detection instruction-pair data.
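The single-phase mixing strategy these facts describe amounts to pooling all task datasets and shuffling them into one training stream, rather than running one SFT stage per dataset. A minimal sketch, with hypothetical stand-in datasets (the field names and sizes are illustrative, not from the paper):

```python
import random

# Hypothetical stand-ins for the three task datasets named above.
med_vqa = [{"task": "medical_vqa", "id": i} for i in range(3)]
hallmark = [{"task": "med_hallmark", "id": i} for i in range(2)]
detect = [{"task": "hallucination_detection", "id": i} for i in range(2)]

def single_phase_mixture(*datasets, seed=0):
    """Pool every task's examples and shuffle them into one stream,
    so a single SFT phase sees all tasks interleaved, instead of
    running one SFT stage per dataset sequentially."""
    mixed = [ex for ds in datasets for ex in ds]
    random.Random(seed).shuffle(mixed)
    return mixed

mixed = single_phase_mixture(med_vqa, hallmark, detect)
print(len(mixed))  # 7 examples, all three tasks interleaved
```

The seeded shuffle keeps the mixture reproducible; interleaving is what lets each task's gradient signal contribute throughout training instead of being overwritten by whichever stage comes last.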
Medical Hallucination in Foundation Models and Their ... (medrxiv.org, Mar 3, 2025) · 1 fact
Claim: Reinforcement learning from knowledge feedback (RLKF) achieves superior factuality in AI models compared to decoding strategies or supervised fine-tuning.
A Survey of Incorporating Psychological Theories in LLMs (arxiv.org) · 1 fact
Claim: Li et al. (2023) proposed a working-memory approach for Supervised Fine-Tuning (SFT) that dynamically balances stored information with provided contexts.
A Comprehensive Benchmark and Evaluation Framework for Multi ... (arxiv.org, Jan 6, 2026) · 1 fact
Claim: Direct Preference Optimization (DPO) significantly outperforms Supervised Fine-Tuning (SFT) in handling complex reasoning and emotional nuance in patient agents.
Survey and analysis of hallucinations in large language models (frontiersin.org, Sep 29, 2025) · 1 fact
Claim: Efforts to mitigate hallucinations at the model level include supervised fine-tuning (SFT), reinforcement learning from human feedback (RLHF), contrastive decoding, and grounded pretraining.
Detecting hallucinations with LLM-as-a-judge: Prompt ... (datadoghq.com, Aug 25, 2025) · 1 fact
Claim: Prompts are used to augment labeled data with reasoning chains, either for supervised fine-tuning (SFT) directly or in SFT initialization steps before reinforcement learning (RL).
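One way to read the augmentation step described above: for each already-labeled example, a prompt asks a teacher or judge model to produce a reasoning chain justifying the known label, and the resulting (prompt, chain, label) record becomes SFT data. The template and function below are hypothetical illustrations, not Datadog's actual prompts:

```python
def reasoning_chain_prompt(question, label):
    """Build a prompt asking a teacher model to justify a known label;
    the returned string plus the model's reasoning chain would form one
    SFT record. Wording is illustrative, not from the cited article."""
    return (
        "You are a careful hallucination judge.\n"
        f"Question: {question}\n"
        f"Known correct verdict: {label}\n"
        "Explain step by step why this verdict is correct, "
        "then restate the verdict on its own line."
    )

record = {
    "prompt": reasoning_chain_prompt(
        "Does the answer cite a real study?", "hallucinated"
    ),
    # To be filled with the teacher model's generated reasoning chain.
    "completion": None,
}
print("Known correct verdict: hallucinated" in record["prompt"])  # True
```

Seeding the label into the prompt is what distinguishes this from open-ended generation: the chain is constrained to rationalize a verdict that is already known to be correct, which keeps the augmented SFT data label-consistent.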
Practices, opportunities and challenges in the fusion of knowledge ... (frontiersin.org) · 1 fact
Reference: The integration of Knowledge Graphs into Large Language Models can be categorized into three types based on the effect of the enhancement: pre-training, reasoning methods (including supervised fine-tuning and alignment fine-tuning), and model interpretability.