Relations (1)

related 2.32 — strongly supporting 4 facts

Large Language Models rely on a probability distribution over their vocabulary to select each token during generation, as described in [1] and [2]. This distribution can be directly manipulated or constrained, both to influence model performance and to enforce output formatting, as detailed in [3] and [4].
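The per-step selection described above can be sketched as follows. The five-token vocabulary and the logits are invented for illustration; a real model emits logits over its full tokenizer vocabulary:

```python
import numpy as np

# Hypothetical 5-token vocabulary; in a real LLM the logits come from the
# model's final layer at each generation step.
vocab = ["the", "cat", "sat", "on", "mat"]
logits = np.array([2.0, 1.0, 0.5, 0.2, -1.0])

# Softmax turns the logits into a probability distribution over the vocabulary.
probs = np.exp(logits - logits.max())
probs /= probs.sum()

# Greedy decoding picks the mode; sampling draws a token from the distribution.
greedy_token = vocab[int(np.argmax(probs))]
sampled_token = vocab[int(np.random.default_rng(0).choice(len(vocab), p=probs))]
print(greedy_token)  # prints "the", the highest-logit token
```

Whether the model takes the argmax or samples, a token must be selected at every step from this distribution, which is the hook that the manipulation and constraining techniques below exploit.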

Facts (4)

Sources
Detecting hallucinations with LLM-as-a-judge: Prompt ... - Datadog
Aritra Biswas, Noé Vernier · Datadog (datadoghq.com) · 2 facts
claim: Directly manipulating the probability distribution of generated tokens in Large Language Models can negatively impact the model's performance and accuracy.
procedure: Large Language Models (LLMs) can be constrained to specific output formats by intersecting a Finite State Machine's (FSM) set of valid tokens with the model's probability distribution, setting the logprob or logit of every invalid token to negative infinity.
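A minimal sketch of the masking step in that procedure. The six-token vocabulary, the logits, and the `allowed` set are invented stand-ins; a real FSM-based decoder computes the allowed set from its current grammar state:

```python
import numpy as np

# Invented tiny vocabulary and logits; a real model emits logits over the
# full tokenizer vocabulary.
vocab = ["{", "}", '"', "true", "false", "cat"]
logits = np.array([0.1, 0.3, 0.2, 1.5, 0.4, 2.0])

# Stand-in for the FSM's set of valid tokens at the current state.
allowed = {"true", "false"}

# Set the logit of every invalid token to -inf, so it receives zero
# probability mass after the softmax.
masked = np.where([tok in allowed for tok in vocab], logits, -np.inf)

probs = np.exp(masked - masked.max())  # masked.max() is finite here (1.5)
probs /= probs.sum()

next_token = vocab[int(np.argmax(probs))]
print(next_token)  # prints "true": the highest-logit token in the allowed set
```

Note that "cat" has the highest raw logit, but the mask guarantees it can never be emitted; only format-valid tokens retain nonzero probability, which is exactly why the output is guaranteed to follow the FSM's format.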
Hallucination Causes: Why Language Models Fabricate Facts
M. Brenndoerfer · mbrenndoerfer.com · 2 facts
claim: The generation process in large language models pressures them to favor fluent hallucination over honest uncertainty: generation is a sequence of probability distributions from which the model must select a token at each step, and the model lacks a built-in mechanism to output 'I don't know'.
claim: When a question implies a factual answer exists, large language models generate the most statistically plausible answer rather than expressing uncertainty, because their probability distribution over the vocabulary always has a mode and assigns no probability mass to an 'abstain' option.
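The last claim can be checked mechanically: a softmax over any finite logit vector puts strictly positive mass on every token and always has an argmax, so unless the vocabulary contains an explicit abstain token, no outcome expresses "no answer". A minimal sketch with invented, near-uniform logits standing in for a highly uncertain model:

```python
import numpy as np

# Hypothetical logits reflecting near-uniform uncertainty over a tiny vocabulary.
logits = np.array([0.2, 0.1, 0.05])

# Softmax assigns strictly positive probability to every token, so there is
# no outcome corresponding to "abstain" unless such a token exists.
probs = np.exp(logits - logits.max())
probs /= probs.sum()

# A finite distribution always has a mode: a "most plausible" token exists
# even when the model is nearly maximally uncertain.
mode = int(np.argmax(probs))
```

Even here, where the model is close to indifferent, greedy decoding still commits to token 0 with full fluency, which is the structural pressure toward confident fabrication that the claim describes.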