Amazon Bedrock

Also known as: Bedrock

Facts (35)

Sources
Evaluating RAG applications with Amazon Bedrock knowledge base ... (aws.amazon.com, Amazon Web Services, Mar 14, 2025; 23 facts)
claim: Amazon Bedrock knowledge base RAG evaluation enables organizations to deploy and maintain high-quality RAG applications by providing automated assessment of both retrieval and generation components.
procedure: To use the Amazon Bedrock knowledge base evaluation feature, users must have an active AWS account, enabled evaluator and generator models in Amazon Bedrock, confirmed AWS Region availability and quotas, configured AWS Identity and Access Management (IAM) permissions for an S3 bucket, enabled CORS on the S3 bucket, and created an Amazon Bedrock knowledge base with synced data.
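Of these prerequisites, the CORS requirement on the dataset bucket is the easiest to miss. A minimal sketch of enabling it with boto3, assuming a hypothetical bucket name and deliberately permissive rule values (tighten AllowedOrigins for production):

```python
# Hypothetical bucket name; the rule shape follows the S3 PutBucketCors API.
EVAL_BUCKET = "my-rag-eval-bucket"

# CORS must be enabled on the S3 bucket holding evaluation data so the
# Bedrock console can read inputs and results; these values are illustrative.
CORS_CONFIGURATION = {
    "CORSRules": [
        {
            "AllowedHeaders": ["*"],
            "AllowedMethods": ["GET", "PUT", "POST", "DELETE"],
            "AllowedOrigins": ["*"],
            "ExposeHeaders": ["Access-Control-Allow-Origin"],
        }
    ]
}

def enable_cors(bucket: str) -> None:
    """Apply the CORS rules to the bucket (requires AWS credentials)."""
    import boto3  # imported lazily so the sketch loads without boto3 installed
    boto3.client("s3").put_bucket_cors(
        Bucket=bucket, CORSConfiguration=CORS_CONFIGURATION
    )
```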
claim: The Amazon Bedrock Evaluation details tab allows users to examine score distributions through histograms for each evaluation metric, displaying average scores and percentage differences.
procedure: When using a custom model as a generator in Amazon Bedrock knowledge base evaluation, users must ensure sufficient quota for Provisioned Throughput, specifically checking 'Model units no-commitment Provisioned Throughputs across custom models' and 'Model units per provisioned model' in the Service Quotas console.
measurement: In an example completeness histogram provided by Amazon Bedrock, the distribution was concentrated at the high end, with an average score of 0.921: 15 responses scored above 0.9 and a small number scored between 0.5 and 0.8.
claim: The Amazon Bedrock knowledge base evaluation feature allows users to assess RAG application performance by analyzing how different components, such as knowledge base configuration, retrieval strategies, prompt engineering, model selection, and vector store choices, impact metrics.
reference: Amazon Bedrock knowledge base evaluation assesses quality through four dimensions: technical quality (context relevance and faithfulness), business alignment (correctness and completeness), user experience (helpfulness and logical coherence), and responsible AI metrics (harmfulness, stereotyping, and answer refusal).
claim: Amazon Bedrock launched two evaluation capabilities: LLM-as-a-judge (LLMaaJ) under Amazon Bedrock Evaluations and a RAG evaluation tool for Amazon Bedrock Knowledge Bases.
account: Adewale Akinfaderin is a Senior Data Scientist for Generative AI at Amazon Bedrock, specializing in reproducible and end-to-end AI/ML methods.
claim: Users can compare two Amazon Bedrock RAG evaluation jobs using a radar chart to visualize relative strengths and weaknesses across different performance dimensions.
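The comparison that the radar chart visualizes can also be computed directly. A sketch with hypothetical metric scores for two jobs (real scores would come from each job's results files in S3):

```python
# Hypothetical metric scores for two evaluation jobs.
job_a = {"correctness": 0.88, "completeness": 0.92, "faithfulness": 0.95,
         "helpfulness": 0.81, "logical_coherence": 0.90}
job_b = {"correctness": 0.91, "completeness": 0.85, "faithfulness": 0.93,
         "helpfulness": 0.87, "logical_coherence": 0.89}

def compare_jobs(a: dict, b: dict) -> dict:
    """Return per-metric deltas (b - a); positive means job B is stronger."""
    return {metric: round(b[metric] - a[metric], 3) for metric in a}

deltas = compare_jobs(job_a, job_b)
# Metrics where the second job outperforms the first
stronger_in_b = [metric for metric, d in deltas.items() if d > 0]
```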
procedure: Amazon Bedrock RAG evaluation features support batch analysis rather than real-time monitoring, so users should schedule periodic batch evaluations that align with knowledge base updates and content refreshes.
claim: The Amazon Bedrock RAG evaluation dataset format changed the key 'referenceContexts' to 'referenceResponses' following the end of the Public Preview period on March 20, 2025.
procedure: To start a knowledge base RAG evaluation job from the Amazon Bedrock console:
(1) Navigate to the Amazon Bedrock console, select 'Evaluations' under 'Inference and Assessment', and choose 'Knowledge Bases'.
(2) Select 'Create'.
(3) Provide an evaluation name and description, and select an evaluator model to act as a judge.
(4) Choose the knowledge base and the evaluation type (either 'Retrieval only' or 'Retrieval and response generation').
(5) Select a model for generating responses.
(6) Optionally configure inference parameters such as temperature, top-P, prompt templates, guardrails, search strategy, and chunk counts.
(7) Provide the S3 URI for evaluation data and results.
(8) Select or create an IAM role with permissions for Amazon Bedrock, the S3 buckets, the knowledge base, and the models.
(9) Select 'Create' to initiate the job.
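The same job can be started programmatically through the Bedrock CreateEvaluationJob API. The sketch below only assembles the request; the nested field names (applicationType, datasetMetricConfigs, ragConfigs, and the Builtin.* metric names) are assumptions drawn from the general API shape and should be checked against the current API reference before use:

```python
def build_rag_eval_request(job_name: str, role_arn: str,
                           knowledge_base_id: str, model_id: str,
                           dataset_s3_uri: str, output_s3_uri: str) -> dict:
    """Assemble a CreateEvaluationJob request for a knowledge base RAG
    evaluation. Field names are a sketch; verify against the API reference."""
    return {
        "jobName": job_name,
        "roleArn": role_arn,  # IAM role with access to Bedrock, S3, the KB, and models
        "applicationType": "RagEvaluation",
        "evaluationConfig": {
            "automated": {
                "datasetMetricConfigs": [{
                    "taskType": "QuestionAndAnswer",
                    "dataset": {"name": "rag-eval-dataset",
                                "datasetLocation": {"s3Uri": dataset_s3_uri}},
                    "metricNames": ["Builtin.Correctness", "Builtin.Completeness",
                                    "Builtin.Faithfulness"],
                }],
                "evaluatorModelConfig": {  # the judge model
                    "bedrockEvaluatorModels": [{"modelIdentifier": model_id}]
                },
            }
        },
        "inferenceConfig": {  # generator side: the knowledge base under test
            "ragConfigs": [{"knowledgeBaseConfig": {
                "retrieveAndGenerateConfig": {
                    "type": "KNOWLEDGE_BASE",
                    "knowledgeBaseConfiguration": {
                        "knowledgeBaseId": knowledge_base_id,
                        "modelArn": model_id,
                    },
                }
            }}]
        },
        "outputDataConfig": {"s3Uri": output_s3_uri},
    }
```

The resulting dict would then be passed as `boto3.client("bedrock").create_evaluation_job(**request)`.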
claim: Integrating Amazon Bedrock knowledge base evaluation with Amazon Bedrock Guardrails lets organizations make data-driven decisions about their RAG implementations while following responsible AI practices.
procedure: Amazon Bedrock RAG evaluation jobs output results to a directory in Amazon S3, which can be located via the job results page in the evaluation summary section.
procedure: Amazon Bedrock RAG evaluation best practices include designing evaluation strategies around representative test datasets that reflect production scenarios and user patterns.
procedure: Amazon Bedrock RAG evaluation allows users to select specific score ranges in a histogram to view detailed conversation analyses, including the input prompt, generated response, number of retrieved chunks, ground truth comparison, and the evaluator model's score explanation.
account: Jesse Manders is a Senior Product Manager on Amazon Bedrock, the AWS Generative AI developer service, with an M.S. and Ph.D. from the University of Florida and an MBA from the University of California, Berkeley, Haas School of Business.
measurement: The input dataset for an Amazon Bedrock knowledge base evaluation job must be in JSONL format, stored in an S3 bucket with CORS enabled, and contain at most 1,000 conversations per job and at most 5 turns (prompts) per conversation.
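These limits can be checked before submitting a job. A minimal validator, assuming a per-record schema of conversationTurns containing prompt and referenceResponses fields (confirm the exact schema against the Bedrock documentation):

```python
import json

MAX_CONVERSATIONS = 1000   # conversations per evaluation job
MAX_TURNS = 5              # prompts per conversation

def validate_dataset(jsonl_text: str) -> list[str]:
    """Check a JSONL evaluation dataset against the documented limits.
    Returns a list of error strings; empty means the dataset passed."""
    errors = []
    lines = [ln for ln in jsonl_text.splitlines() if ln.strip()]
    if len(lines) > MAX_CONVERSATIONS:
        errors.append(f"{len(lines)} conversations exceeds the {MAX_CONVERSATIONS} limit")
    for i, line in enumerate(lines):
        record = json.loads(line)
        turns = record.get("conversationTurns", [])
        if not turns:
            errors.append(f"line {i}: no conversationTurns")
        if len(turns) > MAX_TURNS:
            errors.append(f"line {i}: {len(turns)} turns exceeds the {MAX_TURNS} limit")
        for turn in turns:
            if "referenceResponses" not in turn:
                errors.append(f"line {i}: missing referenceResponses (ground truth answer)")
    return errors

# One well-formed record (schema is a sketch; field names are assumptions)
sample = json.dumps({
    "conversationTurns": [{
        "prompt": {"content": [{"text": "What is Amazon Bedrock?"}]},
        "referenceResponses": [{"content": [{"text": "A managed foundation-model service."}]}],
    }]
})
```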
measurement: Amazon Bedrock RAG evaluation jobs typically take 10–15 minutes for a small job, while a large job with hundreds of long prompts and all metrics selected can take a few hours.
claim: Amazon Bedrock RAG evaluation provides a comparative dashboard that includes a completeness histogram, which visualizes how well AI-generated responses cover all aspects of the questions asked.
claim: In Amazon Bedrock RAG evaluations, the 'referenceResponses' field must contain the expected ground truth answer that an end-to-end RAG system should generate for a given prompt, rather than the expected passages or chunks retrieved from the Knowledge Base.
claim: The evaluation features in Amazon Bedrock enable organizations to assess AI model outputs across various tasks, evaluate multiple performance dimensions simultaneously, systematically assess retrieval and generation quality in RAG systems, and scale evaluations across thousands of responses.
Reducing hallucinations in large language models with custom ... (aws.amazon.com, Amazon Web Services, Nov 26, 2024; 10 facts)
procedure: The cleanup process for the Amazon Bedrock Agents hallucination detection infrastructure follows this specific order:
(1) Disable the action group.
(2) Delete the action group.
(3) Delete the alias.
(4) Delete the agent.
(5) Delete the Lambda function.
(6) Empty the S3 bucket.
(7) Delete the S3 bucket.
(8) Delete the AWS Identity and Access Management (IAM) roles and policies.
(9) Delete the vector database collection policies.
(10) Delete the knowledge bases.
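The ordering matters because each step removes a resource that earlier steps still depend on (action groups before the agent, objects before the bucket). For scripted teardowns, the order can be encoded as data; this sketch only captures the sequence, and the actual deletions would go through the bedrock-agent, lambda, s3, and iam APIs:

```python
# Documented teardown order: dependents before the resources they depend on.
CLEANUP_STEPS = [
    "disable action group",
    "delete action group",
    "delete agent alias",
    "delete agent",
    "delete Lambda function",
    "empty S3 bucket",
    "delete S3 bucket",
    "delete IAM roles and policies",
    "delete vector database collection policies",
    "delete knowledge bases",
]

def must_precede(earlier: str, later: str) -> bool:
    """Check that one cleanup step is ordered before another."""
    return CLEANUP_STEPS.index(earlier) < CLEANUP_STEPS.index(later)
```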
claim: Agentic workflows in Amazon Bedrock can be extended with custom actions to detect and mitigate hallucinations in custom use cases.
claim: Amazon Bedrock model evaluation includes a human-based evaluation feature that allows customers to perform batch evaluation of LLM outputs with human reviewers.
claim: Amazon Bedrock is a fully managed service that provides access to foundation models from AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API.
claim: Amazon Bedrock Guardrails offer hallucination detection with contextual grounding checks that can be applied using Amazon Bedrock APIs such as Converse or InvokeModel.
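Applying a guardrail with contextual grounding through the Converse API can be sketched as follows. The identifiers are hypothetical, and while the parameter names mirror the Converse request shape (guardrailConfig, guardContent with a grounding_source qualifier), they should be verified against the current API reference:

```python
def build_converse_request(model_id: str, guardrail_id: str,
                           guardrail_version: str, question: str,
                           retrieved_context: str) -> dict:
    """Build a Converse request that applies a guardrail with contextual
    grounding checks; the retrieved context is supplied as the grounding
    source the guardrail checks the answer against."""
    return {
        "modelId": model_id,
        "guardrailConfig": {
            "guardrailIdentifier": guardrail_id,
            "guardrailVersion": guardrail_version,
        },
        "messages": [{
            "role": "user",
            "content": [
                # Grounding source for the contextual grounding check
                {"guardContent": {"text": {"text": retrieved_context,
                                           "qualifiers": ["grounding_source"]}}},
                {"text": question},
            ],
        }],
    }
```

The dict would be passed as `boto3.client("bedrock-runtime").converse(**request)`.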
procedure: The RAG-based chatbot solution architecture involves the following steps:
(1) Data ingestion: raw PDFs stored in an Amazon Simple Storage Service (Amazon S3) bucket are synced as a data source with Amazon Bedrock Knowledge Bases.
(2) The user asks a question.
(3) The Amazon Bedrock agent creates a plan and identifies the need to use a knowledge base.
(4) The agent sends a request to the knowledge base, which retrieves relevant data from the underlying vector database (Amazon OpenSearch Serverless).
(5) The agent produces an answer through RAG.
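Steps 2 through 5 map onto a single InvokeAgent call, whose response arrives as an event stream of text chunks. A sketch, with the stream decoder split out so it can run without AWS access (the agent identifiers are account-specific placeholders):

```python
def decode_completion(completion_events) -> str:
    """Join the text chunks from an invoke_agent event-stream response."""
    return "".join(
        event["chunk"]["bytes"].decode("utf-8")
        for event in completion_events
        if "chunk" in event
    )

def ask_agent(agent_id: str, agent_alias_id: str,
              session_id: str, question: str) -> str:
    """Send a question to the Bedrock agent, which plans, queries the
    knowledge base, and streams back a RAG-grounded answer.
    Requires AWS credentials."""
    import boto3  # lazy import: the sketch stays loadable without boto3
    client = boto3.client("bedrock-agent-runtime")
    response = client.invoke_agent(
        agentId=agent_id,
        agentAliasId=agent_alias_id,
        sessionId=session_id,
        inputText=question,
    )
    return decode_completion(response["completion"])
```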
claim: The solution implementation uses Anthropic’s Claude v3 (Sonnet) and Amazon Titan Embeddings Text v2 hosted on Amazon Bedrock.
claim: The Amazon Bedrock Agents implementation for hallucination reduction incurs no separate charges for building resources using Amazon Bedrock Knowledge Bases or Amazon Bedrock Agents, but users are charged for embedding model and text model invocations on Amazon Bedrock, as well as for Amazon S3 and vector database usage.
claim: Amazon Bedrock supports foundation models from various providers, including Anthropic (Claude models), AI21 Labs (Jamba models), Cohere (Command models), Meta (Llama models), and Mistral AI.
claim: Hallucination detection workflows in Amazon Bedrock can be implemented with Amazon Bedrock Prompt Flows or with custom logic in AWS Lambda functions.
A Comprehensive Review of Neuro-symbolic AI for Robustness ... (link.springer.com, Springer, Dec 9, 2025; 1 fact)
claim: Amazon Bedrock’s LLM Guardrails use formal rules to check and adjust large language model outputs, acting as a symbolic intervention layer that performs automated reasoning to override or reject responses that violate safety constraints.
How Neuro-Symbolic AI Breaks the Limits of LLMs (wired.com, Wired; 1 fact)
claim: Amazon Nova 2 Lite, available in Amazon Bedrock, demonstrates stronger performance in mathematics, coding, and science benchmarks by combining neural learning with symbolic reasoning.