Relations (1)

cross_type 3.70 — strongly supporting 12 facts

Amazon Bedrock provides a comprehensive suite of tools for evaluating RAG application performance, including knowledge base evaluation features [1], specific dataset formatting requirements [2], and comparative dashboards [3]. Amazon Bedrock also supports RAG-based chatbot architectures by integrating with data sources such as Amazon S3 [4].

Facts (12)

Sources
Evaluating RAG applications with Amazon Bedrock knowledge base ... (aws.amazon.com, Amazon Web Services), 11 facts
Claim: The Amazon Bedrock knowledge base evaluation feature allows users to assess RAG application performance by analyzing how different components, such as knowledge base configuration, retrieval strategies, prompt engineering, model selection, and vector store choices, affect evaluation metrics.
Claim: Users can compare two Amazon Bedrock RAG evaluation jobs using a radar chart to visualize relative strengths and weaknesses across different performance dimensions.
Procedure: Amazon Bedrock RAG evaluation features support batch analysis rather than real-time monitoring, so users should schedule periodic batch evaluations that align with knowledge base updates and content refreshes.
Claim: The Amazon Bedrock RAG evaluation dataset format renamed the key 'referenceContexts' to 'referenceResponses' when the Public Preview period ended on March 20, 2025.
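A dataset record in the renamed format can be sketched as follows. The nesting shown here (conversation turns containing a prompt and `referenceResponses`) is illustrative; the authoritative schema is in the current AWS documentation, and the sample question and answer are made up.

```python
import json

# One evaluation record in the post-preview format: the ground-truth answer
# goes under "referenceResponses" (formerly "referenceContexts"). The exact
# surrounding structure is an assumption; check the current AWS docs.
record = {
    "conversationTurns": [
        {
            "prompt": {"content": [{"text": "What is the refund policy?"}]},
            "referenceResponses": [
                {"content": [{"text": "Refunds are issued within 30 days of purchase."}]}
            ],
        }
    ]
}

# Evaluation datasets are uploaded to S3 as JSON Lines: one record per line.
jsonl_line = json.dumps(record)
print("referenceResponses" in jsonl_line)
```

Serializing each record to its own line produces the JSONL file that the evaluation job reads from S3.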
Procedure: To start a knowledge base RAG evaluation job from the Amazon Bedrock console: (1) Navigate to the Amazon Bedrock console, select 'Evaluations' under 'Inference and Assessment', and choose 'Knowledge Bases'. (2) Select 'Create'. (3) Provide an Evaluation name and Description, and select an Evaluator model to act as a judge. (4) Choose the knowledge base and the evaluation type (either 'Retrieval only' or 'Retrieval and response generation'). (5) Select a model for generating responses. (6) Optionally configure inference parameters such as temperature, top-P, prompt templates, guardrails, search strategy, and chunk counts. (7) Provide the S3 URIs for the evaluation data and the results. (8) Select or create an IAM role with permissions for Amazon Bedrock, the S3 buckets, the knowledge base, and the models. (9) Select 'Create' to initiate the job.
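The console inputs above can be captured as a plain record and sanity-checked before clicking Create. All field names, model IDs, ARNs, and bucket names below are illustrative shorthand for the console fields, not the Bedrock API schema.

```python
# Hypothetical record mirroring the console inputs; every value is made up.
job_config = {
    "evaluation_name": "kb-eval-demo",
    "description": "Weekly RAG quality check",
    "evaluator_model": "anthropic.claude-3-sonnet",  # the judge model (step 3)
    "knowledge_base_id": "KB12345",
    "evaluation_type": "Retrieval and response generation",  # step 4
    "generator_model": "anthropic.claude-3-haiku",  # step 5
    "inference_params": {"temperature": 0.0, "top_p": 0.9},  # step 6
    "dataset_s3_uri": "s3://my-eval-bucket/datasets/eval.jsonl",  # step 7
    "results_s3_uri": "s3://my-eval-bucket/results/",
    "iam_role_arn": "arn:aws:iam::123456789012:role/BedrockEvalRole",  # step 8
}

def validate(cfg: dict) -> list[str]:
    """Catch common mistakes before creating the job in the console."""
    errors = []
    if cfg["evaluation_type"] not in ("Retrieval only", "Retrieval and response generation"):
        errors.append("evaluation_type must be one of the two supported types")
    for key in ("dataset_s3_uri", "results_s3_uri"):
        if not cfg[key].startswith("s3://"):
            errors.append(f"{key} must be an S3 URI")
    return errors

print(validate(job_config))  # an empty list means the record is consistent
```

A pre-flight check like this is useful because a job that fails on a malformed S3 URI or an unsupported evaluation type only surfaces the error after submission.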
Procedure: Amazon Bedrock RAG evaluation jobs output results to a directory in Amazon S3, which can be located via the job results page in the evaluation summary section.
Procedure: Amazon Bedrock RAG evaluation best practices include designing evaluation strategies using representative test datasets that reflect production scenarios and user patterns.
Procedure: Amazon Bedrock RAG evaluation allows users to select specific score ranges in a histogram to view detailed conversation analyses, including the input prompt, generated response, number of retrieved chunks, ground truth comparison, and the evaluator model's score explanation.
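The drill-down the console performs can be sketched in plain Python: bucket per-conversation scores into histogram bins, then pull the full records behind one selected bin. The record fields mirror those the fact above says the console displays; the data is made up.

```python
# Hypothetical per-conversation evaluation results.
results = [
    {"prompt": "Q1", "response": "A1", "retrieved_chunks": 5, "score": 0.42,
     "score_explanation": "partially grounded in retrieved context"},
    {"prompt": "Q2", "response": "A2", "retrieved_chunks": 3, "score": 0.91,
     "score_explanation": "fully supported by context"},
    {"prompt": "Q3", "response": "A3", "retrieved_chunks": 4, "score": 0.38,
     "score_explanation": "missing key details from ground truth"},
]

def in_range(record: dict, lo: float, hi: float) -> bool:
    """True if the record's score falls in the half-open bin [lo, hi)."""
    return lo <= record["score"] < hi

# "Click" the 0.0-0.5 bar of the histogram: view the detailed analyses behind it.
selected = [r for r in results if in_range(r, 0.0, 0.5)]
for r in selected:
    print(r["prompt"], r["score"], r["score_explanation"])
```

Filtering by score bin like this is exactly why low-scoring bars are the natural starting point for debugging: each selected record carries the prompt, response, chunk count, and judge explanation needed to diagnose the failure.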
Measurement: Amazon Bedrock RAG evaluation jobs typically take 10–15 minutes for a small job, while a large job with hundreds of long prompts and all metrics selected can take a few hours.
Claim: Amazon Bedrock RAG evaluation provides a comparative dashboard that includes a completeness histogram, which visualizes how well AI-generated responses cover all aspects of the questions asked.
Claim: In Amazon Bedrock RAG evaluations, the 'referenceResponses' field must contain the expected ground truth answer that an end-to-end RAG system should generate for a given prompt, rather than the expected passages or chunks retrieved from the Knowledge Base.
Reducing hallucinations in large language models with custom ... (aws.amazon.com, Amazon Web Services), 1 fact
Procedure: The RAG-based chatbot solution architecture involves the following steps: (1) Data ingestion: raw PDFs stored in an Amazon Simple Storage Service (Amazon S3) bucket are synced as a data source with Amazon Bedrock Knowledge Bases. (2) The user asks a question. (3) The Amazon Bedrock agent creates a plan and identifies the need to use a knowledge base. (4) The agent sends a request to the knowledge base, which retrieves relevant data from the underlying vector database (Amazon OpenSearch Serverless). (5) The agent retrieves an answer through RAG.
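Steps 2–5 above can be sketched as a minimal, self-contained flow with the knowledge base and agent stubbed out in plain Python. In a real deployment the retrieval would hit Amazon OpenSearch Serverless via Bedrock Knowledge Bases and the answer would come from a foundation model; here a keyword lookup and string template stand in for both, and all data is invented.

```python
# Stand-in for the vector store synced from the S3 data source (step 1).
VECTOR_DB = {
    "refund policy": "Refunds are issued within 30 days of purchase.",
    "shipping": "Orders ship within 2 business days.",
}

def retrieve(query: str) -> list[str]:
    """Stub knowledge-base retrieval (step 4): keyword match instead of vector search."""
    return [text for key, text in VECTOR_DB.items() if key in query.lower()]

def agent_answer(question: str) -> str:
    """Stub agent (steps 3-5): consult the knowledge base, then answer via RAG."""
    chunks = retrieve(question)
    if not chunks:
        return "I could not find that in the knowledge base."
    # A real agent would pass the chunks to an LLM; here we just ground the reply.
    return f"Based on the documents: {chunks[0]}"

print(agent_answer("What is the refund policy?"))  # step 2: the user asks
```

The key design point the stub preserves is grounding: the agent only answers from retrieved chunks, which is the mechanism the architecture relies on to reduce hallucinations.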