Fact — measurement — Knowledge Tree

In a retrieval-augmented generation (RAG) system, traces can reveal where request latency accumulates: for example, that 80% of total latency is spent on document retrieval rather than on model inference.
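As a rough illustration of how such a breakdown could be read off a trace, the sketch below computes each stage's share of total latency from a list of spans. The `Span` class, the `latency_shares` helper, the span names, and the timings are all hypothetical and chosen only to reproduce the 80% figure; the sketch also assumes the spans run sequentially and do not overlap, which real traces may not satisfy.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """A hypothetical, simplified trace span (not any real tracing library's type)."""
    name: str
    start_ms: float
    end_ms: float

    @property
    def duration_ms(self) -> float:
        return self.end_ms - self.start_ms


def latency_shares(spans: list[Span]) -> dict[str, float]:
    """Return each span name's fraction of the summed span durations.

    Assumes spans are sequential and non-overlapping, so the sum of
    durations approximates total request latency.
    """
    total = sum(s.duration_ms for s in spans)
    if total <= 0:
        return {}
    shares: dict[str, float] = {}
    for s in spans:
        # Spans sharing a name (e.g. repeated retrieval calls) are pooled.
        shares[s.name] = shares.get(s.name, 0.0) + s.duration_ms / total
    return shares


# Illustrative timings only: 800 ms of retrieval followed by 200 ms of inference.
trace = [
    Span("retrieval", 0.0, 800.0),
    Span("inference", 800.0, 1000.0),
]
shares = latency_shares(trace)

for name, share in shares.items():
    print(f"{name}: {share:.0%}")
```

With these made-up timings, retrieval accounts for four fifths of the request, which is the kind of signal that would point optimization effort at the retrieval stage rather than at the model.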

Authors

Person: Not available
Organization: TTMS

Sources

LLM Observability: How to Monitor AI When It Thinks in Tokens | TTMS (ttms.com)

Referenced by nodes (2)

Retrieval-Augmented Generation (RAG) (concept)
latency (concept)