measurement
In a retrieval-augmented generation (RAG) system, traces can reveal that 80% of total latency is spent on document retrieval rather than model inference.
Authors
Sources
- LLM Observability: How to Monitor AI When It Thinks in Tokens | TTMS ttms.com via serper
Referenced by nodes (2)
- Retrieval-Augmented Generation (RAG) concept
- latency concept