Claim
Model distillation can produce smaller, faster generator models that retain quality close to that of larger models on specific RAG use cases that demand high performance and low latency.
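One common way to distill a generator is response-based: a large teacher model answers RAG prompts, and those answers become fine-tuning targets for a smaller student. The sketch below illustrates only the data-collection step, under stated assumptions. The cited source does not specify a method, and build_prompt, build_distillation_set, toy_teacher, the prompt layout, and the record field names are all hypothetical.

```python
from typing import Callable, Dict, List


def build_prompt(question: str, passages: List[str]) -> str:
    """Format a RAG prompt: numbered retrieved passages, then the question."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


def build_distillation_set(
    examples: List[Dict],
    teacher: Callable[[str], str],
) -> List[Dict[str, str]]:
    """Pair each RAG prompt with the teacher's answer as the student's target."""
    records = []
    for ex in examples:
        prompt = build_prompt(ex["question"], ex["passages"])
        records.append({"prompt": prompt, "completion": teacher(prompt)})
    return records


def toy_teacher(prompt: str) -> str:
    """Stand-in teacher so the sketch runs offline (assumption: a real
    teacher would call the large generator model here)."""
    return "stub answer (" + str(len(prompt)) + " prompt chars)"


examples = [
    {
        "question": "What is the refund window?",
        "passages": ["Refunds are accepted within 30 days."],
    },
]
dataset = build_distillation_set(examples, toy_teacher)
```

The resulting prompt/completion records would then feed a fine-tuning job for the smaller model. Whether the student actually matches the teacher on the target use case has to be checked with a separate evaluation.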
Authors
Sources
- Evaluating RAG applications with Amazon Bedrock knowledge base ... aws.amazon.com