measurement
On the DROP dataset, the Trustworthy Language Model (TLM) performed best at hallucination detection, followed by the improved RAGAS metrics and then LLM Self-Evaluation.
