perspective
Using LLaMA-3.1-70B as the sole evaluation model in the HalluLens benchmark raises concerns about bias, particularly when the benchmark is used to judge other LLaMA variants.
