claim
Evaluating text generated by Large Language Models is inconsistent and unreliable: human judgments and automatic evaluation tools often disagree, and the models themselves can be biased by their training data.
Authors
Sources
- A survey on augmenting knowledge graphs (KGs) with large ... link.springer.com via serper
Referenced by nodes (2)
- Large Language Models concept
- human evolution concept