procedure
Generative Evaluator Tuning is a method for training e-LLMs that combines traditional training with reinforcement learning, using rewards from KnowLLMs as additional guidance. If KnowLLM judges an e-LLM's output to be logically incorrect, or the output fails to meet specific criteria, the output receives a negative reward, even when its similarity score against the ground truth is high.
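
The reward rule described above can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the source: the `Verdict` type, the `judge` callable standing in for a KnowLLM, the `penalty` value of -1.0, and the use of `difflib.SequenceMatcher` as the similarity score (the description does not say which metric is used). The point it shows is only the ordering of the checks: the KnowLLM verdict is consulted first, so a high similarity score cannot rescue an output that was judged incorrect.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable


@dataclass
class Verdict:
    """Hypothetical KnowLLM judgment of a single e-LLM output."""
    logically_correct: bool
    meets_criteria: bool


def similarity(output: str, ground_truth: str) -> float:
    """Stand-in similarity score in [0, 1]; the actual metric is not specified."""
    return SequenceMatcher(None, output, ground_truth).ratio()


def shaped_reward(
    output: str,
    ground_truth: str,
    judge: Callable[[str], Verdict],
    penalty: float = -1.0,  # assumed value for the negative reward
) -> float:
    """Return the penalty if the judge rejects the output, else the similarity score."""
    verdict = judge(output)
    if not (verdict.logically_correct and verdict.meets_criteria):
        # Rejected outputs are penalized regardless of how similar they are.
        return penalty
    return similarity(output, ground_truth)


def toy_judge(output: str) -> Verdict:
    """Toy stand-in for a KnowLLM: flags one known-bad claim as illogical."""
    return Verdict(logically_correct="2 + 2 = 5" not in output, meets_criteria=True)


truth = "The answer is correct because 2 + 2 = 4."
near_miss = "The answer is correct because 2 + 2 = 5."

reward_good = shaped_reward(truth, truth, toy_judge)
reward_bad = shaped_reward(near_miss, truth, toy_judge)
```

In this toy run, `near_miss` differs from the ground truth by a single character, so its similarity score is high, yet it still receives the penalty because the judge rejects it. In an actual reinforcement learning setup this scalar would be passed to whatever policy-optimization step is in use; that part is outside what the description covers.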
