reference
System-level explainability is a post-hoc technique that interprets the attention mechanisms of language models without affecting their learning process by connecting attention patterns to concepts from understandable knowledge repositories.

Authors

Sources

Referenced by nodes (2)