procedure
In coarse-grained multi-dimension Image-Report Generation (IRG) scenarios, Large Vision-Language Model (LVLM) outputs are segmented into sentences and annotated at the sentence level.
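
The sentence-level step can be sketched as follows. This is a minimal illustration, not the source's pipeline: the names `SentenceAnnotation`, `segment_sentences`, and `prepare_for_annotation` are hypothetical, the regex splitter is a naive stand-in for whatever segmenter is actually used, and the `label` field is an assumed placeholder for the per-sentence annotation.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SentenceAnnotation:
    """One sentence of an LVLM output plus its (initially empty) annotation."""
    index: int                   # position of the sentence in the output
    text: str                    # the sentence itself
    label: Optional[str] = None  # hypothetical slot, filled in by an annotator


def segment_sentences(output: str) -> List[str]:
    """Naively split text after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", output.strip())
    return [p for p in parts if p]


def prepare_for_annotation(output: str) -> List[SentenceAnnotation]:
    """Turn one model output into a list of sentence-level annotation units."""
    return [
        SentenceAnnotation(index=i, text=s)
        for i, s in enumerate(segment_sentences(output))
    ]


if __name__ == "__main__":
    report = "The heart size is normal. No pleural effusion is seen. Lungs are clear."
    for unit in prepare_for_annotation(report):
        print(unit.index, unit.text, unit.label)
```

A punctuation-based split like this breaks on abbreviations such as "e.g." or on decimal values, so a real setup would likely substitute a dedicated sentence segmenter while keeping the same one-record-per-sentence structure.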
