FUSION: Fully Integration of Vision-Language Representations for Deep Cross-Modal Understanding

Published in arXiv preprint arXiv:2504.09925, 2025

Recommended citation: Liu, Zheng; Liu, Mengjie; Chen, Jingzhou; Xu, Jingwei; Cui, Bin; He, Conghui; Zhang, Wentao. (2025). FUSION: Fully Integration of Vision-Language Representations for Deep Cross-Modal Understanding. arXiv preprint arXiv:2504.09925.