[Skim] Align before Fuse
Align before Fuse: Vision and Language Representation Learning with Momentum Distillation. Background: VLP (Vis...
UNiversal Image-TExt Representation Learning. Abstract: Joint image-text embedding is the foundation of most vision-and-language tasks (V+L tasks), in which multimodal inputs are...