Backbone Networks – 我家Ai智障

27

December

2023

1,868 0

[Skim] Emu2

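Generative Multimodal Models are In-Context Learners Abstract The human ability to easily solve multimodal tasks (given only a few demonstrations or simple instructions) is something that current multimodal...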

24

November

2023

2,048 0

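ABSTRACT This paper reveals that large language models (LLMs), despite being trained solely on text data, are still strong encoders for purely visual tasks in the absence of language. More interestingly, this can be achieved with a simple yet previously overlooked strategy ...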

12

May

2021

1,887 0

Twins: Revisiting the Design of Spatial Attention in Vision Transformers Conditional Positional Enco...

10

May

2021

6,535 1

Swin Transformer: Hierarchical Vision Transformer using Shifted Windows Main contributions: Patch Merging Layer Sh...

20

April

2021

2,033 0

A Versatile Backbone for Dense Prediction without Convolutions Abstract Although architectures that use a CNN as the backbone have achieved great success in computer vision, ...