★····Papers – Page 3

30 September 2022

2,867 0

[Multimodal] Everything at Once

Multi-modal Fusion Transformer for Video Retrieval. Abstract: Multi-modal learning from video data has recently received growing attention, because without human annotation it allows...

22 August 2022

1,836 0

[Skim] ObjectBox

From Centers to Boxes for Anchor-Free Object Detection. Main contributions | Keypoints. Label Assignment: on the three feature-map levels...

21 March 2022

2,540 0

[Close Reading] Table Question Answering: TAPAS

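Reference: TAPAS: Weakly Supervised Table Parsing via Pre-training. Abstract: Answering natural language questions over tables is usually treated as a semantic parsing task. To alleviate full logical form...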

03 December 2021

3,331 2

[Close Reading] DINO

SwAV https://arxiv.org/pdf/2006.09882.pdf DINO https://arxiv.org/pdf/2104.14294.pdf Abstract: Unsupervised image representations...

23 September 2021

3,736 0

[Skim] Align before Fuse

Align before Fuse: Vision and Language Representation Learning with Momentum Distillation. Background: VLP (Vis...

04 July 2021

2,305 0

[Translation] UNITER: Universal Image-Text Representation Learning

UNiversal Image-TExt Representation Learning. Abstract: Joint image-text embedding is the foundation of most vision-and-language tasks (V+L tasks), in which multimodal inputs are...

12 May 2021

2,267 0

[Skim] The Twins Series

Twins: Revisiting the Design of Spatial Attention in Vision Transformers Conditional Positional Enco...

10 May 2021

7,052 1

[Skim] Swin-Transformer

Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. Main contributions: Patch Merging Layer Sh...

20 April 2021

2,485 0

[Translation] Pyramid Vision Transformer

A Versatile Backbone for Dense Prediction without Convolutions. Abstract: Although architectures that use CNNs as the backbone network have achieved great success in computer vision, ...

05 February 2021

2,327 0

[Translation] See Better Before Looking Closer

See Better Before Looking Closer: Weakly Supervised Data Augmentation Network for Fine-Grained Visua...

Category: ★····Papers