December 3, 2021

# Method

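Two features z_t and z_s are compared with the set of cluster prototypes K = {c_1, c_2, ..., c_k} to obtain their codes, denoted q_t and q_s. The loss then swaps the codes between the two features:

L(z_t, z_s) = loss(z_t, q_s) + loss(z_s, q_t)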
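
The swapped objective can be sketched in plain Python. This is only an illustration under stated assumptions: `loss(z, q)` is taken to be a cross-entropy between the code q and a softmax over the dot products of z with the prototypes, which these notes do not spell out. The helper names, the temperature of 0.1, and the toy vectors are all hypothetical.

```python
import math


def prototype_scores(z, prototypes):
    """Dot product of feature z with every prototype c_1..c_k."""
    return [sum(a * b for a, b in zip(z, c)) for c in prototypes]


def log_softmax(scores, temperature):
    """Numerically stable log-softmax of scores / temperature."""
    scaled = [s / temperature for s in scores]
    top = max(scaled)
    log_norm = top + math.log(sum(math.exp(v - top) for v in scaled))
    return [v - log_norm for v in scaled]


def code_loss(z, q, prototypes, temperature=0.1):
    """Assumed form of loss(z, q): cross-entropy between code q and the softmax scores of z."""
    log_p = log_softmax(prototype_scores(z, prototypes), temperature)
    return -sum(qk * lp for qk, lp in zip(q, log_p))


def swapped_loss(z_t, z_s, q_t, q_s, prototypes, temperature=0.1):
    """L(z_t, z_s) = loss(z_t, q_s) + loss(z_s, q_t)."""
    return (code_loss(z_t, q_s, prototypes, temperature)
            + code_loss(z_s, q_t, prototypes, temperature))


prototypes = [[1.0, 0.0], [0.0, 1.0]]  # toy prototypes c_1, c_2
z_t, z_s = [0.9, 0.1], [0.8, 0.2]      # toy features
q_t, q_s = [1.0, 0.0], [1.0, 0.0]      # toy codes, both assigned to c_1
loss_value = swapped_loss(z_t, z_s, q_t, q_s, prototypes)
```

Each feature is scored against the code of the other one, so the loss is small only when both features agree on the cluster assignment.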

# Network principles

### P

τ_s > 0 is a temperature parameter that controls the sharpness of the output distribution.
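
To make "sharpness" concrete, the sketch below applies a softmax at two temperatures to the same scores. It assumes the output distribution P is a softmax of the network outputs divided by τ_s, which these notes do not write out; the function name and the example scores are hypothetical.

```python
import math


def softmax_with_temperature(logits, tau):
    """Softmax of logits / tau. Requires tau > 0; a smaller tau gives a sharper result."""
    if tau <= 0:
        raise ValueError("tau must be positive")
    scaled = [x / tau for x in logits]
    top = max(scaled)                      # subtract the max for numerical stability
    exps = [math.exp(v - top) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]                   # toy network outputs
smooth = softmax_with_temperature(logits, 1.0)
sharp = softmax_with_temperature(logits, 0.1)
```

With the lower temperature, nearly all of the probability mass moves to the largest score, while the higher temperature keeps the distribution closer to uniform.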

### Loss

We follow the data augmentations of BYOL (color jittering, Gaussian blur and solarization) and use multi-crop with bicubic interpolation.

Given an image x, we use multi-crop to generate a set V of different views: x_1 and x_2 in V are global views, and the remaining views x' are local views.
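
One way such a view set can feed a loss is sketched below. The pairing rule is an assumption that these notes do not state: only the global views pass through the teacher, every view passes through the student, and a view is never paired with itself. The view names and the counts (2 global, 6 local) are placeholders.

```python
def build_views(num_global=2, num_local=6):
    """Return placeholder names for the global and local views in V."""
    global_views = [f"x_{i}" for i in range(1, num_global + 1)]
    local_views = [f"x'_{j}" for j in range(1, num_local + 1)]
    return global_views, local_views


def teacher_student_pairs(global_views, local_views):
    """Assumed pairing: teacher sees global views only, student sees all views."""
    all_views = global_views + local_views
    return [(t, s) for t in global_views for s in all_views if s != t]


global_views, local_views = build_views()
pairs = teacher_student_pairs(global_views, local_views)
```

Under this assumption, each global view is compared with every other view, so 2 global and 6 local views produce 2 × 7 = 14 loss terms.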

### Centering

Centering prevents any one dimension from dominating but encourages collapse to the uniform distribution, while sharpening has the opposite effect.
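
The two opposing effects can be seen on a toy batch. The sketch below assumes that centering means subtracting a per-dimension center from the outputs, that the center tracks the batch mean through an exponential moving average, and that sharpening means a low softmax temperature. None of these details are given in the notes, and the momentum and temperature values are illustrative.

```python
import math


def softmax(logits, tau):
    """Softmax of logits / tau; a small tau sharpens the result."""
    scaled = [x / tau for x in logits]
    top = max(scaled)
    exps = [math.exp(v - top) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def batch_mean(batch):
    """Per-dimension mean over a batch of output vectors."""
    return [sum(column) / len(batch) for column in zip(*batch)]


def update_center(center, batch, momentum=0.9):
    """Assumed rule: move the center toward the batch mean with an EMA."""
    mean = batch_mean(batch)
    return [momentum * c + (1.0 - momentum) * m for c, m in zip(center, mean)]


def center_and_sharpen(logits, center, tau):
    """Subtract the center (centering), then apply a softmax with temperature tau."""
    return softmax([x - c for x, c in zip(logits, center)], tau)


# Dimension 0 dominates every sample in this toy batch.
batch = [[5.0, 0.1, 0.0], [5.0, 0.0, 0.2]]
center = batch_mean(batch)

uncentered = softmax(batch[0], 1.0)                  # dimension 0 wins
flat = center_and_sharpen(batch[0], center, 1.0)     # centering alone: close to uniform
peaked = center_and_sharpen(batch[0], center, 0.04)  # centering plus sharpening
```

In this example centering removes the advantage of dimension 0 but leaves an almost uniform distribution, and the low temperature then restores a clear peak on a different dimension.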

### Teacher network

We build the teacher from past iterations of the student network and study different update rules.
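
As one example of such an update rule, the sketch below uses an exponential moving average (EMA) of the student parameters. The notes do not say which rule is finally used, so the EMA form, the default momentum of 0.996, and the flat parameter lists are assumptions made for illustration.

```python
def ema_update(teacher_params, student_params, momentum=0.996):
    """Return new teacher parameters: momentum * teacher + (1 - momentum) * student."""
    return [momentum * t + (1.0 - momentum) * s
            for t, s in zip(teacher_params, student_params)]


teacher = [0.0, 0.0]
# Student parameters after each of three training steps (held fixed here for clarity).
student_history = [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]
for student in student_history:
    teacher = ema_update(teacher, student, momentum=0.5)
```

With a momentum of 0 this rule copies the student at every step, and with a momentum of 1 the teacher never changes, so the momentum sets how far back in the student's history the teacher looks.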

### Network architecture

Backbone (ViT or ResNet) + a projection head + l2 normalization + a fully connected layer (K dimensions)

The projection head starts with an n-layer (n = 3) multilayer perceptron (MLP). The hidden layers are 2048-dimensional and use Gaussian error linear unit (GELU) activations.
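
The forward pass of such a head can be sketched without any framework. This is a toy version under several assumptions: the dimensions are tiny stand-ins (the notes give 2048 for the hidden layers), the MLP output size `bottleneck_dim` is hypothetical, no activation is applied after the last MLP layer, and the weights use a simple uniform initialization.

```python
import math
import random


def gelu(x):
    """Gaussian error linear unit: x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def linear(x, weight, bias):
    """Fully connected layer: weight @ x + bias."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def l2_normalize(x, eps=1e-12):
    """Scale x to unit Euclidean norm."""
    norm = math.sqrt(sum(v * v for v in x))
    return [v / max(norm, eps) for v in x]


def init_linear(rng, n_in, n_out):
    """Hypothetical initialization: uniform weights, zero bias."""
    scale = 1.0 / math.sqrt(n_in)
    weight = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weight, [0.0] * n_out


def make_head(rng, in_dim, hidden_dim, bottleneck_dim, out_dim):
    """3-layer MLP followed by a final fully connected layer with out_dim (K) outputs."""
    mlp = [init_linear(rng, in_dim, hidden_dim),
           init_linear(rng, hidden_dim, hidden_dim),
           init_linear(rng, hidden_dim, bottleneck_dim)]
    return {"mlp": mlp, "last": init_linear(rng, bottleneck_dim, out_dim)}


def head_forward(head, x):
    """MLP with GELU on hidden layers, then l2 normalization, then fc(K)."""
    for weight, bias in head["mlp"][:-1]:
        x = [gelu(v) for v in linear(x, weight, bias)]
    x = linear(x, *head["mlp"][-1])      # assumed: no activation on the MLP output
    x = l2_normalize(x)
    return linear(x, *head["last"])


rng = random.Random(0)
head = make_head(rng, in_dim=8, hidden_dim=16, bottleneck_dim=4, out_dim=10)
backbone_feature = [rng.uniform(-1.0, 1.0) for _ in range(8)]  # stand-in for a ViT/ResNet output
output = head_forward(head, backbone_feature)
```

The K outputs of the final layer are the values that a temperature softmax would turn into the output distribution.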

# Network performance

For linear evaluations, we apply random resized crops and horizontal flip augmentations during training, and report accuracy on a central crop.

For finetuning evaluations, we initialize networks with the pretrained weights and adapt them during training. For k-NN evaluations, we freeze the pretrained model and use it to compute and store the features of the downstream task's training data. The nearest neighbor classifier then matches the feature of an image to the k (= 20) nearest stored features, which vote for the label.
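
The k-NN evaluation described above can be sketched as follows. The notes do not specify the distance measure or how votes are weighted, so cosine similarity and an unweighted majority vote are assumptions here, and the stored features and labels are toy data.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_predict(query, stored_features, stored_labels, k=20):
    """Let the k most similar stored features vote for the label of the query."""
    scored = [(cosine_similarity(query, feature), label)
              for feature, label in zip(stored_features, stored_labels)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Features computed once by the frozen model on the downstream training set (toy values).
stored_features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.2, 0.8]]
stored_labels = ["cat", "cat", "dog", "dog", "dog"]

query = [1.0, 0.05]
near_vote = knn_predict(query, stored_features, stored_labels, k=3)
all_vote = knn_predict(query, stored_features, stored_labels, k=5)
```

On this toy data the three nearest neighbours vote "cat", while letting all five stored features vote yields "dog", which shows why the choice of k matters.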

# Summary

The quality of the features in k-NN classification suggests that they could also be used for image retrieval.

