🦖 DINO
The DINO implementation in ViT-SSL is modular and educational by design. It follows the original DINO paper.
Overview
DINO (Self-Distillation with No Labels) uses a teacher–student framework to learn representations without supervision. The teacher is an EMA (exponential moving average) of the student. Training aims to align their output distributions using softmax cross-view consistency.
Architecture
- Defined in
vit_core/ssl/dino/model.py
- Composed of:
ViTBackbone
: shared ViT encoder used for both student and teacherDINOHead
: MLP head with optional normalization and projection
- Initializes:
student_backbone
← trainableteacher_backbone
← frozen copy, updated with momentum- Separate heads for student and teacher
- Center buffer for output normalization (Eq. 4 in the paper)
teacher_output = teacher_head(teacher_backbone(x))
student_output = student_head(student_backbone(x))
Teacher outputs are updated via momentum_update_teacher() using a scheduled momentum.
Forward Pass
def forward(multi_crop_views, num_global_views):
student_input = torch.cat(all_views)
teacher_input = torch.cat(global_views)
student_output = student(student_input)
teacher_output = teacher(teacher_input)
return teacher_output, student_output
Loss: DINOLoss
- Implements Equation 1 from the DINO paper
- Applies temperature scaling + centering to the teacher logits
- Uses cross-view prediction: student tries to predict teacher output from different views
- Cross entropy is computed between softmaxed teacher and log-softmaxed student outputs:
loss = -(softmax(teacher) * log_softmax(student)).sum().mean()
Defined in vit_core/ssl/dino/loss.py
Training: DINOTrainer
- Inherits from a generic BaseTrainer
-
Implements:
- create_criterion(): builds the DINOLoss
- train_epoch(): training logic, view reshaping, loss calc, teacher update, warmup
- validate(): similar logic without gradient computation
-
Highlights
- Teacher momentum is scheduled with a cosine scheduler via DINOMomentumScheduler
- Both teacher and student outputs are reshaped per view before computing the loss
- Centering is updated at every step as per DINO's original formulation
Modular Design
Component | File | Role |
---|---|---|
DINOViT |
model.py |
Dual backbone + head w/ EMA update |
DINOHead |
head.py |
Nonlinear projection head |
DINOLoss |
loss.py |
Self-distillation loss |
DINOMomentumScheduler |
dino_utils.py |
Momentum scheduler |
DINOTrainer |
trainer.py |
Full training loop |