Skip to content

🦖 DINO

The DINO implementation in ViT-SSL is modular and educational by design. It follows the original DINO paper.


Overview

DINO (Self-Distillation with No Labels) uses a teacher–student framework to learn representations without supervision. The teacher is an EMA (exponential moving average) of the student. Training aims to align their output distributions using softmax cross-view consistency.


Architecture

  • Defined in vit_core/ssl/dino/model.py
  • Composed of:
    • ViTBackbone: shared ViT encoder used for both student and teacher
    • DINOHead: MLP head with optional normalization and projection
  • Initializes:
    • student_backbone ← trainable
    • teacher_backbone ← frozen copy, updated with momentum
    • Separate heads for student and teacher
    • Center buffer for output normalization (Eq. 4 in the paper)
teacher_output = teacher_head(teacher_backbone(x))
student_output = student_head(student_backbone(x))

Teacher outputs are updated via momentum_update_teacher() using a scheduled momentum.

Forward Pass

def forward(multi_crop_views, num_global_views):
    student_input = torch.cat(all_views)
    teacher_input = torch.cat(global_views)

    student_output = student(student_input)
    teacher_output = teacher(teacher_input)

    return teacher_output, student_output

Loss: DINOLoss

  • Implements Equation 1 from the DINO paper
  • Applies temperature scaling + centering to the teacher logits
  • Uses cross-view prediction: student tries to predict teacher output from different views
  • Cross entropy is computed between softmaxed teacher and log-softmaxed student outputs:

loss = -(softmax(teacher) * log_softmax(student)).sum().mean()

Defined in vit_core/ssl/dino/loss.py

Training: DINOTrainer

  • Inherits from a generic BaseTrainer
  • Implements:

    • create_criterion(): builds the DINOLoss
    • train_epoch(): training logic, view reshaping, loss calc, teacher update, warmup
    • validate(): similar logic without gradient computation
  • Highlights

    • Teacher momentum is scheduled with a cosine scheduler via DINOMomentumScheduler
    • Both teacher and student outputs are reshaped per view before computing the loss
    • Centering is updated at every step as per DINO's original formulation

Modular Design

Component File Role
DINOViT model.py Dual backbone + head w/ EMA update
DINOHead head.py Nonlinear projection head
DINOLoss loss.py Self-distillation loss
DINOMomentumScheduler dino_utils.py Momentum scheduler
DINOTrainer trainer.py Full training loop