Model Lineage: V1 through V9

The student-model training program comprises nine configurations spanning two backbones, three dataset mixtures, and two loss formulations. Each configuration is documented on a dedicated page describing the experimental variable introduced, the quantitative results, the operational disposition, and the findings supported by the data.

Configuration Summary

| # | Codename | Title | Backbone | NYU val RMSE | NYU val mIoU | Femto Bolt corridor | LILocBench corridor | Status |
|---|----------|-------|----------|--------------|--------------|---------------------|---------------------|--------|
| V1 | Compass | Initial Distillation Baseline | MobileNetV3-Small | 75.37 m | | | | Baseline |
| V2 | Sextant | Loss Weighting Diagnostic | MobileNetV3-Small | | | | | Diagnostic |
| V3 | Anchor | Recipe Rewrite (Metric Teacher) | MobileNetV3-Small | 1.160 m | 39.3 % | | | Working |
| V4 | Pivot | EfficientViT-B1 Encoder | EfficientViT-B1 | 0.774 m | 51.0 % | 1.373 m | | Working |
| V5 | Atlas | Augmentation Pipeline | EfficientViT-B1 | 0.572 m | 63.7 % | 2.186 m | | Production (general) |
| V6 | Cornerstone | Multi-Domain Pretraining | EfficientViT-B1 | 0.519 m | 48.5 % | 2.158 m | | Production (fine-tune base) |
| V7 | Tunnel | Fine-Tuning from V5 | EfficientViT-B1 | 1.315 m | 47.5 % | 1.982 m | 0.445 m | Superseded |
| V8 | Confluence | Joint-Domain Training | EfficientViT-B1 | 0.592 m | 62.9 % | 2.266 m | | Ablation |
| V9 | Lighthouse | Corridor-Specialized Student | EfficientViT-B1 | 1.553 m | 31.6 % | 1.589 m | 0.382 m | Production (corridor) |

HuggingFace releases

The three production checkpoints are released as separate HuggingFace model repositories under the NishantPushparaju/vortex-depth-* family. The pre-production configurations (V1–V4, V7, V8) are retained in the project's hpc_outputs/ directory for paper-side reproducibility but are not released individually on HuggingFace.

| Production checkpoint | HuggingFace identifier | Use case |
|-----------------------|------------------------|----------|
| V5 — Atlas | NishantPushparaju/vortex-depth-v5-general | General-purpose indoor depth estimation across diverse room geometries |
| V6 — Cornerstone | NishantPushparaju/vortex-depth-v6-pretrained | Fine-tuning base for additional domain specialists |
| V9 — Lighthouse | NishantPushparaju/vortex-depth-v9-corridor | Production corridor specialist; closed-loop validated against ground-truth depth in simulation |
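A minimal sketch of resolving the production repository identifiers programmatically. The mapping mirrors the table above; the checkpoint filename passed to `hf_hub_download` is a placeholder, since the actual filenames inside the repositories are not specified here:

```python
# from huggingface_hub import hf_hub_download  # pip install huggingface_hub

# Production checkpoints and their HuggingFace repositories (from the table above).
PRODUCTION_REPOS = {
    "V5": "NishantPushparaju/vortex-depth-v5-general",
    "V6": "NishantPushparaju/vortex-depth-v6-pretrained",
    "V9": "NishantPushparaju/vortex-depth-v9-corridor",
}

def checkpoint_repo(version: str) -> str:
    """Return the HuggingFace repo id for a production version, e.g. 'V9'."""
    return PRODUCTION_REPOS[version]

# Downloading requires network access and the real checkpoint filename;
# 'model.safetensors' below is a placeholder, not a confirmed repo file.
# path = hf_hub_download(repo_id=checkpoint_repo("V9"), filename="model.safetensors")
```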

LILocBench corridor RMSE is measured on the Intel RealSense D455 sensor (the LILocBench reference camera). Femto Bolt corridor RMSE is measured on the Orbbec Femto Bolt deployment camera. The two measurements use distinct intrinsics and noise profiles and are not directly comparable.
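Both corridor columns report metric RMSE, which is conventionally computed only over pixels where the sensor provides valid ground truth. A minimal sketch of that masked formulation; the validity threshold (`gt > 0`) is an assumption, not a confirmed detail of either benchmark:

```python
import numpy as np

def depth_rmse(pred: np.ndarray, gt: np.ndarray, min_depth: float = 0.0) -> float:
    """RMSE in metres over pixels with valid ground truth (gt > min_depth)."""
    mask = gt > min_depth          # depth sensors typically report 0 for invalid pixels
    err = pred[mask] - gt[mask]
    return float(np.sqrt(np.mean(err ** 2)))

# Example: a constant 0.5 m prediction error yields an RMSE of 0.5 m.
pred = np.full((4, 4), 2.5)
gt = np.full((4, 4), 2.0)
print(depth_rmse(pred, gt))  # 0.5
```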

Lineage Diagram

The nine configurations form a directed graph reflecting the initialization-checkpoint dependencies. Sequential improvements (V1 → V5) preceded the branching specialization phase (V5 → V6, V7, V8; V6 → V9).

```mermaid
flowchart TB
    V1["V1: Initial Distillation Baseline
    (relative-depth teacher)
    NYU RMSE: 75.37 m"]
    V2["V2: Loss Weighting Diagnostic"]
    V3["V3: Recipe Rewrite
    (metric-scale teacher)
    NYU RMSE: 1.160 m"]
    V4["V4: EfficientViT-B1 Encoder
    NYU RMSE: 0.774 m"]
    V5["V5: Augmentation Pipeline
    NYU RMSE: 0.572 m
    (production: general indoor)"]
    V6["V6: Multi-Domain Pretraining
    NYU RMSE: 0.519 m
    (production: fine-tune base)"]
    V7["V7: Fine-Tuning from V5
    LILocBench: 0.445 m
    NYU regression: +130 %"]
    V8["V8: Joint-Domain Training
    (Pareto-dominated by V5 and V9)"]
    V9["V9: Corridor-Specialized Student
    LILocBench: 0.382 m
    9 / 10 Gazebo success
    (production: corridor)"]

    V1 --> V2
    V2 --> V3
    V3 -->|"backbone substitution"| V4
    V4 -->|"augmentation pipeline added"| V5
    V5 -->|"recipe inherited
    + multi-domain pretrain"| V6
    V5 -->|"checkpoint init
    + corridor fine-tune"| V7
    V5 -->|"checkpoint init
    + joint training"| V8
    V6 -->|"checkpoint init
    + corridor fine-tune"| V9

    style V1 fill:#fde2e2
    style V2 fill:#fff3cd
    style V3 fill:#e8f0ff
    style V4 fill:#e8f0ff
    style V5 fill:#d4e7c5
    style V6 fill:#d4e7c5
    style V7 fill:#fff3cd
    style V8 fill:#fde2e2
    style V9 fill:#d4e7c5
```
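The initialization edges in the diagram can be captured as a parent map, which makes the ancestry of any configuration easy to trace programmatically; a minimal sketch:

```python
# Initialization-checkpoint dependencies from the lineage diagram:
# each configuration maps to the checkpoint it was initialized from.
PARENT = {
    "V2": "V1", "V3": "V2", "V4": "V3", "V5": "V4",
    "V6": "V5", "V7": "V5", "V8": "V5", "V9": "V6",
}

def lineage(version: str) -> list[str]:
    """Chain of ancestor configurations back to the root, oldest first."""
    chain = [version]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain[::-1]

print(lineage("V9"))  # ['V1', 'V2', 'V3', 'V4', 'V5', 'V6', 'V9']
```

Note that V7, V8, and V9 all descend from V5, but only V9 passes through the V6 multi-domain pretraining stage, which is the distinction the V7 and V9 findings turn on.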

Outcomes by Category

| Category | Configurations |
|----------|----------------|
| Production deployment | V5 (general indoor), V6 (fine-tune base), V9 (corridor specialist) |
| Sequential improvement | V3, V4, V5 |
| Specialization branch | V7, V9 |
| Diagnostic / ablation (negative result) | V2, V8 |
| Foundational baseline (superseded) | V1 |

Findings by Configuration

The training program supports the following findings, each grounded in the experimental record of the corresponding configuration:

| # | Finding |
|---|---------|
| V1 | Teacher output space must be expressed in the units used for evaluation. Unit-space inconsistency dominates any signal that loss formulation or model capacity could provide. |
| V2 | Multi-task loss weighting cannot compensate for misaligned supervision. Bounded weighting (log σ² clamped to [-2, 2]) is necessary to prevent task-collapse but is not sufficient for accurate prediction. |
| V3 | The combination of metric-scale teacher, berHu loss, Kendall multi-task weighting, and two-rate optimizer constitutes a sufficient training recipe for metric depth estimation. The recipe was set at V3 and held constant through V9. |
| V4 | Encoder capacity remains a binding constraint at fixed training recipe. EfficientViT-B1 over MobileNetV3-Small produces a 33 % NYU RMSE reduction with no other change. |
| V5 | Overfitting precedes capacity. Before evaluating larger architectures, augmentation should be exercised against the existing baseline. NYU validation accuracy and corridor deployment accuracy are not monotonically correlated. |
| V6 | Multi-domain pretraining transfers to downstream specialization quality. Mixed-dataset training requires explicit handling of supervision gaps to avoid loss-function edge cases. |
| V7 | Single-domain fine-tuning produces corridor-class specialists at the cost of general-domain capability. Specialization quality depends on the initialization checkpoint, not solely on the fine-tuning protocol. |
| V8 | Joint-domain training does not substitute for sequential pretrain-then-specialize when the source and target domains differ at the geometric level under the project's capacity-distribution coverage ratio. |
| V9 | A 5.31 × 10⁶ parameter student produced via the V6 → corridor fine-tuning pipeline achieves closed-loop navigation parity with ground-truth depth in simulation (9 / 10 success at 10 seeds, 0 collisions). |
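For reference, the berHu loss and the clamped uncertainty weighting named in the V2 and V3 findings can be sketched as follows. The clamp range [-2, 2] comes from the V2 finding; the berHu threshold c = 0.2 · max|residual| and the exact Kendall-style weighting form are common choices in the literature, not confirmed details of this project's recipe:

```python
import numpy as np

def berhu(residual: np.ndarray) -> float:
    """berHu (reverse Huber) loss: L1 near zero, scaled L2 beyond threshold c."""
    x = np.abs(residual)
    c = 0.2 * x.max()              # common threshold choice; an assumption here
    if c == 0.0:
        return 0.0
    l2 = (x ** 2 + c ** 2) / (2 * c)
    return float(np.mean(np.where(x <= c, x, l2)))

def kendall_total(losses: list[float], log_var: np.ndarray) -> float:
    """Uncertainty-weighted multi-task total with log sigma^2 clamped to [-2, 2]."""
    s = np.clip(log_var, -2.0, 2.0)   # the bound cited in the V2 finding
    return float(np.sum(np.exp(-s) * np.asarray(losses) + s))
```

The clamp bounds each task's effective weight to [e⁻², e²], which is what prevents the weighting from silently zeroing out a task (the task-collapse failure mode described for V2).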

Selection Guidance

| Use case | Recommended checkpoint |
|----------|------------------------|
| General-purpose indoor depth estimation | V5 |
| Production corridor specialist | V9 |
| Fine-tuning base for additional domain specialists | V6 |
| Foundation-model zero-shot inference at maximum throughput on Jetson | DA3-Small (218 FPS / 4.6 ms / 2.7 GB on Jetson Orin Nano TensorRT FP16) — not part of this lineage |

The two production deployment models are V5 (general indoor) and V9 (corridor specialist). V9 is the recommended checkpoint for the deployment environment documented throughout this technical report; V5 is the recommended checkpoint for unconstrained indoor scenes.