Model Lineage: V1 through V9

The student-model training program comprises nine configurations spanning two backbones, three dataset mixtures, and two loss formulations. Each configuration is documented on a dedicated page describing the experimental variable introduced, the quantitative results, the operational disposition, and the findings supported by the data.

Configuration Summary

| # | Codename | Title | Backbone | NYU val RMSE | NYU val mIoU | Femto Bolt corridor | LILocBench corridor | Status |
|---|----------|-------|----------|--------------|--------------|---------------------|---------------------|--------|
| V1 | Compass | Initial Distillation Baseline | MobileNetV3-Small | 75.37 m | | | | Baseline |
| V2 | Sextant | Loss Weighting Diagnostic | MobileNetV3-Small | | | | | Diagnostic |
| V3 | Anchor | Recipe Rewrite (Metric Teacher) | MobileNetV3-Small | 1.160 m | 39.3 % | | | Working |
| V4 | Pivot | EfficientViT-B1 Encoder | EfficientViT-B1 | 0.774 m | 51.0 % | 1.373 m | | Working |
| V5 | Atlas | Augmentation Pipeline | EfficientViT-B1 | 0.572 m | 63.7 % | 2.186 m | | Production (general) |
| V6 | Cornerstone | Multi-Domain Pretraining | EfficientViT-B1 | 0.519 m | 48.5 % | 2.158 m | | Production (fine-tune base) |
| V7 | Tunnel | Fine-Tuning from V5 | EfficientViT-B1 | 1.315 m | 47.5 % | 1.982 m | 0.445 m | Superseded |
| V8 | Confluence | Joint-Domain Training | EfficientViT-B1 | 0.592 m | 62.9 % | 2.266 m | | Ablation |
| V9 | Lighthouse | Corridor-Specialized Student | EfficientViT-B1 | 1.553 m | 31.6 % | 1.589 m | 0.382 m | Production (corridor) |

HuggingFace releases

The three production checkpoints are released as separate HuggingFace model repositories under the NishantPushparaju/vortex-depth-* family. The pre-production configurations (V1–V4, V7, V8) are retained in the project's hpc_outputs/ directory for paper-side reproducibility but are not released individually on HuggingFace.

| Production checkpoint | HuggingFace identifier | Use case |
|-----------------------|------------------------|----------|
| V5 — Atlas | NishantPushparaju/vortex-depth-v5-general | General-purpose indoor depth estimation across diverse room geometries |
| V6 — Cornerstone | NishantPushparaju/vortex-depth-v6-pretrained | Fine-tuning base for additional domain specialists |
| V9 — Lighthouse | NishantPushparaju/vortex-depth-v9-corridor | Production corridor specialist; closed-loop validated against ground-truth depth in simulation |
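A minimal sketch of resolving the production repository identifiers programmatically. The mapping mirrors the table above; the checkpoint filename passed to `hf_hub_download` is a placeholder, since the actual filenames inside the repositories are not specified here:

```python
# from huggingface_hub import hf_hub_download  # pip install huggingface_hub

# Production checkpoints and their HuggingFace repositories (from the table above).
PRODUCTION_REPOS = {
    "V5": "NishantPushparaju/vortex-depth-v5-general",
    "V6": "NishantPushparaju/vortex-depth-v6-pretrained",
    "V9": "NishantPushparaju/vortex-depth-v9-corridor",
}

def checkpoint_repo(version: str) -> str:
    """Return the HuggingFace repo id for a production version, e.g. 'V9'."""
    return PRODUCTION_REPOS[version]

# Downloading requires network access and the real checkpoint filename;
# 'model.safetensors' below is a placeholder, not a confirmed repo file.
# path = hf_hub_download(repo_id=checkpoint_repo("V9"), filename="model.safetensors")
```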

LILocBench corridor RMSE is measured on the Intel RealSense D455 sensor (the LILocBench reference camera). Femto Bolt corridor RMSE is measured on the Orbbec Femto Bolt deployment camera. The two measurements use distinct intrinsics and noise profiles and are not directly comparable.
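Both corridor columns report metric RMSE, which is conventionally computed only over pixels where the sensor provides valid ground truth. A minimal sketch of that masked formulation; the validity threshold (`gt > 0`) is an assumption, not a confirmed detail of either benchmark:

```python
import numpy as np

def depth_rmse(pred: np.ndarray, gt: np.ndarray, min_depth: float = 0.0) -> float:
    """RMSE in metres over pixels with valid ground truth (gt > min_depth)."""
    mask = gt > min_depth          # depth sensors typically report 0 for invalid pixels
    err = pred[mask] - gt[mask]
    return float(np.sqrt(np.mean(err ** 2)))

# Example: a constant 0.5 m prediction error yields an RMSE of 0.5 m.
pred = np.full((4, 4), 2.5)
gt = np.full((4, 4), 2.0)
print(depth_rmse(pred, gt))  # 0.5
```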

Lineage Diagram

The nine configurations form a directed graph reflecting the initialization-checkpoint dependencies. Sequential improvements (V1 → V5) preceded the branching specialization phase (V5 → V6, V7, V8; V6 → V9).

```mermaid
flowchart TB
    V1["V1: Initial Distillation Baseline
    (relative-depth teacher)
    NYU RMSE: 75.37 m"]
    V2["V2: Loss Weighting Diagnostic"]
    V3["V3: Recipe Rewrite
    (metric-scale teacher)
    NYU RMSE: 1.160 m"]
    V4["V4: EfficientViT-B1 Encoder
    NYU RMSE: 0.774 m"]
    V5["V5: Augmentation Pipeline
    NYU RMSE: 0.572 m
    (production: general indoor)"]
    V6["V6: Multi-Domain Pretraining
    NYU RMSE: 0.519 m
    (production: fine-tune base)"]
    V7["V7: Fine-Tuning from V5
    LILocBench: 0.445 m
    NYU regression: +130 %"]
    V8["V8: Joint-Domain Training
    (Pareto-dominated by V5 and V9)"]
    V9["V9: Corridor-Specialized Student
    LILocBench: 0.382 m
    9 / 10 Gazebo success
    (production: corridor)"]

    V1 --> V2
    V2 --> V3
    V3 -->|"backbone substitution"| V4
    V4 -->|"augmentation pipeline added"| V5
    V5 -->|"recipe inherited
    + multi-domain pretrain"| V6
    V5 -->|"checkpoint init
    + corridor fine-tune"| V7
    V5 -->|"checkpoint init
    + joint training"| V8
    V6 -->|"checkpoint init
    + corridor fine-tune"| V9

    style V1 fill:#fde2e2
    style V2 fill:#fff3cd
    style V3 fill:#e8f0ff
    style V4 fill:#e8f0ff
    style V5 fill:#d4e7c5
    style V6 fill:#d4e7c5
    style V7 fill:#fff3cd
    style V8 fill:#fde2e2
    style V9 fill:#d4e7c5
```
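The initialization edges in the diagram can be captured as a parent map, which makes the ancestry of any configuration easy to trace programmatically; a minimal sketch:

```python
# Initialization-checkpoint dependencies from the lineage diagram:
# each configuration maps to the checkpoint it was initialized from.
PARENT = {
    "V2": "V1", "V3": "V2", "V4": "V3", "V5": "V4",
    "V6": "V5", "V7": "V5", "V8": "V5", "V9": "V6",
}

def lineage(version: str) -> list[str]:
    """Chain of ancestor configurations back to the root, oldest first."""
    chain = [version]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain[::-1]

print(lineage("V9"))  # ['V1', 'V2', 'V3', 'V4', 'V5', 'V6', 'V9']
```

Note that V7, V8, and V9 all descend from V5, but only V9 passes through the V6 multi-domain pretraining stage, which is the distinction the V7 and V9 findings turn on.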

Outcomes by Category

| Category | Configurations |
|----------|----------------|
| Production deployment | V5 (general indoor), V6 (fine-tune base), V9 (corridor specialist) |
| Sequential improvement | V3, V4, V5 |
| Specialization branch | V7, V9 |
| Diagnostic / ablation (negative result) | V2, V8 |
| Foundational baseline (superseded) | V1 |

Findings by Configuration

The training program supports the following findings, each grounded in the experimental record of the corresponding configuration:

| # | Finding |
|---|---------|
| V1 | Teacher output space must be expressed in the units used for evaluation. Unit-space inconsistency dominates any signal that loss formulation or model capacity could provide. |
| V2 | Multi-task loss weighting cannot compensate for misaligned supervision. Bounded weighting (log σ² clamped to [-2, 2]) is necessary to prevent task-collapse but is not sufficient for accurate prediction. |
| V3 | The combination of metric-scale teacher, berHu loss, Kendall multi-task weighting, and two-rate optimizer constitutes a sufficient training recipe for metric depth estimation. The recipe was set at V3 and held constant through V9. |
| V4 | Encoder capacity remains a binding constraint at fixed training recipe. EfficientViT-B1 over MobileNetV3-Small produces a 33 % NYU RMSE reduction with no other change. |
| V5 | Overfitting precedes capacity. Before evaluating larger architectures, augmentation should be exercised against the existing baseline. NYU validation accuracy and corridor deployment accuracy are not monotonically correlated. |
| V6 | Multi-domain pretraining transfers to downstream specialization quality. Mixed-dataset training requires explicit handling of supervision gaps to avoid loss-function edge cases. |
| V7 | Single-domain fine-tuning produces corridor-class specialists at the cost of general-domain capability. Specialization quality depends on the initialization checkpoint, not solely on the fine-tuning protocol. |
| V8 | Joint-domain training does not substitute for sequential pretrain-then-specialize when the source and target domains differ at the geometric level under the project's capacity-distribution coverage ratio. |
| V9 | A 5.31 × 10⁶ parameter student produced via the V6 → corridor fine-tuning pipeline achieves closed-loop navigation parity with ground-truth depth in simulation (9 / 10 success at 10 seeds, 0 collisions). |
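For reference, the berHu loss and the clamped uncertainty weighting named in the V2 and V3 findings can be sketched as follows. The clamp range [-2, 2] comes from the V2 finding; the berHu threshold c = 0.2 · max|residual| and the exact Kendall-style weighting form are common choices in the literature, not confirmed details of this project's recipe:

```python
import numpy as np

def berhu(residual: np.ndarray) -> float:
    """berHu (reverse Huber) loss: L1 near zero, scaled L2 beyond threshold c."""
    x = np.abs(residual)
    c = 0.2 * x.max()              # common threshold choice; an assumption here
    if c == 0.0:
        return 0.0
    l2 = (x ** 2 + c ** 2) / (2 * c)
    return float(np.mean(np.where(x <= c, x, l2)))

def kendall_total(losses: list[float], log_var: np.ndarray) -> float:
    """Uncertainty-weighted multi-task total with log sigma^2 clamped to [-2, 2]."""
    s = np.clip(log_var, -2.0, 2.0)   # the bound cited in the V2 finding
    return float(np.sum(np.exp(-s) * np.asarray(losses) + s))
```

The clamp bounds each task's effective weight to [e⁻², e²], which is what prevents the weighting from silently zeroing out a task (the task-collapse failure mode described for V2).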

Selection Guidance

| Use case | Recommended checkpoint |
|----------|------------------------|
| General-purpose indoor depth estimation | V5 |
| Production corridor specialist | V9 |
| Fine-tuning base for additional domain specialists | V6 |
| Foundation-model zero-shot inference at maximum throughput on Jetson | DA3-Small (218 FPS / 4.6 ms / 2.7 GB on Jetson Orin Nano TensorRT FP16) — not part of this lineage |

The two production deployment models are V5 (general indoor) and V9 (corridor specialist). V9 is the recommended checkpoint for the deployment environment documented throughout this technical report; V5 is the recommended checkpoint for unconstrained indoor scenes.