Model Lineage: V1 through V9
The student-model training program comprises nine configurations spanning two backbones, three dataset mixtures, and two loss formulations. Each configuration is documented on a dedicated page describing the experimental variable introduced, the quantitative results, the operational disposition, and the findings supported by the data.
Configuration Summary
| # | Codename | Title | Backbone | NYU val RMSE | NYU val mIoU | Femto Bolt corridor | LILocBench corridor | Status |
|---|---|---|---|---|---|---|---|---|
| V1 | Compass | Initial Distillation Baseline | MobileNetV3-Small | 75.37 m | — | — | — | Baseline |
| V2 | Sextant | Loss Weighting Diagnostic | MobileNetV3-Small | — | — | — | — | Diagnostic |
| V3 | Anchor | Recipe Rewrite (Metric Teacher) | MobileNetV3-Small | 1.160 m | 39.3 % | — | — | Working |
| V4 | Pivot | EfficientViT-B1 Encoder | EfficientViT-B1 | 0.774 m | 51.0 % | 1.373 m | — | Working |
| V5 | Atlas | Augmentation Pipeline | EfficientViT-B1 | 0.572 m | 63.7 % | 2.186 m | — | Production (general) |
| V6 | Cornerstone | Multi-Domain Pretraining | EfficientViT-B1 | 0.519 m | 48.5 % | 2.158 m | — | Production (fine-tune base) |
| V7 | Tunnel | Fine-Tuning from V5 | EfficientViT-B1 | 1.315 m | 47.5 % | 1.982 m | 0.445 m | Superseded |
| V8 | Confluence | Joint-Domain Training | EfficientViT-B1 | 0.592 m | 62.9 % | 2.266 m ↑ | — | Ablation |
| V9 | Lighthouse | Corridor-Specialized Student | EfficientViT-B1 | 1.553 m | 31.6 % | 1.589 m | 0.382 m | Production (corridor) |
HuggingFace Releases
The three production checkpoints are released as separate HuggingFace model repositories under the NishantPushparaju/vortex-depth-* family. The non-production configurations (V1–V4, V7, V8) are retained in the project’s hpc_outputs/ directory so the paper’s results remain reproducible, but they are not released individually on HuggingFace.
| Production checkpoint | HuggingFace identifier | Use case |
|---|---|---|
| V5 — Atlas | NishantPushparaju/vortex-depth-v5-general | General-purpose indoor depth estimation across diverse room geometries |
| V6 — Cornerstone | NishantPushparaju/vortex-depth-v6-pretrained | Fine-tuning base for additional domain specialists |
| V9 — Lighthouse | NishantPushparaju/vortex-depth-v9-corridor | Production corridor specialist; closed-loop validated against ground-truth depth in simulation |
LILocBench corridor RMSE is measured on the Intel RealSense D455 sensor (the LILocBench reference camera). Femto Bolt corridor RMSE is measured on the Orbbec Femto Bolt deployment camera. The two measurements use distinct intrinsics and noise profiles and are not directly comparable.
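A minimal sketch of pulling one of these checkpoints with huggingface_hub follows. The filename model.pt is a hypothetical placeholder, since the artifact layout is defined by each repository; check the repository's file listing before use.

```python
import torch
from huggingface_hub import hf_hub_download

# Fetch the corridor specialist (V9) from the Hub and load its weights.
# "model.pt" is a hypothetical filename, not confirmed by this report.
ckpt_path = hf_hub_download(
    repo_id="NishantPushparaju/vortex-depth-v9-corridor",
    filename="model.pt",
)
state_dict = torch.load(ckpt_path, map_location="cpu")
```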
Lineage Diagram
The nine configurations form a directed graph reflecting the initialization-checkpoint dependencies. Sequential improvements (V1 → V5) preceded the branching specialization phase (V5 → V6, V7, V8; V6 → V9).
```mermaid
graph TD
    V1["V1: Initial Distillation Baseline<br/>(relative-depth teacher)<br/>NYU RMSE: 75.37 m"]
    V2["V2: Loss Weighting Diagnostic"]
    V3["V3: Recipe Rewrite<br/>(metric-scale teacher)<br/>NYU RMSE: 1.160 m"]
    V4["V4: EfficientViT-B1 Encoder<br/>NYU RMSE: 0.774 m"]
    V5["V5: Augmentation Pipeline<br/>NYU RMSE: 0.572 m<br/>(production: general indoor)"]
    V6["V6: Multi-Domain Pretraining<br/>NYU RMSE: 0.519 m<br/>(production: fine-tune base)"]
    V7["V7: Fine-Tuning from V5<br/>LILocBench: 0.445 m<br/>NYU regression: +130 %"]
    V8["V8: Joint-Domain Training<br/>(Pareto-dominated by V5 and V9)"]
    V9["V9: Corridor-Specialized Student<br/>LILocBench: 0.382 m<br/>9 / 10 Gazebo success<br/>(production: corridor)"]
    V1 --> V2
    V2 --> V3
    V3 -->|"backbone substitution"| V4
    V4 -->|"augmentation pipeline added"| V5
    V5 -->|"recipe inherited<br/>+ multi-domain pretrain"| V6
    V5 -->|"checkpoint init<br/>+ corridor fine-tune"| V7
    V5 -->|"checkpoint init<br/>+ joint training"| V8
    V6 -->|"checkpoint init<br/>+ corridor fine-tune"| V9
    style V1 fill:#fde2e2
    style V2 fill:#fff3cd
    style V3 fill:#e8f0ff
    style V4 fill:#e8f0ff
    style V5 fill:#d4e7c5
    style V6 fill:#d4e7c5
    style V7 fill:#fff3cd
    style V8 fill:#fde2e2
    style V9 fill:#d4e7c5
```
Outcomes by Category
| Category | Configurations |
|---|---|
| Production deployment | V5 (general indoor), V6 (fine-tune base), V9 (corridor specialist) |
| Sequential improvement | V3, V4, V5 |
| Specialization branch | V7, V9 |
| Diagnostic / ablation (negative result) | V2, V8 |
| Foundational baseline (superseded) | V1 |
Findings by Configuration
The training program supports the following findings, each grounded in the experimental record of the corresponding configuration:
| # | Finding |
|---|---|
| V1 | Teacher output space must be expressed in the units used for evaluation. Unit-space inconsistency dominates any signal that loss formulation or model capacity could provide. |
| V2 | Multi-task loss weighting cannot compensate for misaligned supervision. Bounded weighting (log σ² clamped to [-2, 2]) is necessary to prevent task collapse but is not sufficient for accurate prediction. |
| V3 | The combination of metric-scale teacher, berHu loss, Kendall multi-task weighting, and two-rate optimizer constitutes a sufficient training recipe for metric depth estimation (see the sketch after this table). The recipe was set at V3 and held constant through V9. |
| V4 | Encoder capacity remains a binding constraint at fixed training recipe. EfficientViT-B1 over MobileNetV3-Small produces a 33 % NYU RMSE reduction with no other change. |
| V5 | Overfitting becomes the binding constraint before capacity does: augmentation should be exercised against the existing baseline before larger architectures are evaluated. NYU validation accuracy and corridor deployment accuracy are not monotonically correlated. |
| V6 | Multi-domain pretraining transfers to downstream specialization quality. Mixed-dataset training requires explicit handling of supervision gaps to avoid loss-function edge cases (see the masking sketch after this table). |
| V7 | Single-domain fine-tuning produces corridor-class specialists at the cost of general-domain capability. Specialization quality depends on the initialization checkpoint, not solely on the fine-tuning protocol. |
| V8 | Joint-domain training does not substitute for sequential pretrain-then-specialize when the source and target domains differ at the geometric level, at least under this project’s ratio of model capacity to distribution coverage. |
| V9 | A 5.31 × 10⁶ parameter student produced via the V6 → corridor fine-tuning pipeline achieves closed-loop navigation parity with ground-truth depth in simulation (9 / 10 success at 10 seeds, 0 collisions). |
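The V3 recipe referenced above is easy to misimplement, so a minimal PyTorch sketch follows: the berHu loss with a common per-batch threshold heuristic, a simplified form of Kendall multi-task weighting with the log σ² clamp from the V2 finding, and a two-rate optimizer. Module names and learning rates are hypothetical stand-ins, not the project's implementation.

```python
import torch
import torch.nn as nn

def berhu_loss(pred: torch.Tensor, target: torch.Tensor, valid: torch.Tensor) -> torch.Tensor:
    """Reverse Huber (berHu): L1 below threshold c, scaled L2 above it."""
    err = (pred - target).abs()[valid]
    c = (0.2 * err.max().detach()).clamp_min(1e-6)  # common per-batch heuristic
    return torch.where(err <= c, err, (err**2 + c**2) / (2 * c)).mean()

class KendallWeighting(nn.Module):
    """Homoscedastic-uncertainty task weighting (simplified form), with the
    log-variance clamp from the V2 finding to prevent task collapse."""
    def __init__(self, num_tasks: int):
        super().__init__()
        self.log_var = nn.Parameter(torch.zeros(num_tasks))

    def forward(self, task_losses: list) -> torch.Tensor:
        s = self.log_var.clamp(-2.0, 2.0)  # bounded: log sigma^2 in [-2, 2]
        return (torch.exp(-s) * torch.stack(task_losses) + s).sum()

# Two-rate optimizer: lower LR for the pretrained encoder, higher LR for
# freshly initialized heads. Modules and rates are illustrative stand-ins.
encoder, head = nn.Linear(8, 8), nn.Linear(8, 1)
weighting = KendallWeighting(num_tasks=2)
optimizer = torch.optim.AdamW([
    {"params": encoder.parameters(), "lr": 1e-5},
    {"params": head.parameters(), "lr": 1e-4},
    {"params": weighting.parameters(), "lr": 1e-4},
])
```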
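The V6 supervision-gap finding reduces to a masking pattern: in a mixed-domain batch, compute a task's loss only when that task actually has labels, so empty masks never reach a reduction. A minimal sketch follows; the toy tensors and the L1 stand-in for berHu are illustrative only.

```python
import torch

# Toy mixed-domain batch: depth supervision present, segmentation absent.
depth_pred = torch.rand(4, 1, 8, 8, requires_grad=True)
depth_gt   = torch.rand(4, 1, 8, 8)
depth_mask = depth_gt > 0.05                    # valid-depth pixels
seg_mask   = torch.zeros(4, dtype=torch.bool)   # no segmentation labels here

losses = []
# Only compute a task's loss when its mask is non-empty; an empty mask
# would otherwise produce a NaN mean that poisons the whole update.
if depth_mask.any():
    losses.append((depth_pred - depth_gt).abs()[depth_mask].mean())
if seg_mask.any():
    pass  # segmentation branch skipped for this batch

if losses:
    total = sum(losses)  # per-task weighting omitted for brevity
    total.backward()
```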
Selection Guidance
| Use case | Recommended checkpoint |
|---|---|
| General-purpose indoor depth estimation | V5 |
| Production corridor specialist | V9 |
| Fine-tuning base for additional domain specialists | V6 |
| Foundation-model zero-shot inference at maximum throughput on Jetson | DA3-Small (218 FPS / 4.6 ms / 2.7 GB on Jetson Orin Nano TensorRT FP16) — not part of this lineage |
The two production deployment models are V5 (general indoor) and V9 (corridor specialist). V9 is the recommended checkpoint for the deployment environment documented throughout this technical report; V5 is the recommended checkpoint for unconstrained indoor scenes.