Demo videos
Full-length comparison videos showing every depth source side by side across all three evaluation datasets. Generated at native 1280×720 resolution, 10 FPS, with the inferno colormap and a shared per-frame, percentile-based depth range.
Per-version six-panel videos
The recommended way to compare model variants on the corridor evaluation set is the per-version six-panel video sequence, available on each production model’s page:
- V5 — Atlas (general-purpose model)
- V6 — Cornerstone (fine-tuning base)
- V7 — Tunnel (corridor specialist initialized from V5)
- V9 — Lighthouse (production corridor specialist)
Each video shows: RGB input · raw ToF · DA3-Small reference depth · the model's raw inference · ToF+DA3 fusion · ToF+model fusion. The full-length source clips (per-channel `.avi` files) live on the external SSD at `/media/nishant/SeeGayt2/demo_videos/corridor_eval/`. The embedded MP4s on this site are H.264-encoded composites of those sources for browser playback.
Glass corridor
121 frames through a glass-walled section. The DA3 and V5/V6 panels mis-predict the geometry behind the glass (predicting the corridor “ends” at the glass surface). V7 and V9, which were fine-tuned on LILocBench corridor data, see through the glass and predict the actual room depth behind it.
Bag 213831
150-frame indoor recording outside the LILocBench corridor distribution. V5 and V6 produce smooth depth, while V9 produces noticeably more noise (the corridor specialist generalizing poorly to a non-corridor scene). This is the V9 generalization caveat made visible.
Individual model videos
Each dataset produces 12 videos: RGB, sensor depth, DA3-Small, four student models (V5, V6, V7, V9), and five fused variants (sensor + each model).
Generation script: `generate_demo_videos.py`
| Dataset | Frames | Videos | Total size |
|---|---|---|---|
| corridor_eval/ | 459 | 12 .avi (XVID) | ~588 MB |
| glass_corridor/ | 121 | 12 .avi (XVID) | ~46 MB |
| bag_213831/ | 150 | 12 .avi (XVID) | ~73 MB |
Video types per dataset:
- `rgb.avi` — Raw RGB input
- `sensor_depth.avi` — ToF sensor depth (dead pixels shown in grey)
- `da3.avi` — DA3-Small median-scaled depth
- `v5.avi`, `v6.avi`, `v7.avi`, `v9.avi` — Student model depth
- `fused_sensor_da3.avi` — Pixel-level fusion: sensor where valid, DA3 where dead
- `fused_sensor_v5.avi` through `fused_sensor_v9.avi` — Same fusion, one per student model
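For orientation, a minimal sketch of the write path, assuming OpenCV's `cv2.VideoWriter` with the XVID codec, 10 FPS, and the native 1280×720 frame size from the table above; the filename and stand-in frames are illustrative only:

```python
import cv2
import numpy as np

# Minimal sketch: open one per-source .avi (XVID, 10 FPS, native 1280x720,
# matching the table above) and stream frames straight to disk.
fourcc = cv2.VideoWriter_fourcc(*"XVID")
writer = cv2.VideoWriter("rgb.avi", fourcc, 10.0, (1280, 720))

# Stand-in frames; the real scripts composite these per depth source.
frames = (np.zeros((720, 1280, 3), dtype=np.uint8) for _ in range(3))
for frame in frames:
    writer.write(frame)  # written immediately; nothing is buffered in memory

writer.release()
```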
Fusion protocol per frame (sketched in code below):
- Load ToF sensor depth and create a valid mask (depth > 0).
- Run model inference (or load the pre-computed `.npy`).
- Apply median-scale alignment to the model prediction.
- Fused output = sensor depth where valid, aligned model prediction where invalid.
- Colorize both fused and individual outputs using a shared per-frame percentile range (2nd–98th percentile across all depth sources for that frame).
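A minimal sketch of that protocol in NumPy/OpenCV terms; `fuse_frame` and `colorize` are illustrative names rather than the scripts' actual API, and metric ToF depth with dead pixels encoded as 0 is assumed:

```python
import cv2
import numpy as np

def fuse_frame(sensor_depth: np.ndarray, model_depth: np.ndarray) -> np.ndarray:
    """Fuse one frame: sensor depth where valid, median-scaled model where dead."""
    valid = sensor_depth > 0  # dead ToF pixels are encoded as 0

    # Median-scale alignment: match the model's median to the sensor's
    # median over the valid pixels.
    scale = np.median(sensor_depth[valid]) / np.median(model_depth[valid])
    aligned = model_depth * scale

    # Sensor depth where valid, aligned model prediction where invalid.
    return np.where(valid, sensor_depth, aligned)

def colorize(depth: np.ndarray, vmin: float, vmax: float) -> np.ndarray:
    """Apply the inferno colormap over a shared per-frame depth range."""
    norm = np.clip((depth - vmin) / max(vmax - vmin, 1e-6), 0.0, 1.0)
    return cv2.applyColorMap((norm * 255).astype(np.uint8), cv2.COLORMAP_INFERNO)

# Shared 2nd-98th percentile range across all depth sources for this frame,
# so every panel uses the same color scale:
# vmin, vmax = np.percentile(np.concatenate(all_depth_sources), [2, 98])
```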
Grid comparison videos
Synchronized multi-panel videos showing all sources in a single frame. Three layout variants:
Models grid (2×3, 1920×720)
| RGB | DA3-Small | V5 |
|---|---|---|
| V6 | V7 | V9 |
What it shows: How each model interprets the same frame. Useful for spotting where students agree/disagree with DA3 and where hallucinations differ between model versions.
Fused grid (2×3, 1920×720)
| Sensor Depth | Fused DA3 | Fused V5 |
|---|---|---|
| Fused V6 | Fused V7 | Fused V9 |
What it shows: How fusion quality varies by model. The sensor depth panel makes it obvious how many pixels are dead (grey) and how each model fills the gaps.
Combined grid (2×4, 1920×720)
| RGB | Sensor | Fused DA3 | Fused V5 |
|---|---|---|---|
| Fused V6 | Fused V7 | Fused V9 | Stats |
What it shows: The full picture in one video. The stats panel displays frame index, dead pixel percentage, depth range, and dataset name — context you need when scrubbing through 459 frames at variable speeds.
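A minimal sketch of the compositing step, assuming equal-sized panels and metre depth units; `composite_grid` and `stats_panel` are hypothetical helpers, not `generate_grid_video.py`'s actual API:

```python
import cv2
import numpy as np

def composite_grid(panels, rows, cols, panel_size):
    """Tile per-source BGR frames row-major into one grid image.

    2x3 at panel_size=(640, 360) or 2x4 at (480, 360) both give 1920x720.
    """
    resized = [cv2.resize(p, panel_size) for p in panels]
    grid_rows = [np.hstack(resized[r * cols:(r + 1) * cols]) for r in range(rows)]
    return np.vstack(grid_rows)

def stats_panel(frame_idx, dead_pct, vmin, vmax, dataset, size=(480, 360)):
    """Render the text-only stats panel used in the combined grid."""
    panel = np.zeros((size[1], size[0], 3), dtype=np.uint8)
    lines = [
        f"frame {frame_idx}",
        f"dead pixels: {dead_pct:.1f}%",
        f"depth: {vmin:.2f}-{vmax:.2f} m",
        dataset,
    ]
    for i, text in enumerate(lines):
        cv2.putText(panel, text, (20, 60 + 50 * i),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.9, (255, 255, 255), 2)
    return panel
```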
Generation script: `generate_grid_video.py`
Generated grid videos:
| Dataset | Variant | File |
|---|---|---|
| corridor_eval | models | corridor_eval_grid.avi |
| corridor_eval | fused | corridor_eval_grid_fused.avi |
| corridor_eval | combined | corridor_eval_grid_combined.avi |
| glass_corridor | combined | glass_corridor_grid_combined.avi |
| bag_213831 | combined | bag_213831_grid_combined.avi |
Memory management
Video generation on a 16 GB laptop is memory-constrained. Three strategies keep memory usage stable:
- **Stream-to-disk:** `generate_demo_videos.py` writes each frame to disk via `cv2.VideoWriter` immediately after compositing. No frame buffering.
- **Sequential model processing:** `generate_corridor_missing.py` handles the case where corridor_eval needs live inference for V5/V6/V7 (no pre-computed `.npy`). It loads one model, processes all 459 frames, writes to disk, then explicitly unloads (`del model; torch.cuda.empty_cache(); gc.collect()`) before loading the next; see the sketch after this list.
- **All-models-loaded grid inference:** `generate_grid_video.py` loads all four student models at startup (combined GPU footprint ~1 GB) and processes frames one at a time: per-frame inference for all models, compositing, writing, then discarding all per-frame data.
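The sequential pattern looks roughly like the following; `load_model` and `process_dataset` are stand-ins for the script's internals, not its actual function names:

```python
import gc
import torch

def load_model(version: str) -> torch.nn.Module:
    """Stand-in for the script's actual checkpoint loading."""
    return torch.nn.Identity()

def process_dataset(model: torch.nn.Module, version: str) -> None:
    """Stand-in: real code runs inference per frame and streams to disk."""
    pass

# One model resident at a time: load, process all 459 frames, then release
# everything before the next model comes in, so peak memory stays at ~one model.
for version in ("v5", "v6", "v7"):
    model = load_model(version)
    process_dataset(model, version)

    del model                          # drop the last Python reference
    if torch.cuda.is_available():
        torch.cuda.empty_cache()       # return cached CUDA blocks to the driver
    gc.collect()                       # collect any lingering reference cycles
```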
Generating
```bash
# All models, all datasets at native 1280×720
python generate_demo_videos.py \
    --dataset all --models all \
    --output-dir /path/to/demo_videos

# Grid comparison videos
python generate_grid_video.py \
    --dataset all --variant combined \
    --output-dir /path/to/demo_videos

# Missing corridor V5/V6/V7 (if not pre-computed)
python generate_corridor_missing.py
```
Or via Docker:
```bash
docker compose run demo-videos
docker compose run grid-videos
```
Output videos land in the mounted `output/demo_videos/` directory.