# Datasets
Every evaluation in this work runs on one of four frame sets. None of them are large by computer vision standards — the corridor evaluation has 459 frames, and the glass corridor has 121. The data scale is constrained by the project’s deployment scope: the robot collects data in a single building, and the depth sensor’s failure mode is specific to that building’s materials.
## Corridor evaluation data (459 frames)
Location: ml_pipeline/corridor_eval_data/
Source: Orbbec Femto Bolt RGB-D + RPLiDAR S2, recorded on the Traxxas Maxx 4S platform in a university corridor. The original rosbag (rgbd_imu_20260228_003828_0.mcap, ~8.1 GB) was recorded at 30 FPS; frames were subsampled to ~3 FPS for evaluation.
Contents:
| Subdirectory | Count | Format | Notes |
|---|---|---|---|
| rgb/ | 485 | .png, 1280×720 | Raw RGB frames |
| depth/ | 485 | .npy, 720×1280, float32 | ToF depth in meters, 0 = invalid |
| da3_depth/ | 459 | .npy, 720×1280, float32 | DA3-Small predictions (zero-shot) |
The 26-frame gap between RGB (485) and DA3 (459) comes from frames where DA3 inference was not run. All evaluations use the 459-frame DA3-aligned subset.
Naming convention: 00000.png through 00484.png (5-digit zero-padded).
Student predictions: Not pre-computed. V5, V6, V7, and V9 predictions are generated via live inference in the evaluation and video generation scripts.
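Because the ToF depth arrays store meters with 0 marking invalid pixels, any metric computed against them needs an explicit validity mask. A minimal sketch of that pattern, using only numpy — the helper names are illustrative, not taken from the repo's evaluation scripts:

```python
import numpy as np

def load_tof_depth(path):
    """Load a ToF depth frame: float32 meters, 0 = invalid."""
    return np.load(path).astype(np.float32)

def masked_rmse(pred, gt):
    """RMSE over pixels where the ToF sensor returned a valid reading."""
    valid = gt > 0
    if not valid.any():
        return float("nan")
    err = pred[valid] - gt[valid]
    return float(np.sqrt(np.mean(err ** 2)))

# Tiny synthetic check: the invalid (0) pixel is excluded from the error.
gt = np.array([[1.0, 0.0], [2.0, 3.0]], dtype=np.float32)
pred = np.array([[1.5, 9.9], [2.0, 3.0]], dtype=np.float32)
print(masked_rmse(pred, gt))
```

Forgetting the `gt > 0` mask silently rewards predicting near-zero depth wherever the sensor dropped out, which is exactly the failure mode these datasets were collected to measure.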
## Glass corridor (121 frames)
Location: ~/maps/glass_corridor_frames/
Source: Same platform, different corridor section with a full glass wall on one side. Extracted from rosbag rgbd_imu_20260302_173610.
Contents:
| Subdirectory | Count | Format | Notes |
|---|---|---|---|
| rgb/ | 121 | .png, 1280×720 | Raw RGB frames |
| depth/ | 121 | .npy, varies | ToF depth (some at 576×640 native) |
| da3_depth/ | 121 | .npy | DA3-Small predictions |
Student predictions: Pre-computed V5, V6, V7 .npy files live in a separate directory: ~/maps/glass_corridor_student_results/{depth_v5_vivek,depth_v6,depth_v7}/. V9 is always computed via live inference.
Naming convention: frame_0000_t30.0s.png (timestamped, starting at t=30s to skip the initial static period).
Why this dataset matters: The glass wall creates a different failure mode than the polished floor. The ToF sensor does not just lose pixels — it sometimes returns erroneous depth from specular reflections off the glass. This tests whether the fusion pipeline degrades gracefully or introduces false obstacles.
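This set and bag_213831 below share the timestamped `frame_NNNN_t<seconds>s.png` convention, so frame index and capture time can be recovered from the filename alone. A parsing sketch (stdlib only; the function name is illustrative, not from the repo):

```python
import re

# Matches e.g. "frame_0000_t30.0s.png" -> index 0, timestamp 30.0 s
FRAME_RE = re.compile(r"frame_(\d{4})_t([\d.]+)s\.png")

def parse_frame_name(name):
    """Return (frame_index, timestamp_seconds) from a timestamped frame name."""
    m = FRAME_RE.fullmatch(name)
    if m is None:
        raise ValueError(f"unexpected frame name: {name}")
    return int(m.group(1)), float(m.group(2))

print(parse_frame_name("frame_0000_t30.0s.png"))  # (0, 30.0)
```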
## bag_213831 (150 frames)
Location: ~/maps/bag_213831_frames/
Source: Jetson-mounted recording, different corridor section, different lighting. Extracted from rosbag rgbd_imu_20260302_213831.
Contents:
| Subdirectory | Count | Format | Notes |
|---|---|---|---|
| rgb/ | 150 | .png, 1280×720 | Raw RGB frames |
| depth/ | 150 | .npy | ToF depth |
| da3_depth/ | 150 | .npy | DA3-Small predictions |
Student predictions: Pre-computed V5, V6, V7, and DA3 .npy files in ~/maps/bag_213831_student_results/. V9 is computed via live inference.
Naming convention: frame_0000_t0.0s.png (timestamped, 2-second spacing).
## NYU Depth V2
Source: HuggingFace datasets (sayakpaul/nyu_depth_v2), loaded via dataset/nyu_loader.py.
Size: 47,584 training + 654 test RGB-D pairs (640×480).
Usage: Primary training set for V1–V6. Evaluation set for NYU RMSE numbers.
Gotcha: The HuggingFace loader uses a loading script, which datasets >= 4.0 refuses to execute by default. requirements.txt pins datasets < 4.0.
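A defensive guard for this gotcha can be sketched as below, assuming only that the installed `datasets` package reports its version through `importlib.metadata` — the function name is illustrative and not part of the repo:

```python
from importlib.metadata import version, PackageNotFoundError

def datasets_allows_loading_scripts():
    """True if the installed `datasets` is pre-4.0, i.e. a version that
    still executes dataset loading scripts such as the one used by
    sayakpaul/nyu_depth_v2 (requirements.txt pins datasets < 4.0)."""
    try:
        major = int(version("datasets").split(".")[0])
    except PackageNotFoundError:
        return False  # datasets is not installed at all
    return major < 4
```

Checking this before calling the loader gives a clear error message instead of the opaque refusal that `datasets >= 4.0` produces.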
## LILocBench
Source: Bonn Indoor Localization Benchmark (corridor subset). Extracted via extract_lilocbench.py.
Usage: Fine-tuning set for V7 and V9. Evaluation set for corridor RMSE.
## SUN RGB-D + DIODE
Usage: V6 pretraining (diverse indoor / outdoor depth). Not used directly in evaluation — only as the pretraining-stage corpus prior to NYU fine-tuning.
## Data not in the repository
The following are git-ignored and must be obtained separately for full reproducibility:
| Asset | Size | Location | How to get it |
|---|---|---|---|
| Model weights (best_depth_v{5,6,7,9}.pt) | ~61 MB each | hpc_outputs/ | HuggingFace dataset repo |
| Corridor eval data | ~1.2 GB | corridor_eval_data/ | HuggingFace dataset repo |
| Raw rosbags | ~10+ GB each | ~/rosbags/ | Available on request |
| Glass corridor frames | ~200 MB | ~/maps/glass_corridor_frames/ | HuggingFace dataset repo |
| bag_213831 frames | ~250 MB | ~/maps/bag_213831_frames/ | HuggingFace dataset repo |
| Demo videos | ~700 MB | /media/nishant/SeeGayt2/demo_videos/ | Generated via scripts |
The HuggingFace dataset repository provides model weights and evaluation data. Video outputs can be regenerated from the evaluation data using the scripts in this repo.