# Datasets
Every evaluation in this work runs on one of four frame sets. None of them are large by computer vision standards — the corridor evaluation has 459 frames, and the glass corridor has 121. The data scale is constrained by the project’s deployment scope: the robot collects data in a single building, and the depth sensor’s failure mode is specific to that building’s materials.
## Corridor evaluation data (459 frames)
Location: ml_pipeline/corridor_eval_data/
Source: Orbbec Femto Bolt RGB-D + RPLiDAR S2, recorded on the Traxxas Maxx 4S platform in a university corridor. The original rosbag (rgbd_imu_20260228_003828_0.mcap, ~8.1 GB) was recorded at 30 FPS; frames were subsampled to ~3 FPS for evaluation.
Contents:
| Subdirectory | Count | Format | Notes |
|---|---|---|---|
| rgb/ | 485 | .png, 1280×720 | Raw RGB frames |
| depth/ | 485 | .npy, 720×1280, float32 | ToF depth in meters, 0 = invalid |
| da3_depth/ | 459 | .npy, 720×1280, float32 | DA3-Small predictions (zero-shot) |
The 26-frame gap between RGB (485) and DA3 (459) comes from frames where DA3 inference was not run. All evaluations use the 459-frame DA3-aligned subset.
Naming convention: 00000.png through 00484.png (5-digit zero-padded).
Student predictions: Not pre-computed. V5, V6, V7, and V9 predictions are generated via live inference in the evaluation and video generation scripts.
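Because the ToF depth arrays store meters with 0 marking invalid pixels, any metric computed against them needs an explicit validity mask. A minimal sketch of that pattern, using only numpy — the helper names are illustrative, not taken from the repo's evaluation scripts:

```python
import numpy as np

def load_tof_depth(path):
    """Load a ToF depth frame: float32 meters, 0 = invalid."""
    return np.load(path).astype(np.float32)

def masked_rmse(pred, gt):
    """RMSE over pixels where the ToF sensor returned a valid reading."""
    valid = gt > 0
    if not valid.any():
        return float("nan")
    err = pred[valid] - gt[valid]
    return float(np.sqrt(np.mean(err ** 2)))

# Tiny synthetic check: the invalid (0) pixel is excluded from the error.
gt = np.array([[1.0, 0.0], [2.0, 3.0]], dtype=np.float32)
pred = np.array([[1.5, 9.9], [2.0, 3.0]], dtype=np.float32)
print(masked_rmse(pred, gt))
```

Forgetting the `gt > 0` mask silently rewards predicting near-zero depth wherever the sensor dropped out, which is exactly the failure mode these datasets were collected to measure.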
## Glass corridor (121 frames)
Location: ~/maps/glass_corridor_frames/
Source: Same platform, different corridor section with a full glass wall on one side. Extracted from rosbag rgbd_imu_20260302_173610.
Contents:
| Subdirectory | Count | Format | Notes |
|---|---|---|---|
| rgb/ | 121 | .png, 1280×720 | Raw RGB frames |
| depth/ | 121 | .npy, varies | ToF depth (some at 576×640 native) |
| da3_depth/ | 121 | .npy | DA3-Small predictions |
Student predictions: Pre-computed V5, V6, V7 .npy files live in a separate directory: ~/maps/glass_corridor_student_results/{depth_v5_vivek,depth_v6,depth_v7}/. V9 is always computed via live inference.
Naming convention: frame_0000_t30.0s.png (timestamped, starting at t=30s to skip the initial static period).
Why this dataset matters: The glass wall creates a different failure mode than the polished floor. The ToF sensor does not just lose pixels — it sometimes returns erroneous depth from specular reflections off the glass. This tests whether the fusion pipeline degrades gracefully or introduces false obstacles.
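This set and bag_213831 below share the timestamped `frame_NNNN_t<seconds>s.png` convention, so frame index and capture time can be recovered from the filename alone. A parsing sketch (stdlib only; the function name is illustrative, not from the repo):

```python
import re

# Matches e.g. "frame_0000_t30.0s.png" -> index 0, timestamp 30.0 s
FRAME_RE = re.compile(r"frame_(\d{4})_t([\d.]+)s\.png")

def parse_frame_name(name):
    """Return (frame_index, timestamp_seconds) from a timestamped frame name."""
    m = FRAME_RE.fullmatch(name)
    if m is None:
        raise ValueError(f"unexpected frame name: {name}")
    return int(m.group(1)), float(m.group(2))

print(parse_frame_name("frame_0000_t30.0s.png"))  # (0, 30.0)
```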
## bag_213831 (150 frames)
Location: ~/maps/bag_213831_frames/
Source: Jetson-mounted recording, different corridor section, different lighting. Extracted from rosbag rgbd_imu_20260302_213831.
Contents:
| Subdirectory | Count | Format | Notes |
|---|---|---|---|
| rgb/ | 150 | .png, 1280×720 | Raw RGB frames |
| depth/ | 150 | .npy | ToF depth |
| da3_depth/ | 150 | .npy | DA3-Small predictions |
Student predictions: Pre-computed V5, V6, V7, and DA3 .npy files in ~/maps/bag_213831_student_results/. V9 is computed via live inference.
Naming convention: frame_0000_t0.0s.png (timestamped, 2-second spacing).
## NYU Depth V2
Source: HuggingFace datasets (sayakpaul/nyu_depth_v2), loaded via dataset/nyu_loader.py.
Size: 47,584 training + 654 test RGB-D pairs (640×480).
Usage: Primary training set for V1–V6. Evaluation set for NYU RMSE numbers.
Gotcha: The HuggingFace loader uses a loading script, which datasets >= 4.0 refuses to execute by default. requirements.txt pins datasets < 4.0.
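A defensive guard for this gotcha can be sketched as below, assuming only that the installed `datasets` package reports its version through `importlib.metadata` — the function name is illustrative and not part of the repo:

```python
from importlib.metadata import version, PackageNotFoundError

def datasets_allows_loading_scripts():
    """True if the installed `datasets` is pre-4.0, i.e. a version that
    still executes dataset loading scripts such as the one used by
    sayakpaul/nyu_depth_v2 (requirements.txt pins datasets < 4.0)."""
    try:
        major = int(version("datasets").split(".")[0])
    except PackageNotFoundError:
        return False  # datasets is not installed at all
    return major < 4
```

Checking this before calling the loader gives a clear error message instead of the opaque refusal that `datasets >= 4.0` produces.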
## LILocBench
Source: Bonn Indoor Localization Benchmark (corridor subset). Extracted via extract_lilocbench.py.
Usage: Fine-tuning set for V7 and V9. Evaluation set for corridor RMSE.
## SUN RGB-D + DIODE
Usage: V6 pretraining (diverse indoor / outdoor depth). Not used directly in evaluation — only as the pretraining-stage corpus prior to NYU fine-tuning.
## Data not in the repository
The following are git-ignored and must be obtained separately for full reproducibility:
| Asset | Size | Location | How to get it |
|---|---|---|---|
| Model weights (best_depth_v{5,6,7,9}.pt) | ~61 MB each | hpc_outputs/ | HuggingFace dataset repo |
| Corridor eval data | ~1.2 GB | corridor_eval_data/ | HuggingFace dataset repo |
| Raw rosbags | ~10+ GB each | ~/rosbags/ | Available on request |
| Glass corridor frames | ~200 MB | ~/maps/glass_corridor_frames/ | HuggingFace dataset repo |
| bag_213831 frames | ~250 MB | ~/maps/bag_213831_frames/ | HuggingFace dataset repo |
| Demo videos | ~700 MB | /media/nishant/SeeGayt2/demo_videos/ | Generated via scripts |
The HuggingFace dataset repository provides model weights and evaluation data. Video outputs can be regenerated from the evaluation data using the scripts in this repo.