ToF failure modes
Why the Orbbec Femto Bolt loses most of its depth pixels in our corridor.
The numbers
From results/01_paper_results/pixel_fusion.json, averaged over 459 corridor frames:
| Statistic | Value |
|---|---|
| Mean dead-pixel rate | 77.79% |
| Mean valid-pixel rate | 22.21% |
| Worst-case dead rate (glass corridor) | >80% |
| Confidence threshold for “valid” | 0.5 |
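As an illustration, the aggregates above could be recomputed from the results file along the following lines. The per-frame field names (`frames`, `dead_pixel_rate`, `valid_pixel_rate`) are assumptions made for this sketch, not the file's confirmed schema:

```python
import json

# Load per-frame fusion statistics. NOTE: the keys "frames",
# "dead_pixel_rate", and "valid_pixel_rate" are hypothetical;
# adjust to pixel_fusion.json's actual schema. Rates are assumed
# to be stored as fractions in [0, 1].
with open("results/01_paper_results/pixel_fusion.json") as f:
    stats = json.load(f)

frames = stats["frames"]  # hypothetical: one record per corridor frame
mean_dead = sum(fr["dead_pixel_rate"] for fr in frames) / len(frames)
mean_valid = sum(fr["valid_pixel_rate"] for fr in frames) / len(frames)

print(f"Frames: {len(frames)}")                    # expected: 459
print(f"Mean dead-pixel rate:  {mean_dead:.2%}")   # expected: ~77.79%
print(f"Mean valid-pixel rate: {mean_valid:.2%}")  # expected: ~22.21%
```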
We re-verified this end-to-end inside the Docker container on the same 459 frames:
| Count | Pixels | Share |
|---|---|---|
| Total | 35,251,200 | 100% |
| Valid (sensor range 0.1–5.0 m) | 7,172,823 | 20.3% |
| Sensor zero (dead) | 27,942,585 | 79.3% |
| Sensor far (>5.0 m) | 135,792 | 0.4% |
Slightly different filtering produces slightly different numbers, but both methods land in the same range. The formal evaluation reports 77.8%; the Docker run, counting zero returns plus beyond-range pixels, measures 79.7%. Both correspond to the same operational claim: the majority of the frame is unusable.
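The Docker re-verification reduces to three counts over the raw 16-bit depth frames. A minimal sketch of that counting, assuming uint16 depth PNGs in millimeters under a hypothetical `frames/` directory (the actual container layout may differ):

```python
import glob

import cv2
import numpy as np

total = valid = zero = far = 0

# Hypothetical layout: one 16-bit depth PNG per frame, values in mm.
for path in sorted(glob.glob("frames/depth_*.png")):
    depth = cv2.imread(path, cv2.IMREAD_UNCHANGED).astype(np.float32) / 1000.0
    total += depth.size
    zero += np.count_nonzero(depth == 0.0)   # sensor returned nothing
    far += np.count_nonzero(depth > 5.0)     # beyond the usable range
    valid += np.count_nonzero((depth >= 0.1) & (depth <= 5.0))

print(f"Total pixels: {total:,}")
print(f"Valid pixels (sensor 0.1-5.0m): {valid:,} ({valid / total:.1%})")
print(f"Sensor zero (dead): {zero:,} ({zero / total:.1%})")
print(f"Sensor far (>5.0m): {far:,} ({far / total:.1%})")
```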
What surfaces fail and why
Time-of-Flight depth works by emitting a near-infrared (NIR) pulse and measuring the round-trip time. Failure happens when the returned IR signal is too weak, too late, or too distorted for the sensor to extract a phase.
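As a concrete reference, amplitude-modulated continuous-wave ToF (the scheme used by sensors in the Femto Bolt's class) recovers depth from the phase shift $\Delta\varphi$ of the returned modulation:

$$
d = \frac{c}{4\pi f_{\text{mod}}}\,\Delta\varphi
$$

where $c$ is the speed of light and $f_{\text{mod}}$ is the modulation frequency; the unambiguous range is $c / (2 f_{\text{mod}})$. Every failure in the table below is a way of corrupting $\Delta\varphi$: specular surfaces return too little energy for a phase estimate, glass returns a phase from whatever lies behind or beside it, and distant surfaces return too little energy to lock at all.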
| Surface | Mechanism | Failure rate |
|---|---|---|
| Polished floors (linoleum, marble, sealed concrete) | Specular reflection — the IR pulse bounces away from the sensor at the angle of incidence rather than diffusing back | Very high — close to 100% on smooth surfaces |
| Glass walls and doors | IR passes through the glass entirely, or reflects off-axis at the wrong angle | Very high — the hardware can’t see glass |
| Glossy furniture (lacquered desks, polished metal) | Same specular mechanism as polished floors | High — depends on viewing angle |
| Distant surfaces (>5 m) | NIR signal attenuates with distance squared; not enough returned energy to phase-lock | High beyond ~5 m |
| Wet or transparent surfaces | Combination of absorption and reflection | High |
| Matte walls, fabric, painted surfaces | Diffuse reflection — works as designed | Low — these are the surviving 22% |
The pattern is what the literature calls “specular vs diffuse” — the sensor depends on diffuse reflection, and our deployment environment is dominated by surfaces that don’t diffuse well.
Why this is structured failure, not random failure
This matters for the rest of the bootstrap-perception strategy.
If ToF failures were spatially random — a coin flip per pixel — the surviving 22% would be uniformly distributed and easy to use as a sample. They’re not. They cluster on the surfaces that actually diffuse IR: matte walls, fabric, the rough patches between polished sections of floor. The dead regions cluster on the surfaces we most need depth for: the glass we’d hit, the polished floor we’d misjudge clearance over.
Two implications:
- Median-scale calibration still works despite the structured failure, because the surviving pixels are still distributed across the depth range; they are just not uniformly distributed across the image (see the sketch after this list).
- The dead regions are the operationally important regions. If the sensor failed on random pixels in the matte wall, we wouldn’t care. It fails on glass and floor — exactly the surfaces Nav2 needs to know about.
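A minimal sketch of the first point, assuming a scale-ambiguous monocular prediction and a metric ToF frame with its confidence map; the names and thresholds here are illustrative, not the project's actual API:

```python
import numpy as np

def median_scale(mono_rel: np.ndarray, tof_m: np.ndarray,
                 conf: np.ndarray, conf_thresh: float = 0.5) -> float:
    """Scale factor aligning a relative mono-depth map to metric ToF.

    Only the ~22% of pixels the sensor trusts contribute, but because
    those pixels span the depth range (near floor patches, far matte
    walls), the ratio median remains a stable metric anchor even
    though the mask is spatially clustered.
    """
    mask = (conf >= conf_thresh) & (tof_m > 0.1) & (tof_m < 5.0) & (mono_rel > 0)
    if np.count_nonzero(mask) < 100:  # too few anchors in this frame
        raise ValueError("not enough trusted ToF pixels to calibrate")
    return float(np.median(tof_m[mask] / mono_rel[mask]))

# Usage: metric_mono = median_scale(mono, tof, conf) * mono
```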
How the sensor signals what it’s failing on
The Femto Bolt publishes a per-pixel confidence map alongside the depth image (`/camera/depth/confidence`). High confidence means the IR phase lock was strong; low confidence means the sensor itself doesn't trust the reading.
This confidence signal is what makes Confidence-Gated Fusion work: the system doesn't have to infer which depth readings are bad, because the sensor reports it directly. The 0.5 threshold (the value used in the runtime fusion node) keeps about 22% of pixels per frame and rejects the rest.
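A minimal sketch of the gate itself, assuming pixel-aligned depth and confidence images; the names are illustrative, and the real fusion node presumably does more (registration, temporal filtering) than this per-pixel select:

```python
import numpy as np

def confidence_gated_fusion(tof_m: np.ndarray, mono_metric_m: np.ndarray,
                            conf: np.ndarray,
                            conf_thresh: float = 0.5) -> np.ndarray:
    """Keep ToF where the sensor trusts itself; fill the rest from mono.

    With conf_thresh = 0.5, roughly 22% of pixels keep the direct ToF
    reading and the remaining ~78% fall through to the scale-calibrated
    monocular depth.
    """
    trusted = (conf >= conf_thresh) & (tof_m > 0.0)
    return np.where(trusted, tof_m, mono_metric_m)
```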
Without that confidence signal, fusion would have to fall back on heuristics (does depth = 0 mean dead? do frame-to-frame depth jumps mean dead?), and failure detection would be noisier.
Why this motivates the whole project
Three options were on the table when this failure mode became clear:
- Replace the sensor. Stereo, 3D LiDAR, and structured-light alternatives were all either too expensive, too power-hungry, or not commercially available in the form factor we needed.
- Live with the blind spots. Accept that Nav2 would miss obstacles in ~78% of pixels. Untenable for deployment.
- Bootstrap. Use the surviving 22% to anchor a learned monocular depth model, fuse the two outputs, and present a dense fused depth to Nav2.
The third option is the approach taken in this work. It is a pragmatic response to a hardware constraint that cannot be engineered around at the deployment scale. See Bootstrap Perception for the full operational specification.