M5: Kinematic Safety Supervisor
Part of the Terra Perceive series.
After M4, the pipeline can tell you the fused risk of every terrain cell: geometry from the traversability grid, semantics from SegFormer. What it cannot tell you is whether the robot should slow down, brake hard, or emergency-stop — and how urgently. A risk map is a spatial answer. A safety supervisor needs a temporal answer: given the robot’s current speed and terrain, how much time do we have before something goes wrong?
This milestone replaces the common “if distance < 3 m, stop” heuristic with a physics-based model. The stopping distance depends on velocity and terrain friction. The intervention depends on time-to-collision, not raw distance. The result is a system that knows a vehicle at 2 m/s needs about 0.74 m to stop on firm ground (\(\mu = 0.6\)) but about 1.08 m on mud (\(\mu = 0.3\)), instead of assuming a fixed 3 m.
Stopping Distance from First Principles
The core equation comes from the work-energy theorem. A vehicle at velocity \(v\) has kinetic energy \(\frac{1}{2}mv^2\). The only force decelerating it is friction: \(F = \mu m g\), where \(\mu\) is the friction coefficient and \(g = 9.81\;\text{m/s}^2\). Friction does work over distance \(d\):
\[W_{\text{friction}} = -\mu m g \cdot d\]
Setting work equal to kinetic energy change (final KE = 0):
\[-\mu m g \cdot d = -\frac{1}{2}mv^2\]
Mass cancels: a bicycle and a bulldozer at the same speed and friction have the same stopping distance (though very different stopping forces). Solving for \(d\):
\[d_{\text{brake}} = \frac{v^2}{2 \mu g}\]
But braking doesn’t start instantly. There is a sensor-to-actuator latency \(t_{\text{react}}\) (we use 0.2 s) during which the vehicle continues at full speed:
\[d_{\text{react}} = v \cdot t_{\text{react}}\]
Total stopping distance:
\[\boxed{d_{\text{stop}} = \frac{v^2}{2 \mu g} + v \cdot t_{\text{react}}}\]
The braking term is quadratic in \(v\): doubling speed quadruples the braking distance. The reaction term is linear. At highway speeds the braking term dominates; at the low speeds of construction equipment (2–5 m/s), the reaction term is a significant fraction of the total.
Hand-calculated checkpoints:
| \(v\) (m/s) | \(\mu\) | \(d_{\text{brake}}\) (m) | \(d_{\text{react}}\) (m) | \(d_{\text{stop}}\) (m) |
|---|---|---|---|---|
| 2.0 | 0.6 | 0.34 | 0.40 | 0.74 |
| 5.0 | 0.6 | 2.12 | 1.00 | 3.12 |
| 2.0 | 0.3 | 0.68 | 0.40 | 1.08 |
The third row shows why terrain matters: halving \(\mu\) (from good gravel to wet mud) doubles the braking distance and lengthens the total stopping distance by about 46% (0.74 m to 1.08 m) at the same speed.
Terrain-Aware Friction
The friction coefficient \(\mu\) is not a constant — it depends on the terrain the robot is currently driving over. M3’s traversability grid already computes a per-cell score in \([0, 1]\) from slope, roughness, and step height. We close the loop by mapping that score to friction:
\[\mu = \mu_{\text{base}} + \mu_{\text{scale}} \cdot s_{\text{trav}}\]
With \(\mu_{\text{base}} = 0.3\) and \(\mu_{\text{scale}} = 0.5\):
| Terrain | \(s_{\text{trav}}\) | \(\mu\) | Physical meaning |
|---|---|---|---|
| Wet mud / steep slope | 0.0 | 0.3 | Worst case — long stopping distance |
| Mixed terrain | 0.5 | 0.55 | Moderate grip |
| Dry gravel / flat | 1.0 | 0.8 | Best case — short stopping distance |
This means the safety supervisor automatically becomes more conservative on dangerous terrain. A cell with high slope and high roughness produces a low traversability score, which produces a low \(\mu\), which produces a longer \(d_{\text{stop}}\), which triggers interventions earlier. The chain is fully automatic — no terrain-specific tuning required.
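The linear mapping can be sketched as follows. The function and constant names are hypothetical, and two behaviours are assumptions that the text does not specify: out-of-range scores are clamped to \([0, 1]\), and a NaN score falls back to the worst-case friction.

```cpp
#include <algorithm>
#include <cmath>

constexpr double kMuBase = 0.3;   // friction at s_trav = 0 (worst terrain)
constexpr double kMuScale = 0.5;  // added friction at s_trav = 1 (best terrain)

// Maps a traversability score to a friction coefficient:
//   mu = mu_base + mu_scale * s_trav
// Assumption: invalid scores are treated conservatively. NaN yields the
// worst-case mu, and values outside [0, 1] are clamped (std::clamp is C++17).
inline double friction_from_traversability(double s_trav) {
    if (std::isnan(s_trav)) {
        return kMuBase;
    }
    const double s = std::clamp(s_trav, 0.0, 1.0);
    return kMuBase + kMuScale * s;
}
```

Falling back to the lowest friction on bad input keeps the conservative direction of the chain: an unknown cell produces a longer stopping distance, not a shorter one.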
Time-to-Collision
Raw distance to an obstacle is a poor trigger for safety interventions. A worker 5 metres away is safe if the robot is crawling at 0.5 m/s, but dangerous if it’s moving at 5 m/s. What matters is time: how many seconds until the gap closes to within the stopping distance?
\[\text{TTC} = \frac{d_{\text{worker}} - d_{\text{stop}}}{v_{\text{relative}}}\]
where \(v_{\text{relative}} = v_{\text{vehicle}} - v_{\text{worker}}\) is the closing speed. The sign convention: a worker moving away from the robot (positive \(v_{\text{worker}}\)) reduces \(v_{\text{relative}}\), increasing TTC. A worker moving toward the robot (negative \(v_{\text{worker}}\)) increases closing speed.
Edge cases:
| Condition | TTC | Meaning |
|---|---|---|
| \(v_{\text{relative}} \leq 0\) | \(+\infty\) | Objects diverging — gap is growing, safe |
| \(d_{\text{worker}} > d_{\text{stop}}\), \(v_{\text{relative}} > 0\) | Positive, finite | Closing: seconds until the gap shrinks to the stopping distance |
| \(d_{\text{worker}} < d_{\text{stop}}\), \(v_{\text{relative}} > 0\) | Negative | Already inside stopping distance: cannot stop in time |
The negative TTC case is the most critical: it means braking right now at maximum friction will not prevent a collision. This is physically distinct from “TTC is small” — it requires an immediate emergency stop, not a gradual slowdown.
Priority-Ordered Interventions
Safety-critical systems conventionally use a strict priority hierarchy rather than a flat if-else chain, in the spirit of functional-safety standards such as ISO 26262. Higher-severity interventions always override lower ones:
| Priority | Condition | Action | Scale factor |
|---|---|---|---|
| P0 | Sensor timeout > 200 ms | Emergency stop | 0.0 |
| P0 | TTC \(\leq\) 0 | Emergency stop | 0.0 |
| P1 | TTC < 2.0 s | Hard brake | 0.1 |
| P2 | TTC < 5.0 s | Proportional scale | \(\frac{\text{TTC} - 2.0}{5.0 - 2.0}\) |
| — | TTC \(\geq\) 5.0 s | No intervention | 1.0 |
The sensor timeout check comes first — before any TTC computation. If the LiDAR hasn’t produced data in 200 ms, all distance measurements are stale. The TTC calculation would be based on outdated information, so the correct action is an immediate stop regardless of what the numbers say.
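The proportional scaling band (P2) provides smooth deceleration between 2 s and 5 s. At TTC = 3.5 s, the scale factor is \((3.5 - 2.0) / (5.0 - 2.0) = 0.5\), halving the commanded velocity. At TTC = 2.1 s, it drops to ~3%, nearly a full stop. This avoids the discontinuity of jumping from “full speed” to “hard brake” at a single threshold. One consequence of these values is that the factor is not monotonic across the P1/P2 boundary: just above 2.0 s the proportional factor approaches 0, while just below 2.0 s the hard-brake factor is 0.1.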
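The priority table can be sketched as an ordered chain of early returns, so a higher-priority rule always wins. The type and function names here are hypothetical, and treating a NaN TTC as an emergency stop is an assumption added for robustness, not something the table specifies.

```cpp
#include <cmath>

enum class Intervention { kNone, kProportional, kHardBrake, kEmergencyStop };

struct Decision {
    Intervention action;
    double scale;  // multiplier applied to the commanded velocity
};

constexpr double kSensorTimeoutS = 0.2;  // 200 ms
constexpr double kTtcHardBrakeS = 2.0;
constexpr double kTtcNoActionS = 5.0;
constexpr double kHardBrakeScale = 0.1;

// Evaluates the rules in strict priority order (P0, P1, P2, none).
inline Decision decide(double ttc_s, double sensor_age_s) {
    // P0: stale sensor data invalidates every distance measurement.
    if (sensor_age_s > kSensorTimeoutS) {
        return {Intervention::kEmergencyStop, 0.0};
    }
    // P0: already inside the stopping distance (NaN handled as worst case).
    if (std::isnan(ttc_s) || ttc_s <= 0.0) {
        return {Intervention::kEmergencyStop, 0.0};
    }
    // P1: hard brake.
    if (ttc_s < kTtcHardBrakeS) {
        return {Intervention::kHardBrake, kHardBrakeScale};
    }
    // P2: linear ramp from 0 at 2 s up to 1 at 5 s.
    if (ttc_s < kTtcNoActionS) {
        const double scale =
            (ttc_s - kTtcHardBrakeS) / (kTtcNoActionS - kTtcHardBrakeS);
        return {Intervention::kProportional, scale};
    }
    return {Intervention::kNone, 1.0};
}
```

An infinite TTC (diverging objects) falls through every comparison and returns no intervention with a scale of 1.0.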
Event Logging
Every call to the safety supervisor produces a SafetyEvent record:
timestamp, rule, d_worker, d_stop, TTC, friction_mu, vel_before, vel_after
Example output:
0.00, TTC >= 5s, 50.00, 0.74, 24.63, 0.60, 2.00, 2.00
0.10, TTC < 5s, 6.26, 0.74, 2.76, 0.60, 2.00, 0.51
0.20, TTC < 2s, 1.50, 0.74, 0.38, 0.60, 2.00, 0.20
0.30, d_worker < d_stop, 0.50, 0.74, -0.12, 0.60, 2.00, 0.00
The CSV log serves two purposes: (1) debugging — you can trace exactly which rule fired and why at every timestep; (2) safety audit — in a production system, this log feeds into a NATS JetStream topic for post-incident replay (planned for Phase 2).
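One way such a record could be represented and serialised is sketched below. The field names follow the header shown above; the struct layout, the `to_csv_line` helper, and the two-decimal formatting are assumptions inferred from the example output, not the actual implementation.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical layout of one log record; field order matches the CSV header.
struct SafetyEvent {
    double timestamp;
    std::string rule;
    double d_worker;
    double d_stop;
    double ttc;
    double friction_mu;
    double vel_before;
    double vel_after;
};

// Formats one event as a comma-separated line with two decimal places.
inline std::string to_csv_line(const SafetyEvent& e) {
    std::ostringstream out;
    out << std::fixed << std::setprecision(2)
        << e.timestamp << ", " << e.rule << ", "
        << e.d_worker << ", " << e.d_stop << ", "
        << e.ttc << ", " << e.friction_mu << ", "
        << e.vel_before << ", " << e.vel_after;
    return out.str();
}
```

Formatting the second example row, `SafetyEvent{0.10, "TTC < 5s", 6.26, 0.74, 2.76, 0.60, 2.00, 0.51}`, yields the same text as the log line above.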
Latency Self-Monitoring
The safety loop must run faster than the control loop it feeds. We target < 5 ms per iteration on mock data: at a 50 Hz control rate each cycle lasts 20 ms, so a 5 ms supervisor budget leaves 15 ms for the rest of the pipeline.
Each evaluate() call is timed with std::chrono::high_resolution_clock. The durations are stored in a vector; at the end of a session, sorting the vector and indexing at the 50th and 95th percentiles gives p50 and p95 latency:
- p50 = median latency (typical case)
- p95 = 95th percentile (worst 5% of calls)
If p50 is 0.1 ms but p95 is 8 ms, the system has occasional spikes — possibly from memory allocation in the event log’s push_back. A production version would pre-allocate the vector or use a ring buffer.
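The timing and percentile steps can be sketched as two small helpers. The names are hypothetical, and the percentile index convention (floor of `pct / 100 * (n - 1)` into the sorted vector) is one common choice assumed here; the text does not say which convention the real code uses.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <vector>

// Times a single call and returns the elapsed wall time in milliseconds.
template <typename Fn>
double time_call_ms(Fn&& fn) {
    const auto start = std::chrono::high_resolution_clock::now();
    fn();
    const auto end = std::chrono::high_resolution_clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count();
}

// Returns the requested percentile of the samples, or 0.0 for an empty set.
// The vector is taken by value so the caller's ordering is left untouched.
inline double percentile_ms(std::vector<double> samples_ms, double pct) {
    if (samples_ms.empty()) {
        return 0.0;
    }
    std::sort(samples_ms.begin(), samples_ms.end());
    const double p = std::clamp(pct, 0.0, 100.0);  // C++17
    const double rank = (p / 100.0) * static_cast<double>(samples_ms.size() - 1);
    const std::size_t idx = static_cast<std::size_t>(rank);  // floor, rank >= 0
    return samples_ms[idx];
}
```

A session would collect `time_call_ms(...)` results around each `evaluate()` call and report `percentile_ms(samples, 50.0)` and `percentile_ms(samples, 95.0)` at the end.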
Test Results
10 new unit tests covering all sub-goals, all passing:
| Test | What it verifies |
|---|---|
| StoppingDistanceKnownValues | d_stop matches hand-calculated values within 1% |
| TTCApproachingWorker | Stationary worker at 10 m, TTC = 4.63 s |
| TTCRecedingWorker | Worker moving away, TTC = infinity |
| TTCAlreadyTooClose | Worker at 0.5 m, TTC < 0 |
| InterventionEmergencyStop | d_worker < d_stop triggers E-stop, scale = 0.0 |
| InterventionHardBrake | TTC < 2 s triggers hard brake, scale = 0.1 |
| InterventionProportionalScale | TTC = 3.5 s, scale = 0.5 |
| InterventionNone | Worker at 50 m, no intervention, scale = 1.0 |
| FrictionFromTraversability | trav=0 maps to mu=0.3, trav=1 maps to mu=0.8 |
| LowFrictionIncreasesStoppingDistance | mu=0.3 produces longer d_stop than mu=0.8 |
Total test count: 52/52 (42 prior + 10 safety).
What I’d Do Differently
- Forward-arc geometric filtering. The current implementation takes d_to_nearest_worker as a scalar input, so the caller is responsible for filtering obstacles to the forward wedge. A more complete version would take a list of obstacle positions and internally filter to a configurable ±30° arc, 15 m range. This would make the supervisor self-contained rather than dependent on the caller’s geometry.
- Multiple obstacle handling. The current API evaluates one obstacle at a time. A production version would iterate over all obstacles in the forward arc and use the minimum TTC to drive the intervention: the most dangerous obstacle wins.
- Pre-allocated event log. Using std::vector::push_back in a safety-critical loop has no fixed worst-case latency: when capacity runs out, a single call reallocates the buffer and moves every stored element. A ring buffer with a fixed capacity would guarantee constant-time insertion.
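The fixed-capacity ring buffer mentioned in the last point could look roughly like this. It is a sketch under stated assumptions: the class name and interface are hypothetical, the oldest entry is overwritten once the buffer is full, and `at()` does no bounds checking.

```cpp
#include <array>
#include <cstddef>

// Fixed-capacity log that overwrites the oldest entry when full.
// All storage is allocated up front, so push() never reallocates.
template <typename T, std::size_t Capacity>
class RingLog {
    static_assert(Capacity > 0, "RingLog needs a non-zero capacity");

public:
    // Stores a copy of item in the next slot; cost is one assignment.
    void push(const T& item) {
        buf_[head_] = item;
        head_ = (head_ + 1) % Capacity;
        if (size_ < Capacity) {
            ++size_;
        }
    }

    std::size_t size() const { return size_; }

    // Index 0 is the oldest retained item; the caller must keep i < size().
    const T& at(std::size_t i) const {
        const std::size_t start = (head_ + Capacity - size_) % Capacity;
        return buf_[(start + i) % Capacity];
    }

private:
    std::array<T, Capacity> buf_{};
    std::size_t head_ = 0;  // next slot to write
    std::size_t size_ = 0;  // number of valid entries
};
```

Insertion is only as bounded as the element's copy assignment: an event type holding a std::string may still allocate on copy, so a fixed-size rule code (for example an enum) would suit this design better.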
Connection to Future Steps
M5 closes the perception-to-safety loop that is the core of Phase 1:
LiDAR scan
--> RANSAC ground segmentation (M2)
--> Traversability grid: risk + confidence (M3)
--> Camera-LiDAR fusion: semantic override (M4)
--> Terrain friction: mu = f(traversability) (M5.4)
--> Stopping distance: d_stop = f(v, mu) (M5.1)
--> TTC + intervention (M5.2, M5.3)
--> Safety event log (M5.5)
Every component feeds the next. The traversability score from M3 determines how much friction the safety supervisor assumes. A muddy slope (high risk, low traversability) produces a low \(\mu\), which produces a long stopping distance, which triggers earlier interventions — exactly the conservative behaviour you want on dangerous terrain.
M6 wires this pipeline into a Docker container with a smoke test: docker-compose up processes a bundled RELLIS-3D sample and produces a BEV image, a safety CSV, and a timing report.
Code: src/safety_supervisor.cpp, include/safety_supervisor.hpp, tests/cpp/test_safety.cpp
References
- OpenStax, “Kinetic Energy and the Work-Energy Theorem”, College Physics 1e, Ch. 7.2 — derivation of stopping distance from the work-energy theorem.
- Criticality Metrics Documentation, “Time To Collision (TTC)” — formal TTC definition, handling of diverging objects, comparison with other safety metrics.
- Nav2 Documentation, “Collision Monitor” — production ROS2 pattern for polygon-based detection zones and Stop/Slowdown/Approach models.
- ISO 26262, “Road vehicles — Functional safety” — ASIL classification framework and intervention priority hierarchy for safety-critical automotive systems.
- Matias Wermelinger et al., “Navigation Planning for Legged Robots in Challenging Terrain”, IROS 2016 — vehicle-aware traversability scoring with multiplicative penalty model.
- Shrey Aeron et al., “RoadRunner — Learning Traversability Estimation for Autonomous Off-road Driving”, arXiv 2402.19341, 2024 — traversability score to friction coefficient mapping for off-road navigation.