Hardware

The deployment platform. The student model and runtime fusion exist because of these specific sensors and this specific compute budget — knowing what they are and why they were chosen makes the rest of the docs make more sense.

This page covers the on-board hardware. The off-board training hardware (NYU Greene HPC, L40S GPUs) is documented in training.


The robot — Traxxas Maxx 4S

A 1/10 scale Ackermann-steered RC chassis. The body of the robot.

| Property | Value |
| --- | --- |
| Scale | 1/10 |
| Steering | Ackermann (front-wheel) |
| Wheelbase | 0.187 m |
| Front track | 0.137 m |
| Rear track | 0.145 m |
| Wheel radius | 0.055 m |
| Max steering angle | 0.489 rad (~28°) |
| Footprint | [[0.20, 0.08], [0.20, -0.08], [-0.10, -0.08], [-0.10, 0.08]] |

The chassis isn’t a background detail. Its geometry shapes the navigation and control stack: the footprint drives costmap inflation, the wheelbase and maximum steering angle set the Ackermann kinematic constraints the controller must respect, and together they bound how tight a turn the planner can ask for.

The platform was chosen for three reasons: enough space and payload to mount real sensors and compute, fast and sturdy enough to handle the indoor test environment, and a geometry that maps naturally to F1TENTH-style autonomy thinking. Not just “some RC car” — a deliberate physical commitment to treating the project like a small autonomous vehicle.
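A quick sanity check of what the steering geometry implies for planning. This is a sketch, not project code: under the standard bicycle-model approximation, the minimum turning radius is R = L / tan(δ_max), using the wheelbase and max steering angle from the table above.

```python
import math

WHEELBASE = 0.187   # m, from the chassis table
MAX_STEER = 0.489   # rad (~28 deg)

def min_turning_radius(wheelbase: float, max_steer: float) -> float:
    """Bicycle-model minimum turning radius: R = L / tan(delta_max)."""
    return wheelbase / math.tan(max_steer)

r = min_turning_radius(WHEELBASE, MAX_STEER)
print(f"minimum turning radius ~ {r:.2f} m")  # roughly 0.35 m
```

Any path the planner produces with curvature tighter than ~1/0.35 m⁻¹ is physically infeasible for this chassis.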


The depth camera — Orbbec Femto Bolt

The sensor whose failure mode gave the project its identity. RGB-D camera with Time-of-Flight depth and an onboard IMU.

| Stream | Resolution | Rate | Format | Topic |
| --- | --- | --- | --- | --- |
| RGB | 1280 × 720 | 30 Hz | Image | /camera/color/image_raw |
| ToF depth | 640 × 576 | 30 Hz | Image (16UC1, mm) | /camera/depth/image_raw |
| Confidence | 640 × 576 | 30 Hz | Image | /camera/depth/confidence |
| IMU | | 200 Hz | Imu | /camera/imu |

The ToF channel fails on 77.79% of pixels in our corridor (polished floors, glass walls, glossy furniture). This is the central failure mode the project was built to address — see Bootstrap Perception and ToF Failure Modes for the full discussion.

We kept this sensor despite the failure rate because the alternative isn’t “perfect sensor vs broken sensor.” It’s “no depth camera at all, vs a sensor that’s still highly valuable where it works and informative even where it doesn’t.” The Femto Bolt provides metric depth where valid, a confidence map that tells us which pixels to trust, RGB for the learned fallback, and IMU for the broader stack. That combination is what makes bootstrap perception possible.
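The fusion idea in miniature. A minimal numpy sketch of confidence gating, with an illustrative per-pixel rule (the threshold and array names here are assumptions, not the actual Depth Fusion Node logic): keep ToF depth where the confidence map says the pixel is trustworthy, fall back to the student's prediction everywhere else.

```python
import numpy as np

def fuse_depth(tof_mm, student_m, confidence, conf_thresh=100):
    """Per-pixel gate: trust ToF (converted mm -> m) where confidence is
    high and the depth return is nonzero; otherwise use the student."""
    tof_m = tof_mm.astype(np.float32) / 1000.0          # 16UC1 mm -> meters
    valid = (confidence >= conf_thresh) & (tof_mm > 0)  # per-pixel gate
    return np.where(valid, tof_m, student_m)

# toy 2x2 frame: one valid ToF pixel, three student fallbacks
tof = np.array([[1500, 0], [0, 0]], dtype=np.uint16)
student = np.full((2, 2), 2.0, dtype=np.float32)
conf = np.array([[255, 255], [0, 0]], dtype=np.uint8)
print(fuse_depth(tof, student, conf))  # [[1.5, 2.0], [2.0, 2.0]]
```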

The driver is the orbbec_camera ROS 2 package. Connection is USB 3.0. Camera intrinsics are available on /camera/depth/camera_info.
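What the intrinsics on /camera/depth/camera_info are for, sketched with placeholder values (fx, fy, cx, cy below are illustrative, not the real calibration): back-projecting a depth pixel into a 3D point in the camera frame via the standard pinhole model.

```python
def deproject(u, v, depth_mm, fx, fy, cx, cy):
    """Pinhole back-projection of pixel (u, v) with depth in mm
    to a 3D point (x, y, z) in meters in the camera frame."""
    z = depth_mm / 1000.0        # 16UC1 depth is in millimeters
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return x, y, z

# placeholder intrinsics for a 640 x 576 depth image
x, y, z = deproject(u=320, v=288, depth_mm=2000,
                    fx=500.0, fy=500.0, cx=320.0, cy=288.0)
print(x, y, z)  # the principal-point pixel maps to (0, 0, 2.0)
```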


The LiDAR — RPLiDAR S2

2D rotating laser scanner. The reliable backbone of navigation. Always works, but only sees a single horizontal plane.

| Property | Value |
| --- | --- |
| Range | 18 m |
| Field of view | 360° |
| Scan rate | ~10 Hz |
| Scan mode | DenseBoost |
| Driver | sllidar_ros2 |
| Connection | USB serial @ 1,000,000 baud |
| Frame | lidar_link |
| Topic | /scan (sensor_msgs/LaserScan) |

Used by Nav2’s local costmap (primary obstacle source), Nav2’s global costmap (only source — no depth at the global scale), AMCL (localization), and SLAM Toolbox (mapping).
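All of those consumers reduce to the same primitive: converting the scan's polar ranges into 2D points in the lidar_link frame. A minimal sketch, independent of the ROS message types:

```python
import math

def scan_to_points(ranges, angle_min, angle_increment, range_max=18.0):
    """Convert a 2D scan's polar ranges to (x, y) points, dropping
    returns outside the sensor's rated range."""
    points = []
    for i, r in enumerate(ranges):
        if 0.0 < r <= range_max:
            theta = angle_min + i * angle_increment
            points.append((r * math.cos(theta), r * math.sin(theta)))
    return points

# a 2 m return straight ahead becomes the point (2, 0)
print(scan_to_points([2.0], angle_min=0.0, angle_increment=0.0))
```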

The LiDAR’s limitation isn’t quality — it’s dimensionality. It sees a thin slice of the world at scanner mount height. That’s excellent for wall geometry, planar occupancy, and localization against a 2D map. It’s weak for anything above or below that plane: overhanging obstacles, tabletops and shelf edges, and low objects that sit under the scan height.

This is exactly why the Femto Bolt depth and learned depth from the student model are needed — they fill the vertical gap the LiDAR can’t see. See Four-Layer Sensing Hierarchy.

The mental model: RPLiDAR S2 is the reliable skeleton of the robot’s geometry understanding. Bootstrap perception doesn’t replace that skeleton. It adds missing body volume where the 2D scan is inherently incomplete.


The compute — Jetson Orin Nano 8GB

The robot’s main computer. The board that runs ROS 2, all perception nodes, navigation, and the hardware bridge.

| Property | Value |
| --- | --- |
| RAM | 8 GB (shared between CPU and GPU) |
| OS | Ubuntu 22.04 (JetPack 6) |
| ROS 2 | Humble |
| GPU | NVIDIA, with TensorRT FP16 inference |

What runs on it:

| Component | What it does |
| --- | --- |
| Ackermann HW Node | Motor / steering control via Teensy serial bridge |
| Student TRT Node | EfficientViT-B1 inference (V9 student), ~5 ms |
| Depth Fusion Node | Confidence-gated fusion of ToF + student depth |
| YOLO TRT Node | YOLOv8 detection, ~6 ms |
| Class Costmap Node | Class-aware obstacle inflation (when wired) |
| Nav2 | Full navigation stack (planner, controller, costmap) |
| EKF (robot_localization) | Odometry fusion |

USB allocation: the Femto Bolt occupies the USB 3.0 port, the RPLiDAR S2 sits on a USB serial adapter at 1,000,000 baud, and the Teensy motor/steering bridge takes its own serial connection.

Runtime perception performance: ~5 ms per frame for the student depth model and ~6 ms for YOLOv8 detection, both running as TensorRT FP16 engines, which leaves headroom for the fusion node and Nav2.

The Jetson matters because it’s where the project’s ambitions hit the deployment budget. It’s easy to design a system that works on a desktop GPU. It’s much harder to make the whole stack — perception, navigation, localization, and real sensors — fit on an embedded platform that still has to leave room for ROS 2. That budget is why the student model exists in the first place.

Known issue: TensorRT runtime

TensorRT on the Jetson uses a ctypes CUDA backend instead of pycuda, because pycuda is incompatible with JetPack 6. This is wired up in trt_utils.py and works correctly — but it’s a non-standard setup that confused us early in deployment. If you’re porting the runtime to a different Jetson configuration, this is the pitfall.
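The shape of the ctypes approach, as a hedged sketch (this is not the actual trt_utils.py code): bind libcudart directly and declare argument types for the few runtime calls pycuda was providing. The loader below degrades gracefully on machines without CUDA.

```python
import ctypes
import ctypes.util

def load_cudart():
    """Locate and bind the CUDA runtime library via ctypes.
    Returns None when libcudart is not present on this machine."""
    path = ctypes.util.find_library("cudart")
    if path is None:
        return None
    lib = ctypes.CDLL(path)
    # cudaError_t cudaMalloc(void** devPtr, size_t size)
    lib.cudaMalloc.argtypes = [ctypes.POINTER(ctypes.c_void_p), ctypes.c_size_t]
    lib.cudaMalloc.restype = ctypes.c_int
    # cudaError_t cudaFree(void* devPtr)
    lib.cudaFree.argtypes = [ctypes.c_void_p]
    lib.cudaFree.restype = ctypes.c_int
    return lib

cudart = load_cudart()
print("cudart bound" if cudart is not None else "no CUDA runtime here")
```

Declaring `argtypes`/`restype` up front is what keeps this safe: without them, ctypes would silently truncate 64-bit pointers and sizes on some call paths.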


What this hardware setup forces

The hardware budget is tight enough that several design choices weren’t really choices:

  1. The student must be small. ~5 MB ONNX, ~5 ms inference. EfficientViT-B1 sits at the upper end of what fits.
  2. Teachers can’t run on the robot. DA3-Metric-Large alone exceeds the 8 GB RAM budget. Distillation on HPC is the only path.
  3. Perception has to fit alongside Nav2. The student plus YOLO plus the fusion node plus class costmap is the full perception budget. Adding a fourth network would push the system into swap.
  4. 2D LiDAR is the navigation backbone. A 3D LiDAR would solve the vertical-extent problem but would cost more than the rest of the robot combined and wouldn’t fit the form factor.
  5. The depth camera matters even when it fails. ToF gives metric anchor where it works and confidence info everywhere — both are load-bearing for the bootstrap pipeline.

If any of these constraints relaxed (more RAM, faster GPU, 3D LiDAR), the system architecture would look different. Bootstrap perception is a response to the specific shape of this hardware budget.