# sensor fusion | the return ## **Design of Autonomous Systems** ### csci 6907/4907-section 86 ### Prof. **Sibin Mohan** --- consider a **LiDAR** and a **camera** looking at a pedestrian...
--- consider a **LiDAR** and a **camera** looking at a pedestrian...
consider the following situations... ---
consider the following situations... ||| |:-----|:------| | **situation** | **result** | | only **one** detects the pedestrian | use the other to increase chances | ---
consider the following situations... ||| |:-----|:------| | **situation** | **result** | | only **one** detects the pedestrian | use the other to increase chances | | **both** detect the pedestrian | better **accuracy + confidence** | || --- **sensor fusion** = **data fusion** --- ### sensor fusion | **classification** --- ### sensor fusion | **classification** three ways to classify sensor fusion, --- ### sensor fusion | **classification** three ways to classify sensor fusion, ||| |:-----|:------| | **abstraction level** | "_when_ should we fuse?" | --- ### sensor fusion | **classification** three ways to classify sensor fusion, ||| |:-----|:------| | **abstraction level** | "_when_ should we fuse?" | | **centralization level** | "_where_ is the fusion happening?" | --- ### sensor fusion | **classification** three ways to classify sensor fusion, ||| |:-----|:------| | **abstraction level** | "_when_ should we fuse?" | | **centralization level** | "_where_ is the fusion happening?" | | **competition level** | "_what_ should the fusion do?" | || --- ## abstraction level --- ## abstraction level _"**when** should we do the fusion?"_ --- ## abstraction level _"**when** should we do the fusion?"_
**three** types of abstraction level fusion (low, mid, high) --- ### abstraction level | **low-level fusion** --- ### abstraction level | **low-level fusion**
--- ### abstraction level | **low-level fusion**
- fusing **raw data** from multiple sensors --- ### abstraction level | **low-level fusion**
- fusing **raw data** from multiple sensors - _e.g._ point clouds from **LiDARs** and pixels from **cameras** Note: - Object detection is used in the process, but what's really doing the job is projecting the 3D point clouds into the image, and then associating this with the pixels. --- ### abstraction level | **low-level fusion**
- fusing **raw data** from multiple sensors - _e.g._ point clouds from **LiDARs** and pixels from **cameras** good for **object detection** --- ### abstraction level | **low-level fusion** |process| challenges | |:----|:---|
| projecting 3D point clouds onto image | | || --- ### abstraction level | **low-level fusion** |process| challenges | |:----|:---|
| projecting 3D point clouds onto image | |
| associating points with pixels | | || --- ### abstraction level | **low-level fusion** |process| challenges | |:----|:---|
| projecting 3D point clouds onto image | |
| associating points with pixels | | || |pros|cons| |:-----|:------| | future proof | huge processing requirements | || --- ### abstraction level | **mid-level fusion** --- ### abstraction level | **mid-level fusion**
Note: - Information Loss — if tracking in one is incorrect, then everything is messed up --- ### abstraction level | **mid-level fusion**
- fusing objects that are **detected independently** --- ### abstraction level | **mid-level fusion**
- fusing objects that are **detected independently** - each sensor does its own detection --- ### abstraction level | **mid-level fusion**
- fusing objects that are **detected independently** - each sensor does its own detection - _e.g._ camera and radar detect objects --- ### abstraction level | **mid-level fusion**
- fusing objects that are **detected independently** - each sensor does its own detection - _e.g._ camera and radar detect objects - fused using **Kalman Filter** --- ### abstraction level | **mid-level fusion**
|process | | |:-------|:------| |3D bounding box (LiDAR) + 2D bounding box (camera) | | --- ### abstraction level | **mid-level fusion**
|process | | |:-------|:------| |3D bounding box (LiDAR) + 2D bounding box (camera) | | |projecting 3D result into 2D|| --- ### abstraction level | **mid-level fusion**
|process | | |:-------|:------| |3D bounding box (LiDAR) + 2D bounding box (camera) | | |projecting 3D result into 2D|| |data fusion happens in **2D**|| --- ### abstraction level | **mid-level fusion**
|process | challenges | |:-------|:------| |3D bounding box (LiDAR) + 2D bounding box (camera) | if tracking in one is incorrect, then everything is messed up | |projecting 3D result into 2D|| |data fusion happens in **2D**|| || --- ### abstraction level | **mid-level fusion**
|pros|cons| |:-----|:------| | simplicity | potential to **lose information** | || --- ### abstraction level | **high-level fusion** --- ### abstraction level | **high-level fusion**
Note: - Information Loss — if tracking in one is incorrect, then everything is messed up --- ### abstraction level | **high-level fusion**
- fuse objects and their **trajectories** --- ### abstraction level | **high-level fusion**
- fuse objects and their **trajectories** - relying on detections --- ### abstraction level | **high-level fusion**
- fuse objects and their **trajectories** - relying on detections **and** **predictions + tracking** --- ### abstraction level | **high-level fusion**
||| |:-----|:------| | **pros** | **cons** | | further simplicity | **too much** information loss | || --- ## centralization level --- ## centralization level _"**where** is the fusion happening?"_ --- ### centralization level | **three types** --- ### centralization level | **three types**
---
||| |:-----|:------| | **centralized** | one central unit deals with it \[low-level\] | ---
||| |:-----|:------| | **centralized** | one central unit deals with it \[low-level\] | | **decentralized** | each sensor fuses data and forwards to next one | ---
||| |:-----|:------| | **centralized** | one central unit deals with it \[low-level\] | | **decentralized** | each sensor fuses data and forwards to next one | | **distributed** | each sensor processes data locally and sends to next unit \[late\] | || --- ### centralization level | **satellite architecture** --- ### centralization level | **satellite architecture**
---
- plug many sensors \[**satellites**\] - fuse together on a single central unit → \[**active safety domain controller**\] - **360-degree** fusion + detection on controller - individual sensors do **not** have to be extremely good --- ## competition level --- ## competition level _"**what** should the fusion do?"_ --- ### competition level | **three types** --- ### competition level | **three types** ||| |:-----|:------| | **competitive** | sensors meant for same purpose \[RADAR + LiDAR\] | --- ### competition level | **three types** ||| |:-----|:------| | **competitive** | sensors meant for same purpose \[RADAR + LiDAR\] | | **complementary** | different sensors looking at different scenes \[multiple cameras\] | --- ### competition level | **three types** ||| |:-----|:------| | **competitive** | sensors meant for same purpose \[RADAR + LiDAR\] | | **complementary** | different sensors looking at different scenes \[multiple cameras\] | | **coordinated** | sensors produce a new scene from same object \[3D reconstruction\] | || --- ### competition level | **competitive** sensors meant for the **same purpose**
_e.g._ **Camera + LiDAR** --- ### competition level | **complementary** different sensors looking at **different scenes** --- ### competition level | **complementary** different sensors looking at **different scenes**
_e.g._ multiple cameras for creating a **panorama** --- ### competition level | **coordinated** sensors produce a **new scene** from same object --- ### competition level | **coordinated** sensors produce a **new scene** from same object
_e.g._ **3D reconstruction** --- ## sensor fusion example ### **camera + LiDAR** --- ## sensor fusion example | **camera + LiDAR**
--- ### camera + lidar | **complementary strengths**
---
||| |:-----|:------| | **camera** | excellent for **object classification** and understanding scenes | | **LiDAR** | good for **estimating distances** | || --- ### camera output → **bounding boxes** --- ### camera output → **2D bounding boxes**
--- ### LiDAR output → **point clouds** --- ### LiDAR output → **3D point clouds**
--- ### classifying the fusion --- ### classifying the fusion
---
||| |:-----|:------| | **"what"** | competition and redundancy | | **"where"** | doesn't matter \[for now; lots of options\] | | **"when"** | multiple options | || --- **"when"** | multiple options, ||| |:-----|:------| | **early** | fuse the raw data → pixels and point clouds | | **late** | fuse the results → bounding boxes | || --- ## early fusion --- ## early fusion fuse **raw data** as soon as sensors are plugged in
---
- project **3D LiDAR point clouds** onto **2D image** - check whether projected points fall inside the **2D bounding boxes** from the camera --- ### early fusion | **point cloud projection** translate **3D point cloud** \[LiDAR frame\] → **2D projection** \[camera frame\] --- ### early fusion | **point cloud projection** translate **3D point cloud** \[LiDAR frame\] → **2D projection** \[camera frame\] 1. convert each 3D LiDAR point into **homogeneous coordinates** --- ### early fusion | **point cloud projection** translate **3D point cloud** \[LiDAR frame\] → **2D projection** \[camera frame\] 1. convert each 3D LiDAR point into **homogeneous coordinates** 2. apply **projection equations** \[translation/rotation\] to convert from LiDAR to camera --- ### early fusion | **point cloud projection** translate **3D point cloud** \[LiDAR frame\] → **2D projection** \[camera frame\] 1. convert each 3D LiDAR point into **homogeneous coordinates** 2. apply **projection equations** \[translation/rotation\] to convert from LiDAR to camera 3. transform back into **Euclidean coordinates** Note: - https://en.wikipedia.org/wiki/Homogeneous_coordinates - Homogeneous coordinates: coordinates of points, including points at infinity, can be represented using finite coordinates. - Formulas involving homogeneous coordinates are often simpler and more symmetric than their Cartesian counterparts - Transformations (rotation, scaling, translation, etc.) are simple matrix multiplications --- ### early fusion | projected point cloud
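---

The three projection steps above can be sketched in a few lines of NumPy. Everything in the calibration below is a made-up placeholder (identity rotation, small translation, round-number pinhole intrinsics); real values come from LiDAR-camera extrinsic calibration and camera intrinsic calibration.

```python
import numpy as np

# hypothetical calibration -- real values come from sensor calibration
R = np.eye(3)                      # LiDAR → camera rotation (identity for the sketch)
t = np.array([0.0, -0.08, -0.27])  # LiDAR → camera translation, metres (made up)
K = np.array([[700.0,   0.0, 640.0],   # pinhole intrinsics: focal lengths and
              [  0.0, 700.0, 360.0],   # principal point (made up)
              [  0.0,   0.0,   1.0]])

def project_lidar_to_image(points_xyz):
    """Steps 1-3 from the slide: homogeneous coords → rotate/translate → divide."""
    pts = np.asarray(points_xyz, dtype=float)
    # 1. homogeneous coordinates: append w = 1 to every (x, y, z)
    pts_h = np.hstack([pts, np.ones((pts.shape[0], 1))])
    # 2. projection equations: extrinsic [R | t] into the camera frame, then intrinsics
    Rt = np.hstack([R, t.reshape(3, 1)])   # 3x4
    cam = Rt @ pts_h.T                     # 3xN points in the camera frame
    pix = K @ cam
    # 3. back to Euclidean coordinates: divide by the homogeneous component (depth)
    uv = (pix[:2] / pix[2]).T              # N x 2 pixel coordinates
    return uv, cam[2]                      # keep per-point depth for later fusion

uv, depth = project_lidar_to_image([[2.0, 0.5, 10.0]])  # one point, 10 m ahead
```

Points with non-positive depth (behind the camera) would have to be filtered out before the division in a real pipeline.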
--- ### early fusion | **object detection** detect objects using the camera → **YOLO** again! --- ### early fusion | **ROI matching** _"**region of interest**"_ matching --- ### early fusion | **ROI matching** _"**region of interest**"_ matching fuse the data **inside each bounding box** --- ### early fusion | **ROI matching** _"**region of interest**"_ matching fuse the data **inside each bounding box** ||| |:-----|:------| | for each **bounding box** | camera gives **classification** | | for each **LiDAR projected point** | **accurate distance** | || --- ### early fusion | **ROI matching** _"**region of interest**"_ matching fuse the data **inside each bounding box** ||| |:-----|:------| | for each **bounding box** | camera gives **classification** | | for each **LiDAR projected point** | **accurate distance** | || objects are **measured accurately** and **classified** --- ### early fusion | **ROI matching problems**
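---

The ROI matching step itself fits in a few lines of plain Python; the helper name and inputs below are made up for illustration. It keeps the projected LiDAR points that land inside one camera bounding box and reduces them to a single distance, and choosing that reduction (median vs closest) is exactly where the problems listed next come from.

```python
import statistics

def roi_distance(box, points, strategy="median"):
    """Fuse LiDAR depths inside one camera bounding box.

    box    = (xmin, ymin, xmax, ymax) in pixels
    points = [(u, v, depth_m), ...] projected LiDAR points
    """
    xmin, ymin, xmax, ymax = box
    depths = [d for (u, v, d) in points if xmin <= u <= xmax and ymin <= v <= ymax]
    if not depths:
        return None                      # no LiDAR support for this detection
    if strategy == "closest":            # conservative: nearest point wins
        return min(depths)
    return statistics.median(depths)     # robust to stray points from other objects

pts = [(100, 50, 12.1), (105, 52, 12.3), (110, 55, 30.0)]  # last point: background
print(roi_distance((90, 40, 120, 60), pts))             # median → 12.3
print(roi_distance((90, 40, 120, 60), pts, "closest"))  # → 12.1
```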
---
- which point to pick for **distance**? - average / median / center point / **closest**? - does the point **belong to another** bounding box? --- ## late fusion --- ## late fusion fusing **results** after independent detection --- ## late fusion
---
- get **3D bounding boxes** from both sensors → fuse results - get **2D bounding boxes** from both sensors → fuse results --- ### late fusion | **late fusion in 3D** --- ### late fusion | **late fusion in 3D** multiple steps, --- ### late fusion | **late fusion in 3D**
**step 1** → 3D obstacle detection \[**LiDAR**\] Note: - 3D obstacle detection → ML methods - unsupervised machine learning - deep learning algos (e.g. RANDLA-NET) --- ### late fusion | **late fusion in 3D**
**step 2** → 3D obstacle detection \[**camera**\] → **much harder** Note: - deep learning + size/orientation of vehicles - IoU matching → bounding boxes from camera or LiDAR overlap in 3D/2D --- ### late fusion | **IoU matching in space**
**step 3** → **IoU matching** in space Note: - IoU matching → bounding boxes from camera or LiDAR overlap in 3D/2D --- ### late fusion | **IoU matching**
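---

IoU is cheap to compute for axis-aligned 2D boxes, and the same overlap score drives both the spatial matching here (camera box vs projected LiDAR box) and the frame-to-frame matching that follows. A minimal sketch; the boxes are made-up values:

```python
def iou(a, b):
    """Intersection-over-Union of two axis-aligned boxes (xmin, ymin, xmax, ymax)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))   # overlap width
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))   # overlap height
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

# spatial association: camera box vs projected-LiDAR box for the same pedestrian
cam_box   = (100, 100, 200, 200)
lidar_box = (110, 105, 210, 205)
print(iou(cam_box, lidar_box))                      # ≈ 0.75 → likely the same object
assert iou(cam_box, (400, 400, 450, 450)) == 0.0    # disjoint → different objects
```

In practice the pairwise IoU scores feed an assignment step (e.g. the Hungarian Algorithm) so that each detection is matched to at most one counterpart.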
--- ### late fusion | **IoU matching in time** need to ensure the **frames also match in time**! --- ### late fusion | **IoU matching in time** need to ensure the **frames also match in time**! - associate objects **in time**, from frame to frame - also **predict next positions** --- ### late fusion | **IoU matching in time** need to ensure the **frames also match in time**! - associate objects **in time**, from frame to frame - also **predict next positions** - bounding boxes **overlap** between consecutive frames → same obstacle --- ### late fusion | **IoU matching in time** need to ensure the **frames also match in time**! - associate objects **in time**, from frame to frame - also **predict next positions** - bounding boxes **overlap** between consecutive frames → same obstacle algorithms used → **Kalman Filter**, **Hungarian Algorithm**, **SORT** Note: - SORT – simple online realtime tracking --- ## references - **IMUs** → [What is an IMU?](https://www.vectornav.com/resources/inertial-navigation-articles/what-is-an-inertial-measurement-unit-imu) - **sensor fusion classification** → [9 types of sensor fusion algorithms](https://www.thinkautonomous.ai/blog/?p=9-types-of-sensor-fusion-algorithms) - **camera + LiDAR fusion** → [LiDAR and Camera Sensor Fusion in Self-Driving Cars](https://www.thinkautonomous.ai/blog/?p=lidar-and-camera-sensor-fusion-in-self-driving-cars) - **3D bounding box estimation** → [arXiv:1612.00496](https://arxiv.org/pdf/1612.00496.pdf) - **homogeneous coordinates** → [Wikipedia](https://en.wikipedia.org/wiki/Homogeneous_coordinates) - **RANDLA-NET** → Hu et al., _RandLA-Net: Efficient Semantic Segmentation of Large-Scale Point Clouds_, CVPR 2020 - **SORT tracker** → Bewley et al., _Simple Online and Realtime Tracking_, ICASSP 2016