# sensor fusion | the return

## **Design of Autonomous Systems**

### csci 6907/4907-section 86

### Prof. **Sibin Mohan**

---

consider a **LiDAR** and a **camera** looking at a pedestrian...

---

consider the following situations...

|||
|:-----|:------|
| **situation** | **result** |
| only **one** detects the pedestrian | use the other to increase chances |
| **both** detect the pedestrian | better **accuracy + confidence** |
||

---

**sensor fusion** = **data fusion**

---

### sensor fusion | **classification**

three ways to classify sensor fusion,

|||
|:-----|:------|
| **abstraction level** | "_when_ should we fuse?" |
| **centralization level** | "_where_ is the fusion happening?" |
| **competition level** | "_what_ should the fusion do?" |
||

---

## abstraction level

_"**when** should we do the fusion?"_
**three** types of abstraction level fusion (low, medium, high)

---

### abstraction level | **low-level fusion**

- fuse **raw data** from multiple sensors
- _e.g._ point clouds (**LiDARs**) + pixels (**cameras**)

good for **object detection**

Note:
- Object detection is used in the process, but the real work is projecting the 3D point cloud onto the image and then associating the projected points with pixels.

---

### abstraction level | **low-level fusion**

|process| challenges |
|:----|:---|
| | projecting 3D point clouds onto image<br>associating points with pixels |
||

|pros|cons|
|:-----|:------|
| future proof | huge processing requirements |
||

---

### abstraction level | **mid-level fusion**
Note:
- Information loss: if tracking in one sensor is incorrect, the fused result is wrong too

---

### abstraction level | **mid-level fusion**

- fusing objects → **detected independently**
- each sensor does its own detection
- _e.g._ camera and radar detect objects
- fused using **Kalman Filter**

---

### abstraction level | **mid-level fusion**
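As a rough illustration of the Kalman-filter fusion step listed for mid-level fusion, the sketch below fuses two independent distance estimates of the same object with a scalar Kalman measurement update. The function name `kalman_update` and every numeric value are illustrative assumptions, not taken from the lecture; a real tracker would use a multi-dimensional state with a motion model.

```python
def kalman_update(mean, var, z, z_var):
    """One scalar Kalman measurement update.

    mean, var : current estimate and its variance
    z, z_var  : new measurement and its (assumed known) noise variance
    """
    gain = var / (var + z_var)            # Kalman gain, between 0 and 1
    new_mean = mean + gain * (z - mean)   # pull the estimate toward the measurement
    new_var = (1.0 - gain) * var          # fused variance is never larger than var
    return new_mean, new_var


# Hypothetical object-level detections of the same pedestrian (distance in metres).
camera_dist, camera_var = 10.4, 4.0    # assumed: camera is poor at ranging
radar_dist, radar_var = 9.8, 0.25      # assumed: radar is good at ranging

# Start from the camera detection, then fold in the radar detection.
fused_dist, fused_var = kalman_update(camera_dist, camera_var,
                                      radar_dist, radar_var)
print(f"fused distance: {fused_dist:.2f} m, variance: {fused_var:.3f}")
```

The fused estimate lands close to the more trustworthy sensor, and its variance is smaller than that of either input, which is the benefit of fusing objects that were detected independently.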
|process | challenges |
|:-------|:------|
|3D bounding box (LiDAR) + 2D bounding box (camera) | if tracking in one is incorrect, then everything is messed up |
|projecting 3D result into 2D||
|data fusion happens in **2D**||
||

---

### abstraction level | **mid-level fusion**

|pros|cons|
|:-----|:------|
| simplicity | potential to **lose information** |
||

---

### abstraction level | **high-level fusion**
Note:
- Information loss: if tracking in one sensor is incorrect, the fused result is wrong too

---

### abstraction level | **high-level fusion**

- fuse objects and their **trajectories**
- rely on detections + **predictions** + **tracking**

---

### abstraction level | **high-level fusion**

|||
|:-----|:------|
| **pros** | **cons** |
| further simplicity | **too much** information loss |
||

---

## centralization level

_"**where** is the fusion happening?"_

---

### centralization level | **three types**
---

|||
|:-----|:------|
| **centralized** | one central unit deals with it \[low-level\] |
| **decentralized** | each sensor fuses data and forwards to next one |
| **distributed** | each sensor processes data locally<br>and sends to next unit \[late\] |
||

---

### centralization level | **satellite architecture**
---

- plug many sensors \[**satellites**\]
- fuse together on a single central unit
    - \[**active safety domain controller**\]
- **360 degree** fusion + detection on controller
- individual sensors do **not** have to be extremely good

---

## competition level

_"**what** should the fusion do?"_

---

### competition level | **three types**

|||
|:-----|:------|
| **competitive** | sensors meant for same purpose<br>\[RADAR + LiDAR\] |
| **complementary** | different sensors looking at different scenes<br>\[multiple cameras\] |
| **coordinated** | sensors produce a new scene from same object<br>\[3D reconstruction\] |
||

---

### competition level | **competitive**

sensors meant for the **same purpose**

_e.g._ **Camera + LiDAR**

---

### competition level | **complementary**

different sensors looking at **different scenes**

_e.g._ multiple cameras for creating a **panorama**

---

### competition level | **coordinated**

sensors produce a **new scene** from same object

_e.g._ **3D reconstruction**

---

## high-level sensor fusion example

### **camera + LiDAR**

---

## sensor fusion example | **camera + LiDAR**
---

### camera + LiDAR | **complementary strengths**

---

|||
|:-----|:------|
| **camera** | **object classification** and understanding scenes |
| **LiDAR** | good for **estimating distances** |
||

---

### camera output → **2D** bounding boxes

---

### LiDAR output → **3D** point clouds

---

### classifying the fusion

|||
|:-----|:------|
| **"what"** | competition and redundancy |
| **"where"** | doesn't matter \[for now; lots of options\] |
| **"when"** | multiple options |
||

---

**"when"** | multiple options,

|||
|:-----|:------|
| **early** | fuse the raw data → pixels and point clouds |
| **late** | fuse the results → bounding boxes |
||

---

## early fusion

fuse **raw data** as soon as sensors are plugged in

---

- project **3D LiDAR point clouds** onto **2D image**
- check if projected points belong to **2D bounding boxes** from camera

---

### early fusion | **point cloud projection**
translate, **3D point cloud** \[LiDAR frame\] → **2D projection** \[camera frame\]

1. convert each 3D LiDAR point → **homogeneous coordinates**
2. apply **projection equations** \[translation/rotation\]<br>→ to convert from LiDAR to camera
3. transform back into **Euclidean coordinates**

---

### homogeneous coordinates

[system of coordinates](https://en.wikipedia.org/wiki/Homogeneous_coordinates) $(x, y, w)$ → used in **projective** geometry

- points → represented with **finite** coordinates<br>\[including points at **infinity**!\]
- **simpler** formulas, more **symmetric**
- transformations → simple matrix multiplications<br>\[rotation, scaling, translation, _etc._\]

Note:
- $w$ is a scaling factor (or weight) that allows us to represent an $n$-dimensional space within an $(n+1)$-dimensional framework. Think of it as a "zoom" or "projection" coordinate. It tells you how far a point is from the "eye" or the origin of the projection.

---

### early fusion | projected point cloud
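The three projection steps can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: a pinhole camera model, a made-up extrinsic matrix `LIDAR_TO_CAMERA` and a made-up intrinsic matrix `CAMERA_MATRIX`. Real values come from sensor calibration, and the helper names are hypothetical.

```python
def mat_vec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def project_lidar_point(point_lidar, extrinsic, intrinsic):
    """Project one 3D LiDAR point to pixel coordinates (u, v).

    extrinsic : 4x4 rigid transform, LiDAR frame -> camera frame
    intrinsic : 3x3 pinhole camera matrix
    Returns None when the point lies behind the camera.
    """
    # Step 1: Euclidean (x, y, z) -> homogeneous (x, y, z, 1)
    p_h = list(point_lidar) + [1.0]
    # Step 2: rotation + translation as a single matrix multiplication
    x_c, y_c, z_c, _ = mat_vec(extrinsic, p_h)
    if z_c <= 0:
        return None
    # Pinhole projection yields homogeneous pixel coordinates (u*w, v*w, w)
    uw, vw, w = mat_vec(intrinsic, [x_c, y_c, z_c])
    # Step 3: back to Euclidean coordinates by dividing by w
    return uw / w, vw / w


# Assumed calibration (illustrative only).
# LiDAR axes: x forward, y left, z up. Camera axes: x right, y down, z forward.
LIDAR_TO_CAMERA = [
    [0, -1,  0,  0.0],
    [0,  0, -1, -0.25],
    [1,  0,  0,  0.0],
    [0,  0,  0,  1.0],
]
CAMERA_MATRIX = [
    [700,   0, 640],
    [  0, 700, 360],
    [  0,   0,   1],
]

pixel = project_lidar_point((10.0, 1.0, 0.0), LIDAR_TO_CAMERA, CAMERA_MATRIX)
print(pixel)  # (570.0, 342.5)
```

Because the point is 1 m to the left of the LiDAR axis, it lands left of the image centre (u = 570 < 640). Points behind the camera are dropped before the division by $w$.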
---

### early fusion | **object detection**

detect objects using the camera → **YOLO** again!

---

### early fusion | **ROI matching**

_"**region of interest**"_ mapping

fuse the data **inside each bounding box**

|||
|:-----|:------|
| for each **bounding box** | camera gives **classification** |
| for each **LiDAR projected point** | **accurate distance** |
||

objects are **measured accurately** and **classified**

---

### early fusion | **ROI matching problems**
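To make ROI matching concrete, here is a minimal sketch that collects the projected LiDAR points falling inside one camera bounding box and reduces their distances to a single value. The function `roi_distance`, the data layout and all sample values are assumptions for illustration. The `reduce` parameter is there because the choice of statistic is exactly where the method gets tricky.

```python
from statistics import median


def roi_distance(box, projected_points, reduce=median):
    """Estimate the distance of the object inside one 2D bounding box.

    box              : (u_min, v_min, u_max, v_max) in pixels, from the camera detector
    projected_points : iterable of (u, v, distance_m), LiDAR points already
                       projected into the image
    reduce           : how to collapse many distances into one (median, min, ...)
    Returns None if no LiDAR point falls inside the box.
    """
    u_min, v_min, u_max, v_max = box
    inside = [d for u, v, d in projected_points
              if u_min <= u <= u_max and v_min <= v <= v_max]
    return reduce(inside) if inside else None


# Hypothetical camera detection and projected LiDAR points.
detection = {"label": "pedestrian", "box": (500, 300, 600, 450)}
points = [
    (520, 320, 9.8),
    (560, 400, 10.1),
    (590, 440, 10.0),
    (580, 310, 35.0),   # background point that happens to fall inside the box
    (700, 350, 12.0),   # outside the box, ignored
]

dist_median = roi_distance(detection["box"], points)
dist_closest = roi_distance(detection["box"], points, reduce=min)
print(detection["label"], dist_median, dist_closest)
```

With this data the median and the closest point both stay near 10 m, whereas a plain average would be dragged toward the 35 m background point. The sketch does not handle overlapping boxes, where one point may fall inside several regions.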
---

- which point to pick for **distance**?
    - average / median / center point / **closest**?
- does the point **belong to another** bounding box?

---

## late fusion

fusing **results** after independent detection

---

- get **3D bounding boxes** on both ends → fuse results
- get **2D bounding boxes** on both sides → fuse results

---

### late fusion | **late fusion in 3D**

multiple steps,
- **step 1** → 3D obstacle detection \[**LiDAR**\]
    - ML methods
        - unsupervised machine learning
        - deep learning algos (_e.g.,_ [RandLA-Net](https://github.com/QingyongHu/RandLA-Net))
- **step 2** → 3D obstacle detection \[**camera**\] → **much harder**
    - deep learning + size/orientation of vehicles
- **step 3** → **IoU** (intersection over union) **matching** in space
    - bounding boxes overlap in 3D/2D

---

### late fusion | **IoU matching**
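A minimal sketch of IoU matching in 2D, assuming both detectors output axis-aligned boxes `(x1, y1, x2, y2)` in the same image plane (the LiDAR boxes would already have been projected). The function names, the 0.5 threshold and the sample boxes are illustrative assumptions. Greedy pairing is used here for brevity; it stands in for an optimal assignment method such as the Hungarian algorithm.

```python
def iou(a, b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_boxes(camera_boxes, lidar_boxes, threshold=0.5):
    """Greedy one-to-one matching: highest-IoU pairs first, above a threshold."""
    candidates = sorted(
        ((iou(c, l), i, j)
         for i, c in enumerate(camera_boxes)
         for j, l in enumerate(lidar_boxes)),
        reverse=True,
    )
    used_cam, used_lidar, matches = set(), set(), []
    for score, i, j in candidates:
        if score < threshold:
            break                      # remaining pairs overlap even less
        if i not in used_cam and j not in used_lidar:
            matches.append((i, j, score))
            used_cam.add(i)
            used_lidar.add(j)
    return matches


# Hypothetical detections of the same two obstacles from each sensor.
camera_boxes = [(0, 0, 10, 10), (20, 20, 30, 30)]
lidar_boxes = [(21, 21, 31, 31), (1, 1, 11, 11)]

for cam_idx, lidar_idx, score in match_boxes(camera_boxes, lidar_boxes):
    print(f"camera box {cam_idx} <-> LiDAR box {lidar_idx} (IoU {score:.2f})")
```

Each matched pair is treated as the same obstacle, so the camera's class label and the LiDAR's position can be merged into one fused object. Unmatched boxes remain single-sensor detections.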
---

### late fusion | **IoU matching in time**

ensure the **frames also match in time**!

- associate objects **in time**, from frame to frame
- also **predict next positions**
- bounding boxes **overlap** in consecutive frames → **same obstacle**

algorithms used → **Kalman Filter**, **Hungarian Algorithm**, **SORT**

Note:
- SORT: Simple Online and Realtime Tracking

---

## references

- **IMUs** → [What is an IMU?](https://www.vectornav.com/resources/inertial-navigation-articles/what-is-an-inertial-measurement-unit-imu)
- **sensor fusion classification** → [9 types of sensor fusion algorithms](https://www.thinkautonomous.ai/blog/?p=9-types-of-sensor-fusion-algorithms)
- **camera + LiDAR fusion** → [LiDAR and Camera Sensor Fusion in Self-Driving Cars](https://www.thinkautonomous.ai/blog/?p=lidar-and-camera-sensor-fusion-in-self-driving-cars)
- **3D bounding box estimation** → [arXiv:1612.00496](https://arxiv.org/pdf/1612.00496.pdf)
- **homogeneous coordinates** → [Wikipedia](https://en.wikipedia.org/wiki/Homogeneous_coordinates)
- **RandLA-Net** → Hu et al., _RandLA-Net: Efficient Semantic Segmentation of Large-Scale Point Clouds_, CVPR 2020
- **SORT tracker** → Bewley et al., _Simple Online and Realtime Tracking_, ICASSP 2016