# sensor fusion | the return

## **Design of Autonomous Systems**
### csci 6907/4907-section 86
### Prof. **Sibin Mohan**

---

consider a **LiDAR** and a **camera** looking at a pedestrian...

---

consider a **LiDAR** and a **camera** looking at a pedestrian...

consider the following situations...

---

consider the following situations...

|||
|:-----|:------|
| **situation** | **result** |
| only **one** detects the pedestrian | use the other as **redundancy** → better chances of detection |

---

consider the following situations...

|||
|:-----|:------|
| **situation** | **result** |
| only **one** detects the pedestrian | use the other as **redundancy** → better chances of detection |
| **both** detect the pedestrian | better **accuracy + confidence** |
||

---
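
### why fuse? | **a quick number**

A rough sketch of why two sensors raise the chance of detection. It assumes the camera and the LiDAR miss independently, which is a strong simplification, and the probabilities and the function name `fused_detection_probability` are made up for illustration.

```python
def fused_detection_probability(p_camera: float, p_lidar: float) -> float:
    """Probability that at least one of two independent sensors detects the pedestrian."""
    # The pedestrian is missed only if BOTH sensors miss.
    return 1.0 - (1.0 - p_camera) * (1.0 - p_lidar)


# Hypothetical per-sensor detection probabilities.
p = fused_detection_probability(0.9, 0.8)
print(f"fused detection probability: {p:.2f}")  # prints 0.98
```

Real sensors share failure modes (fog, occlusion), so the actual gain is usually smaller than this estimate.

---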

**sensor fusion** = **data fusion**

---

### sensor fusion | **classification**

---

### sensor fusion | **classification**

three ways to classify sensor fusion,

---

### sensor fusion | **classification**

three ways to classify sensor fusion,

|||
|:-----|:------|
| **abstraction level** | "_when_ should we fuse?" |

---

### sensor fusion | **classification**

three ways to classify sensor fusion,

|||
|:-----|:------|
| **abstraction level** | "_when_ should we fuse?" |
| **centralization level** | "_where_ is the fusion happening?" |

---

### sensor fusion | **classification**

three ways to classify sensor fusion,

|||
|:-----|:------|
| **abstraction level** | "_when_ should we fuse?" |
| **centralization level** | "_where_ is the fusion happening?" |
| **competition level** | "_what_ should the fusion do?" |
||

---

## abstraction level

---

## abstraction level

_"**when** should we do the fusion?"_

---

## abstraction level

_"**when** should we do the fusion?"_

**three** types of abstraction level fusion

(low, medium, high)

---

### abstraction level | **low-level fusion**

---

### abstraction level | **low-level fusion**

---

### abstraction level | **low-level fusion**

- fuse **raw data** from multiple sensors

---

### abstraction level | **low-level fusion**

- fuse **raw data** from multiple sensors
- _e.g._ point clouds (**LiDARs**) + pixels (**cameras**)

Note:
- Object detection is used in the process, but what's really doing the job is projecting the 3D point clouds onto the image, and then associating the projected points with the pixels.

---

### abstraction level | **low-level fusion**

- fuse **raw data** from multiple sensors
- _e.g._ point clouds (**LiDARs**) + pixels (**cameras**)

good for **object detection**

---

### abstraction level | **low-level fusion**

|process| challenges |
|:----|:---|
| <img src="img/sensor_fusion_high/image4.png" width="400"> | projecting 3D point clouds onto image |
||

---

### abstraction level | **low-level fusion**

|process| challenges |
|:----|:---|
| <img src="img/sensor_fusion_high/image4.png" width="400"> | projecting 3D point clouds onto image <br> associating points with pixels |
||

---

### abstraction level | **low-level fusion**

|process| challenges |
|:----|:---|
| <img src="img/sensor_fusion_high/image4.png" width="400"> | projecting 3D point clouds onto image <br> associating points with pixels |
||

|pros|cons|
|:-----|:------|
| future-proof | huge processing requirements |
||

---

### abstraction level | **mid-level fusion**

---

### abstraction level | **mid-level fusion**

Note:
- Information loss: if tracking in one sensor is incorrect, the fused result is wrong too

---

### abstraction level | **mid-level fusion**

- fusing objects → **detected independently**

---

### abstraction level | **mid-level fusion**

- fusing objects → **detected independently**
- each sensor does its own detection

---

### abstraction level | **mid-level fusion**

- fusing objects → **detected independently**
- each sensor does its own detection
- _e.g._ camera and radar detect objects

---

### abstraction level | **mid-level fusion**

- fusing objects → **detected independently**
- each sensor does its own detection
- _e.g._ camera and radar detect objects 
- fused using a **Kalman Filter**

---
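
### abstraction level | **mid-level fusion sketch**

A minimal sketch of the measurement-update step of a Kalman filter, reduced to one dimension (distance to the object). It assumes the camera detection is the prior and the radar detection is the new measurement. The `Estimate` class, the function name, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    mean: float      # estimated distance to the object, in metres
    variance: float  # uncertainty of that estimate, in metres squared


def kalman_update(prior: Estimate, measurement: float, meas_variance: float) -> Estimate:
    """Fold one new measurement into the prior estimate (1D Kalman update)."""
    # The gain is close to 1 when the measurement is much more certain than the prior.
    gain = prior.variance / (prior.variance + meas_variance)
    mean = prior.mean + gain * (measurement - prior.mean)
    variance = (1.0 - gain) * prior.variance
    return Estimate(mean, variance)


# Hypothetical detections of the same object: camera is noisy in range, radar is not.
camera = Estimate(mean=20.0, variance=4.0)
fused = kalman_update(camera, measurement=22.0, meas_variance=1.0)
print(fused)  # mean is about 21.6, variance about 0.8
```

The fused estimate sits closer to the radar value and is more certain than either input. A full tracker would also run a prediction step between frames.

---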

### abstraction level | **mid-level fusion**

|process | |
|:-------|:------|
|3D bounding box (LiDAR) + 2D bounding box (camera) | |

---

### abstraction level | **mid-level fusion**

|process | |
|:-------|:------|
|3D bounding box (LiDAR) + 2D bounding box (camera) | |
|projecting 3D result into 2D||

---

### abstraction level | **mid-level fusion**

|process | |
|:-------|:------|
|3D bounding box (LiDAR) + 2D bounding box (camera) | |
|projecting 3D result into 2D||
|data fusion happens in **2D**||

---

### abstraction level | **mid-level fusion**

|process | challenges |
|:-------|:------|
|3D bounding box (LiDAR) + 2D bounding box (camera) | if tracking in one sensor is incorrect, the fused result is wrong too |
|projecting 3D result into 2D||
|data fusion happens in **2D**||
||

---

### abstraction level | **mid-level fusion**

|pros|cons|
|:-----|:------|
| simplicity | potential to **lose information** |
||

---
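
### abstraction level | **3D box → 2D box sketch**

One way to picture "projecting the 3D result into 2D": project the eight corners of the LiDAR box with a pinhole camera model and take the enclosing rectangle. This sketch assumes the box is axis-aligned, already expressed in the camera frame (z pointing forward), and entirely in front of the camera. The function names and intrinsics (`fx`, `fy`, `u0`, `v0`) are hypothetical.

```python
from itertools import product


def box_corners(center, size):
    """Return the 8 corners of an axis-aligned 3D box given its center and (dx, dy, dz) size."""
    cx, cy, cz = center
    dx, dy, dz = (s / 2.0 for s in size)
    return [(cx + sx * dx, cy + sy * dy, cz + sz * dz)
            for sx, sy, sz in product((-1, 1), repeat=3)]


def project_box(center, size, fx, fy, u0, v0):
    """Project a 3D box to the image and return the enclosing 2D box (u_min, v_min, u_max, v_max)."""
    # Pinhole model: u = fx * x / z + u0, v = fy * y / z + v0 (valid only for z > 0).
    pixels = [(fx * x / z + u0, fy * y / z + v0) for x, y, z in box_corners(center, size)]
    us, vs = zip(*pixels)
    return (min(us), min(vs), max(us), max(vs))


# Hypothetical 2 m cube, 10 m in front of the camera, with made-up intrinsics.
box_2d = project_box(center=(0.0, 0.0, 10.0), size=(2.0, 2.0, 2.0),
                     fx=1000.0, fy=1000.0, u0=640.0, v0=360.0)
print(box_2d)
```

The resulting rectangle can then be compared against the camera's own 2D bounding box.

---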

### abstraction level | **high-level fusion**

---

### abstraction level | **high-level fusion**

Note:
- Information loss: if tracking in one sensor is incorrect, the fused result is wrong too

---

### abstraction level | **high-level fusion**

- fuse objects and their **trajectories**

---

### abstraction level | **high-level fusion**

- fuse objects and their **trajectories**
- rely on detections

---

### abstraction level | **high-level fusion**

- fuse objects and their **trajectories**
- detections + **predictions** + **tracking**

---

### abstraction level | **high-level fusion**

|||
|:-----|:------|
| **pros** | **cons** |
| further simplicity | **too much** information loss |
||

---

## centralization level

---

## centralization level

_"**where** is the fusion happening?"_

---

### centralization level | **three types**

---

### centralization level | **three types**

---

|||
|:-----|:------|
| **centralized** | one central unit deals with it \[low-level\] |

---

|||
|:-----|:------|
| **centralized** | one central unit deals with it \[low-level\] |
| **decentralized** | each sensor fuses data and forwards to next one |

---

|||
|:-----|:------|
| **centralized** | one central unit deals with it \[low-level\] |
| **decentralized** | each sensor fuses data and forwards to next one |
| **distributed** | each sensor processes data locally and sends to next unit \[late\] |
||

---

### centralization level | **satellite architecture**

---

### centralization level | **satellite architecture**

---

- plug many sensors \[**satellites**\]

---

- plug many sensors \[**satellites**\]
- fuse together on a single central unit
  - \[**active safety domain controller**\]

---

- plug many sensors \[**satellites**\]
- fuse together on a single central unit 
  - \[**active safety domain controller**\]
- **360 degree** fusion + detection on controller

---

- plug many sensors \[**satellites**\]
- fuse together on a single central unit 
  - \[**active safety domain controller**\]
- **360 degree** fusion + detection on controller
- individual sensors do **not** have to be extremely good

---

## competition level

---

## competition level

_"**what** should the fusion do?"_

---

### competition level | **three types**

---

### competition level | **three types**

|||
|:-----|:------|
| **competitive** | sensors meant for same purpose \[RADAR + LiDAR\] |

---

### competition level | **three types**

|||
|:-----|:------|
| **competitive** | sensors meant for same purpose \[RADAR + LiDAR\] |
| **complementary** | different sensors looking at different scenes \[multiple cameras\] |

---

### competition level | **three types**

|||
|:-----|:------|
| **competitive** | sensors meant for same purpose \[RADAR + LiDAR\] |
| **complementary** | different sensors looking at different scenes \[multiple cameras\] |
| **coordinated** | sensors produce a new scene from same object \[3D reconstruction\] |
||

---

### competition level | **competitive**

sensors meant for the **same purpose**

---

### competition level | **competitive**

sensors meant for the **same purpose**

_e.g._ **Camera + LiDAR**

---

### competition level | **complementary**

different sensors looking at **different scenes**

---

### competition level | **complementary**

different sensors looking at **different scenes**

_e.g._ multiple cameras for creating a **panorama**

---

### competition level | **coordinated**

sensors produce a **new scene** from same object

---

### competition level | **coordinated**

sensors produce a **new scene** from same object

_e.g._ **3D reconstruction**

---

## high-level sensor fusion example

### **camera + LiDAR**

---

## sensor fusion example | **camera + LiDAR**

---

### camera + LiDAR | **complementary strengths**

---

|||
|:-----|:------|
| **camera** | **object classification** and understanding scenes |
| **LiDAR** | good for **estimating distances** |
||

---

### camera output → **bounding boxes**

---

### camera output → **2D** bounding boxes

---

### LiDAR output → **point clouds**

---

### LiDAR output → **3D** point clouds

---

### classifying the fusion

---

### classifying the fusion

|||
|:-----|:------|
| **"what"** | competition and redundancy |

---

### classifying the fusion

|||
|:-----|:------|
| **"what"** | competition and redundancy |
| **"where"** | doesn't matter \[for now; lots of options\] |

---

### classifying the fusion

|||
|:-----|:------|
| **"what"** | competition and redundancy |
| **"where"** | doesn't matter \[for now; lots of options\] |
| **"when"** | multiple options |
||

---

**"when"** | multiple options,

|||
|:-----|:------|
| **early** | fuse the raw data → pixels and point clouds |
| **late** | fuse the results → bounding boxes |
||

---

## early fusion

---

## early fusion

fuse **raw data** as soon as sensors are plugged in

---

## early fusion

---

- project **3D LiDAR point clouds** onto **2D image**
- check which projected points fall inside the **2D bounding boxes** from camera

---

### early fusion | **point cloud projection**

translate,

**3D point cloud** \[LiDAR frame\] → **2D projection** \[camera frame\]

---

translate,

**3D point cloud** \[LiDAR frame\] → **2D projection** \[camera frame\]

1. convert each 3D LiDAR point → **homogeneous coordinates**

---

translate,

**3D point cloud** \[LiDAR frame\] → **2D projection** \[camera frame\]

1. convert each 3D LiDAR point → **homogeneous coordinates**
2. apply **projection equations** \[translation/rotation\]

---

translate,

**3D point cloud** \[LiDAR frame\] → **2D projection** \[camera frame\]

1. convert each 3D LiDAR point → **homogeneous coordinates**
2. apply **projection equations** \[translation/rotation\] 
 → to convert from LiDAR to camera

---

translate,

**3D point cloud** \[LiDAR frame\] → **2D projection** \[camera frame\]

1. convert each 3D LiDAR point → **homogeneous coordinates**
2. apply **projection equations** \[translation/rotation\] 
 → to convert from LiDAR to camera
3. transform back into **Euclidean coordinates**

---
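
### early fusion | **point cloud projection sketch**

The three steps can be sketched in plain Python. The calibration here is hypothetical: `EXTRINSIC` is a 3x4 rotation/translation matrix taking LiDAR axes (x forward, y left, z up) to camera axes (x right, y down, z forward) with zero offset, and `INTRINSIC` is a made-up pinhole camera matrix. Real values come from sensor calibration.

```python
def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def project_point(point_lidar, extrinsic, intrinsic):
    """Project one 3D LiDAR point to pixel coordinates, or None if it is behind the camera."""
    homogeneous = [*point_lidar, 1.0]          # step 1: (x, y, z) -> (x, y, z, 1)
    cam = mat_vec(extrinsic, homogeneous)      # step 2: LiDAR frame -> camera frame
    if cam[2] <= 0.0:
        return None                            # point is not in front of the camera
    u_w, v_w, w = mat_vec(intrinsic, cam)      # step 2: camera frame -> image plane
    return (u_w / w, v_w / w)                  # step 3: back to Euclidean pixels


# Hypothetical calibration (assumed values, not from any real sensor rig).
EXTRINSIC = [
    [0.0, -1.0, 0.0, 0.0],
    [0.0, 0.0, -1.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
]
INTRINSIC = [
    [1000.0, 0.0, 640.0],
    [0.0, 1000.0, 360.0],
    [0.0, 0.0, 1.0],
]

# A point 10 m ahead, 1 m to the left, 0.5 m up.
pixel = project_point((10.0, 1.0, 0.5), EXTRINSIC, INTRINSIC)
print(pixel)  # (540.0, 310.0): left of and above the image center
```

Keeping the depth (`cam[2]`) alongside each pixel is what later lets a bounding box receive a distance.

---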

### homogeneous coordinates

[system of coordinates](https://en.wikipedia.org/wiki/Homogeneous_coordinates) $(x, y, w)$ → used in **projective** geometry

---

### homogeneous coordinates

[system of coordinates](https://en.wikipedia.org/wiki/Homogeneous_coordinates) $(x, y, w)$ → used in **projective** geometry

- points → represented with **finite** coordinates

---

### homogeneous coordinates

[system of coordinates](https://en.wikipedia.org/wiki/Homogeneous_coordinates) $(x, y, w)$ → used in **projective** geometry

- points → represented with **finite** coordinates 
 \[including points at **infinity**!\]

---

### homogeneous coordinates

[system of coordinates](https://en.wikipedia.org/wiki/Homogeneous_coordinates) $(x, y, w)$ → used in **projective** geometry

- points → represented with **finite** coordinates 
 \[including points at **infinity**!\]
- **simpler** formulas, more **symmetric**

---

### homogeneous coordinates

[system of coordinates](https://en.wikipedia.org/wiki/Homogeneous_coordinates) $(x, y, w)$ → used in **projective** geometry

- points → represented with **finite** coordinates 
 \[including points at **infinity**!\]
- **simpler** formulas, more **symmetric**
- transformations → simple matrix multiplications

---

### homogeneous coordinates

[system of coordinates](https://en.wikipedia.org/wiki/Homogeneous_coordinates) $(x, y, w)$ → used in **projective** geometry

- points → represented with **finite** coordinates 
 \[including points at **infinity**!\]
- **simpler** formulas, more **symmetric**
- transformations → simple matrix multiplications
 \[rotation, scaling, translation, _etc._ \]

---
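
### homogeneous coordinates | **worked example**

A small worked example of why transformations become matrix multiplications. It assumes a 2D point $(x, y, w)$, a rotation by an angle $\theta$, and a translation by $(a, b)$; these symbols are introduced here for illustration only. The three rows of the 3x3 transformation matrix give:

$$x' = x\cos\theta - y\sin\theta + a w$$

$$y' = x\sin\theta + y\cos\theta + b w$$

$$w' = w$$

For $w \neq 0$ the Euclidean point is $(x'/w', y'/w')$. For $w = 0$ the translation terms vanish, so the point behaves like a pure direction: a point at infinity.

---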

### homogeneous coordinates

[system of coordinates](https://en.wikipedia.org/wiki/Homogeneous_coordinates) $(x, y, w)$ → used in **projective** geometry

Note:
- $w$ is a scaling factor (or weight) that allows us to represent an $n$-dimensional space within an $(n+1)$-dimensional framework. Think of it as a "zoom" or "projection" coordinate. It tells you how far a point is from the "eye" or the origin of the projection. Dividing the other coordinates by $w$ recovers the Euclidean point.

---

### early fusion | projected point cloud

---

### early fusion | **object detection**

detect objects using the camera → **YOLO** again!

---

### early fusion | **ROI matching**

_"**region of interest**"_ mapping

---

### early fusion | **ROI matching**

_"**region of interest**"_ mapping

fuse the data **inside each bounding box**

---

### early fusion | **ROI matching**

_"**region of interest**"_ mapping

fuse the data **inside each bounding box**

|||
|:-----|:------|
| for each **bounding box** | camera gives **classification** |
| for each **LiDAR projected point** | **accurate distance** |
||

---

### early fusion | **ROI matching**

_"**region of interest**"_ mapping

fuse the data **inside each bounding box**

|||
|:-----|:------|
| for each **bounding box** | camera gives **classification** |
| for each **LiDAR projected point** | **accurate distance** |
||

objects are **measured accurately** and **classified**

---

### early fusion | **ROI matching problems**

---

### early fusion | **ROI matching problems**

---

- which point to pick for **distance**?

---

- which point to pick for **distance**?
  - average / median / center point / **closest**?

---

- which point to pick for **distance**?
  - average / median / center point / **closest**?
- does the point **belong to another** bounding box?

---
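
### early fusion | **ROI matching sketch**

A small sketch of ROI matching and of the "which point to pick" question. It assumes each projected LiDAR point is stored as a hypothetical `(u, v, depth)` tuple and each camera box as `(u_min, v_min, u_max, v_max)`. All names and numbers are made up, and points shared between overlapping boxes are not handled.

```python
from statistics import median


def points_in_box(points, box):
    """Keep the projected points whose pixel position lies inside the 2D bounding box."""
    u_min, v_min, u_max, v_max = box
    return [p for p in points if u_min <= p[0] <= u_max and v_min <= p[1] <= v_max]


def box_distance(points, box, reducer=median):
    """Reduce the depths of the points inside the box to one distance, or None if the box is empty."""
    depths = [depth for _, _, depth in points_in_box(points, box)]
    return reducer(depths) if depths else None


# Hypothetical projected points: two on a pedestrian, one background hit, one elsewhere.
points = [(100, 100, 12.1), (110, 105, 12.3), (120, 110, 35.0), (400, 300, 8.0)]
pedestrian_box = (90, 90, 130, 120)

median_distance = box_distance(points, pedestrian_box)         # 12.3
closest_distance = box_distance(points, pedestrian_box, min)   # 12.1
print(median_distance, closest_distance)
```

The background point at 35 m falls inside the box even though it is not on the pedestrian. An average would be pulled toward it, which is one reason to prefer the median or the closest point.

---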

## late fusion

---

## late fusion

fusing **results** after independent detection

---

## late fusion

---

- get **3D bounding boxes** from both sensors → fuse results
- get **2D bounding boxes** from both sensors → fuse results

---

### late fusion | **late fusion in 3D**

---

### late fusion | **late fusion in 3D**

multiple steps,

---

### late fusion | **late fusion in 3D**

- **step 1** → 3D obstacle detection \[**LiDAR**\]

---

### late fusion | **late fusion in 3D**

- **step 1** → 3D obstacle detection \[**LiDAR**\]
  - ML methods
    - unsupervised machine learning
    - deep learning algos (_e.g.,_ [RandLA-Net](https://github.com/QingyongHu/RandLA-Net))

---

### late fusion | **late fusion in 3D**

- **step 1** → 3D obstacle detection \[**LiDAR**\]
- **step 2** → 3D obstacle detection \[**camera**\]

---

### late fusion | **late fusion in 3D**

- **step 1** → 3D obstacle detection \[**LiDAR**\]
- **step 2** → 3D obstacle detection \[**camera**\] → **much harder**

---

### late fusion | **late fusion in 3D**

- **step 1** → 3D obstacle detection \[**LiDAR**\]
- **step 2** → 3D obstacle detection \[**camera**\] → **much harder**
  - deep learning + size/orientation of vehicles

---

### late fusion | **late fusion in 3D**

- **step 1** → 3D obstacle detection \[**LiDAR**\]
- **step 2** → 3D obstacle detection \[**camera**\] → **much harder**
  - deep learning + size/orientation of vehicles
  - IoU matching → bounding boxes overlap in 3D/2D

---

### late fusion | **IoU matching in space**

- **step 1** → 3D obstacle detection \[**LiDAR**\]
- **step 2** → 3D obstacle detection \[**camera**\] → **much harder**
- **step 3** → **IoU matching** in space

---

### late fusion | **IoU matching**

---

### late fusion | **IoU matching**

---
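
### late fusion | **IoU sketch**

IoU (intersection over union) scores how much two boxes overlap, from 0 (disjoint) to 1 (identical). A minimal sketch for axis-aligned 2D boxes given as `(x_min, y_min, x_max, y_max)`; the box values are hypothetical, and a 3D version would add a third axis to the same computation.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap, clamped at zero when the boxes do not touch.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - intersection
    return intersection / union if union > 0 else 0.0


# Hypothetical boxes for the same obstacle from the two sensors.
lidar_box = (0.0, 0.0, 2.0, 2.0)
camera_box = (1.0, 0.0, 3.0, 2.0)
score = iou(lidar_box, camera_box)
print(round(score, 3))  # 0.333
```

Two boxes are usually treated as the same obstacle when the score exceeds a chosen threshold.

---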

### late fusion | **IoU matching in time**

---

### late fusion | **IoU matching in time**

ensure the **frames also match in time**!

---

### late fusion | **IoU matching in time**

ensure the **frames also match in time**!

- associate objects **in time**, from frame to frame

---

### late fusion | **IoU matching in time**

ensure the **frames also match in time**!

- associate objects **in time**, from frame to frame
- also **predict next positions**

---

### late fusion | **IoU matching in time**

ensure the **frames also match in time**!

- associate objects **in time**, from frame to frame
- also **predict next positions**
- bounding boxes **overlap** in consecutive frames → **same obstacle**

---

### late fusion | **IoU matching in time**

ensure the **frames also match in time**!

- associate objects **in time**, from frame to frame
- also **predict next positions**
- bounding boxes **overlap** in consecutive frames → **same obstacle**

algorithms used → **Kalman Filter**, **Hungarian Algorithm**, **SORT**

Note:
- SORT: Simple Online and Realtime Tracking

---

## references

- **IMUs** → [What is an IMU?](https://www.vectornav.com/resources/inertial-navigation-articles/what-is-an-inertial-measurement-unit-imu)
- **sensor fusion classification** → [9 types of sensor fusion algorithms](https://www.thinkautonomous.ai/blog/?p=9-types-of-sensor-fusion-algorithms)
- **camera + LiDAR fusion** → [LiDAR and Camera Sensor Fusion in Self-Driving Cars](https://www.thinkautonomous.ai/blog/?p=lidar-and-camera-sensor-fusion-in-self-driving-cars)
- **3D bounding box estimation** → [arXiv:1612.00496](https://arxiv.org/pdf/1612.00496.pdf)
- **homogeneous coordinates** → [Wikipedia](https://en.wikipedia.org/wiki/Homogeneous_coordinates)
- **RandLA-Net** → Hu et al., _RandLA-Net: Efficient Semantic Segmentation of Large-Scale Point Clouds_, CVPR 2020
- **SORT tracker** → Bewley et al., _Simple Online and Realtime Tracking_, ICASSP 2016