<aside> 💡 Most automated driving systems comprise a diverse sensor set, including several cameras, Radars, and LiDARs, ensuring complete 360° coverage of near and far regions. Unlike Radar and LiDAR, which measure directly in 3D, cameras capture a 2D perspective projection with inherent depth ambiguity. However, it is essential to produce perception outputs in 3D to enable spatial reasoning about other agents and structures for optimal path planning. The 3D space is typically simplified to the bird's-eye-view (BEV) space by omitting the less relevant Z-coordinate, which corresponds to the height dimension.

</aside>

The most basic approach for obtaining BEV from camera data is Inverse Perspective Mapping (IPM) - it assumes a flat ground plane, which is too simplistic and severely distorts anything above the ground, such as vehicles and pedestrians.
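A minimal sketch of the IPM idea, under the flat-ground assumption: for a pinhole camera with known intrinsics and pose, points on the ground plane (Z = 0) relate to image pixels through a 3×3 homography, whose inverse maps pixels back to ground coordinates. The intrinsics, pitch, and height values below are illustrative, not from any specific system.

```python
import numpy as np

def ground_to_image_homography(K, pitch, height):
    """H maps homogeneous ground-plane coords (X, Y, 1) to pixels.

    World frame: X right, Y forward, Z up; ground plane is Z = 0.
    The camera sits at `height` metres, pitched down by `pitch` radians.
    """
    s, c = np.sin(pitch), np.cos(pitch)
    R = np.array([[1.0, 0.0, 0.0],
                  [0.0, -s, -c],
                  [0.0, c, -s]])                 # world -> camera rotation
    t = -R @ np.array([0.0, 0.0, height])        # world -> camera translation
    # For Z = 0 points only the first two columns of R matter.
    return K @ np.column_stack([R[:, 0], R[:, 1], t])

# Illustrative calibration (assumed values).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
H = ground_to_image_homography(K, pitch=np.deg2rad(10.0), height=1.5)
H_inv = np.linalg.inv(H)          # pixel -> ground plane: the IPM step

# Map an image pixel back to a ground-plane point. This is only valid if
# the pixel really shows the road surface -- hence the severe distortion
# of raised objects like cars and pedestrians.
u, v = 320.0, 400.0
g = H_inv @ np.array([u, v, 1.0])
X, Y = g[0] / g[2], g[1] / g[2]   # metres right of / ahead of the camera
```

A full IPM warps every BEV grid cell through `H` and samples the camera image; the snippet shows only the per-pixel mapping at its core.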

Advantages of BEV : well suited for sensor fusion and path planning

Advantages of PV : well suited for segmentation and tracking

‼️ There is a disconnect, though: generating BEV requires depth estimation, which cannot be obtained directly from the PV image.
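To make the disconnect concrete, here is a minimal sketch of why depth is the missing ingredient: once a per-pixel depth estimate is available (from a network, stereo, etc.), a PV pixel can be back-projected into 3D and flattened into BEV by dropping the height axis. The intrinsics and the `lift_to_bev` helper are illustrative assumptions.

```python
import numpy as np

# Illustrative pinhole intrinsics (assumed values).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
K_inv = np.linalg.inv(K)

def lift_to_bev(u, v, depth):
    """Back-project a pixel to camera coords; keep (x, z) as BEV coords."""
    ray = K_inv @ np.array([u, v, 1.0])   # viewing ray through the pixel
    xyz = ray * depth                     # scale by the estimated depth
    return xyz[0], xyz[2]                 # lateral x, forward z (drop height y)

# Without `depth` the pixel only constrains the ray, not the point on it.
x, z = lift_to_bev(480.0, 240.0, 10.0)
# x = (480 - 320) / 800 * 10 = 2.0 m to the right, z = 10.0 m ahead
```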

Thus BEV fusion becomes the main challenge: how should the information from cameras and LiDAR be combined to generate the BEV representation?
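For the LiDAR side of that fusion, the conversion to BEV is straightforward because the points are already 3D: drop the height coordinate and rasterize into a grid. A minimal sketch (grid extents, resolution, and the simple point-count feature are illustrative choices; real systems use richer per-cell features):

```python
import numpy as np

def lidar_to_bev_grid(points, x_range=(-50.0, 50.0), y_range=(0.0, 100.0),
                      resolution=0.5):
    """Rasterize LiDAR points of shape (N, 3) into a BEV occupancy grid.

    The height (z) coordinate is dropped; each cell simply counts the
    points that fall inside it.
    """
    nx = int((x_range[1] - x_range[0]) / resolution)
    ny = int((y_range[1] - y_range[0]) / resolution)
    grid = np.zeros((ny, nx), dtype=np.int32)
    ix = ((points[:, 0] - x_range[0]) / resolution).astype(int)
    iy = ((points[:, 1] - y_range[0]) / resolution).astype(int)
    keep = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)
    np.add.at(grid, (iy[keep], ix[keep]), 1)   # accumulate duplicates too
    return grid

pts = np.array([[0.0, 10.0, -1.5],    # 10 m ahead, on the ground
                [0.2, 10.1, 1.0],     # same cell, higher up (height ignored)
                [-20.0, 40.0, 0.0]])
grid = lidar_to_bev_grid(pts)
# cell for x = 0, y = 10: ix = 100, iy = 20 -> grid[20, 100] == 2
```

Camera features lifted into the same grid (via depth or learned view transforms) can then be fused cell-by-cell with such a LiDAR grid, which is the design question the techniques below address.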

Techniques

Perception Tasks

Network Architectures