Autonomous Driving Perception System on FPGA

Bai, Lin

In recent years, convolutional neural networks (CNNs) have gained popularity in many engineering applications, especially computer vision. To achieve better performance, increasingly complex structures and advanced operations are incorporated into neural networks, which results in very long inference times. Autonomous vehicles, however, require real-time perception systems. This dissertation investigates four major tasks in an autonomous driving perception system: image classification, semantic segmentation, object detection, and depth completion. The vast majority of deep neural networks target graphics processing units (GPUs), which offer powerful computing capacity and large data bandwidth. Using GPUs in a vehicle has clear drawbacks. When one powerful GPU serves as the central computation unit, all tasks run on it, which requires a complicated task scheduler to guarantee real-time processing for every task. When multiple GPUs are used for different perception tasks, their high power consumption means an autonomous vehicle may not be able to sustain them. Therefore, this dissertation proposes a near-sensor solution: building CNN hardware accelerators on an FPGA for each specific task individually. This approach brings two benefits. First, only the processed information is transmitted, which reduces the bandwidth requirement. Second, task-specific components can be omitted where they are not needed (for example, deconvolution can be removed from object detection accelerators), which reduces power consumption. The design of hardware-friendly networks considers two aspects. (1) It explores lightweight neural network structures with fewer parameters and operations that still achieve comparable accuracy. (2) It presents efficient hardware architectures that optimize parallel processing and pipeline scheduling for high throughput and low latency.
More specifically, a CNN accelerator is designed and implemented on the FPGA for each of the aforementioned perception tasks. Besides the highly efficient hardware architecture, multiple optimizations are applied to the CNN models to further alleviate the computation and bandwidth burden. Some are applied to network computation, such as depthwise separable convolution and fixed-point computation; others are applied to network structure, for instance deploying distance transformation followed by residual learning, and using bilinear interpolation instead of deconvolution. When evaluated on the KITTI and Semantic-KITTI benchmarks, the lightweight CNN models achieve accuracy comparable to state-of-the-art networks. The FPGA implementations meet or exceed the real-time, low-power performance requirements for an automated driving system.
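
As a rough illustration of why depthwise separable convolution lightens a network, the sketch below compares parameter and multiply-accumulate (MAC) counts against a standard convolution. The function names and the layer sizes in the usage note are hypothetical and chosen only for illustration; they are not taken from the dissertation. Biases are ignored.

```python
def standard_conv_cost(c_in, c_out, k, h_out, w_out):
    """Parameters and MACs of a standard k x k convolution (no bias)."""
    params = k * k * c_in * c_out
    macs = params * h_out * w_out  # every weight is used once per output pixel
    return params, macs


def depthwise_separable_cost(c_in, c_out, k, h_out, w_out):
    """Parameters and MACs of a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise_params = k * k * c_in   # one k x k filter per input channel
    pointwise_params = c_in * c_out   # 1 x 1 conv that mixes channels
    params = depthwise_params + pointwise_params
    macs = params * h_out * w_out
    return params, macs


if __name__ == "__main__":
    # Hypothetical layer: 64 -> 128 channels, 3 x 3 kernel, 56 x 56 output map.
    std_params, std_macs = standard_conv_cost(64, 128, 3, 56, 56)
    sep_params, sep_macs = depthwise_separable_cost(64, 128, 3, 56, 56)
    print(f"standard:  {std_params} params, {std_macs} MACs")
    print(f"separable: {sep_params} params, {sep_macs} MACs")
    print(f"cost ratio: {sep_params / std_params:.3f}")
```

For this hypothetical layer the separable form needs 8,768 weights instead of 73,728, a ratio of 1/c_out + 1/k^2, or roughly 12%. The same ratio applies to the MAC count, which is what matters for accelerator throughput.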
