As AI applications such as autonomous driving systems are getting more and more popular, there is an increasing need for the technique of detecting 3D objects. It is not straightforward to detect 3D objects accurately in real time by using existing object detection algorithms that use image data. Although there are also networks using LiDAR data as well as image data, they do not bear autonomous driving in terms of speed. This paper presents a light-weight multi-view neural network using multiple types of sensors such as cameras and LiDARs. An evaluation using a benchmark for autonomous driving systems demonstrated that the presented network performs faster than existing 3D object detection algorithms without degradation of detection accuracy.


The Normal Distributions Transform(NDT) algorithm, a scan matching method using 3D pointcloud map, is an approach to self-localization which is necessary to fully autonomous driving system and autonomous mobile robot. This algorithm is often applied to downsampled input pointcloud from Light Detection and Ranging(LiDAR) for the real-time environment, however speed and accuracy are in the relationship of trade-off. In this paper, we propose fast GPU implementation of the scan matching method, focusing on parallelism of the main computing process of the NDT algorithm. As a way of evaluation, we measured execution time of our GPU implementation by changing a rate of downsampling and GPU architecture, and compared with execution time of the original CPU implementation. This comparison indicated our GPU implementation accelerate the scan matching for data of low compressibility.

3D Mesh Map Generation

In the world of autonomous driving, three dimensional surface reconstruction of point clouds serves an important role, which enables cars to create urban city maps for navigational purposes. In this paper, we implement a sparse and incremental surface reconstruction method using the ball pivoting function provided in Meshlab, a point cloud processing system. Compared to traditional surface reconstruction methods, our surface is sparse, which means the computing cost is low compared to other dense methods. Furthermore the construction is incremental so other applications can utilize the surface simultaneously as soon as each segment of surface has been generated. We evaluate the framework by applying it to a point cloud map of the test field obtained by the LiDAR sensor. The NDT matching algorithm was used to orient each scan of the point clouds. We compare our method with the conventional batch method that creates the surface by meshing all the points at once.

SLAM Accelerator Design

3D scan registration is an important method for localization on mobile devices. The 3D normal distribution transform (3D-NDT) is an efficient algorithm for 3D registration compared to the iterative closest point (ICP). Input point data are captured by 3D laser range finders, the resolutions of which are continuously getting higher. Localization for fast-moving objects such as automobile requires short turnaround time in the order of milliseconds. At the same time, embedded systems are sensitive to their power consumption. There exist CPU and GPU implementations of the algorithm, however, lack of flexibility makes them difficult to coordinate between calculation capability and usability in mobile system. To satisfy these requirements, a hardware implementation of the algorithm using FPGA with appropriate datatype is presented in this dissertation. The result shows the new option for 3D registration, which is also expected to be implemented to ASIC in future work.

3D Visual SLAM

Scan registration of multiple range images, often referred to as point cloud, is a major function of Simultaneous Localization and Mapping (SLAM). An issue of concern for this mapping is a computation cost, which becomes more expensive as the scale of mapping increases. Previous work developed an efficient method for updating map data with reduced computational complexity of O(N), where N is the number of input scan points. This paper presents an incremental map updating method for point cloud mapping, which extends the idea of previous work to produce more efficient 3D maps in time O(N). Using our method, each voxel constituting the map are updated non-uniformly according to the information of the voxel. Thus we obtain non-uniform distribution of points so that the calculation time required for updating the voxel is reduced according to the feature of mapping environments. We conduct several experiments using the Point Cloud Library (PCL), a widely used library for point cloud mapping. We replicate the incremental method and demonstrate the efficiency of it. In addition, we examined the effect of the proposed non-uniform method on the map generation time.

Real-Time 3D Object Detection and Tracking

Real-Time 3D Object Detection and Tracking is required for autonomous driving car and real-time systems. By detecting and tracking obstacles in 3D space, Robots can do trajectory exploration and action judgement safely. The purpose is to create high performance and accuracy algorithms by using end-to-end deep neural networks and GPU.

Fusion-typed CNN

Parallelization of Convolutional Neural Networks (CNNs) has been considerably studied in recent years. A case study of parallelized CNNs using general-purpose computing on GPUs (GPGPU) and Message Passing Interface (MPI) has been published. On the other hand, little effort is being expended on studying scalability of parallelized CNNs on multi-core CPUs. We explores performance of the training process of CNNs achieved by increasing the number of computing cores and threads. Detailed experiments were conducted on state-of-the-art multi-core processors using OpenMP and MPI frameworks to demonstrate that Caffe-based CNNs are successfully accelerated due to well-designed multi-threaded programs. We also discussed better way to exhibit performance of multi-threaded CNNs by comparing three different implementations.