It is difficult to explain why a DNN model makes a certain prediction, which leads to the risk of unexpected behaviors and makes errors hard to analyze. Explanation methods using high-level concepts have recently been proposed; however, their application areas are currently limited. In this work, the author makes use of such high-level, human-friendly concepts to explain and predict false positives in object detection. Specifically, the presented methods automatically discover high-level concepts that target objects have and, based on importance scores of these concepts, explain which concepts have caused false positives. Moreover, they leverage these importance scores to predict false positives by correcting the output of the model.
Scan matching is a kind of self-localization method that uses point clouds. In scan matching, robots estimate their positions and poses by overlaying a map point cloud and a sensor point cloud. Compared to other methods, scan matching can be used in a variety of environments where it is difficult to take full advantage of other approaches. For real-time processing, the sensor point clouds must be downsampled to reduce the execution time. Scan matching itself has been well investigated, but the effect of downsampling has not attracted much attention. In this research, the author focuses on downsampling for real-time scan matching and evaluates several existing downsampling methods, measuring the relative error of scan matching and the execution time of downsampling. In the evaluation, the Normal Distributions Transform algorithm is used as the scan matching method. In addition, the author presents some improved downsampling methods and their evaluations.
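A representative downsampling method in this setting is voxel-grid filtering, which replaces all points falling into the same voxel by their centroid. The following is a minimal numpy sketch of the idea (the function name and parameters are illustrative, not the thesis's implementation):

```python
import numpy as np

def voxel_grid_downsample(points: np.ndarray, voxel_size: float) -> np.ndarray:
    """Collapse all points in the same voxel to their centroid."""
    # Assign each point to a voxel by integer division of its coordinates.
    voxel_idx = np.floor(points / voxel_size).astype(np.int64)
    # Group points by voxel and average them.
    _, inverse, counts = np.unique(voxel_idx, axis=0,
                                   return_inverse=True, return_counts=True)
    centroids = np.zeros((counts.size, 3))
    np.add.at(centroids, inverse, points)
    return centroids / counts[:, None]

# Example: 4 points in two voxels of size 1.0 collapse to 2 centroids.
pts = np.array([[0.1, 0.1, 0.1], [0.2, 0.2, 0.2],
                [2.1, 0.0, 0.0], [2.3, 0.0, 0.0]])
down = voxel_grid_downsample(pts, 1.0)
```

The voxel size controls the speed/accuracy trade-off the abstract refers to: larger voxels yield fewer points and faster matching, at the cost of registration accuracy.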
As autonomous vehicles emerge in the market, there is an increasing need for a low-cost, lightweight monocular camera solution for environmental perception systems that integrates 3D localization and multiple object tracking methods.
This thesis presents (i) a 7 degree-of-freedom (DoF) localization method that extends the Normal Distributions Transform (NDT) algorithm (7-NDT), and (ii) a fast multiple object tracking method that uses the detector feature map, localization result, camera geometry, and instance segmentation (ISCG-Tracker). 7-NDT extends the 6 DoF NDT algorithm with a scale term and reduces the computational complexity to O(N) as well as memory consumption with minimal loss of accuracy. The ISCG-Tracker integrates the localization result into 2D and 3D IoU matching spaces and improves the accuracy of IoU matching methods. By utilizing feature maps of a 2D object detector and an attention mechanism with the result of instance segmentation for similarity calculation, the ISCG-Tracker reduces the computational cost of feature extraction and improves robustness to occlusion.
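IoU matching associates detections in the current frame with existing tracks by the overlap of their boxes. The sketch below shows the generic 2D case with a simple greedy association; it is an illustration of IoU matching in general, not of the ISCG-Tracker's integration of localization results:

```python
import numpy as np

def iou_2d(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def greedy_iou_match(tracks, detections, threshold=0.3):
    """Greedily associate detections to tracks in descending IoU order."""
    pairs = sorted(((iou_2d(t, d), ti, di)
                    for ti, t in enumerate(tracks)
                    for di, d in enumerate(detections)), reverse=True)
    matched_t, matched_d, matches = set(), set(), []
    for iou, ti, di in pairs:
        if iou < threshold or ti in matched_t or di in matched_d:
            continue
        matches.append((ti, di))
        matched_t.add(ti); matched_d.add(di)
    return matches

tracks = [(0, 0, 2, 2), (5, 5, 7, 7)]
dets = [(0.5, 0.5, 2.5, 2.5), (5, 5, 7, 7)]
m = greedy_iou_match(tracks, dets)
```

In the 3D case the same association logic applies, with the box overlap computed in 3D; the threshold is a tunable hyperparameter.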
As automated driving systems become increasingly popular, techniques to understand a vehicle's surroundings by utilizing various sensors such as cameras and LiDAR are widely researched. To realize them, accurate calibration between sensors is needed. This paper presents a method for estimating the extrinsic transform parameters between the camera space and the LiDAR space by calculating correspondences between LiDAR points and pseudo 3D points generated from a depth map inferred from monocular camera images. An evaluation using a dataset for autonomous driving demonstrates that the presented method can automatically estimate these transform parameters without a complex setup or any markers, and shows that the method is able to correct the rotation error.
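The extrinsic parameters being estimated are the rotation R and translation t that carry LiDAR points into the camera frame, after which the pinhole model projects them into the image. A minimal numpy sketch of this projection model (the intrinsics K and the identity extrinsics are illustrative placeholders):

```python
import numpy as np

def project_lidar_to_image(points, K, R, t):
    """Project 3D LiDAR points into pixel coordinates.

    points: (N, 3) array in the LiDAR frame
    K:      (3, 3) camera intrinsic matrix
    R, t:   extrinsic rotation (3, 3) and translation (3,)
    Returns (N, 2) pixel coordinates and (N,) depths.
    """
    cam = points @ R.T + t          # LiDAR frame -> camera frame
    uvw = cam @ K.T                 # pinhole projection
    depth = uvw[:, 2]
    pixels = uvw[:, :2] / depth[:, None]
    return pixels, depth

# Illustrative intrinsics and identity extrinsics.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.zeros(3)
px, d = project_lidar_to_image(np.array([[0.0, 0.0, 10.0]]), K, R, t)
```

Calibration then amounts to finding the R and t for which the projected LiDAR depths agree with the depth map inferred from the camera image.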
Recently, a lot of 2D and 3D data processing has been performed on the GPU, increasing its speed. However, the delay due to data transfer between the CPU and the GPU is a major obstacle to further speedup. Therefore, by implementing on the GPU even processing that is not well suited to parallelization, data transfer between the CPU and the GPU may be reduced and the overall processing sped up. In this research, in order to evaluate this trade-off, we compared the speed of a processing flow consisting of two GPU implementations with that of a processing flow combining a CPU implementation and a GPU implementation.
The ORB feature is computed from light intensity information and is therefore severely affected by the scattering of light caused by rain. As a result, the accuracy of a localization module using a monocular camera based on the ORB feature degrades in the rain. In this thesis, a method complementary to the ORB feature is presented, which reduces the impact of rain streaks on localization performance. In the preprocessing stage of the system, rain streaks are removed from input images by Yang's method, reducing the scattering of light. Experimental results using Ritcher's datasets show that the ORB feature augmented by the presented preprocessing method improves localization performance for mobile robots and autonomous vehicles in the rain.
3D object detection is becoming increasingly significant for emerging autonomous vehicles, since safe decision making and motion planning depend highly on its results. The most common approach to 3D object detection is to use Light Detection and Ranging (LiDAR) sensors, which can scan spatial features of the surrounding environment. Among existing algorithms, the Sparsely Embedded Convolutional Detection (SECOND) algorithm achieves both high accuracy and fast execution in real scenarios, but it supports only a limited set of classes, such as cars. In this thesis, multi-class support for the SECOND algorithm is presented, which enables multiple classes of 3D objects scanned by LiDAR sensors, such as cars and pedestrians, to be detected precisely in real time. This study shows that the presented multi-class support achieves more accurate 3D object detection without compromising execution speed.
As AI applications such as autonomous driving systems become more and more popular, there is an increasing need for techniques to detect 3D objects. It is not straightforward to detect 3D objects accurately in real time using existing object detection algorithms that rely on image data. Although there are also networks that use LiDAR data as well as image data, they are not fast enough for autonomous driving. This paper presents a lightweight multi-view neural network using multiple types of sensors such as cameras and LiDARs. An evaluation using a benchmark for autonomous driving systems demonstrated that the presented network performs faster than existing 3D object detection algorithms without degradation of detection accuracy.
The Normal Distributions Transform (NDT) algorithm, a scan matching method using a 3D point cloud map, is an approach to self-localization, which is necessary for fully autonomous driving systems and autonomous mobile robots. The algorithm is often applied to an input point cloud from Light Detection and Ranging (LiDAR) that has been downsampled for real-time operation; however, speed and accuracy are in a trade-off relationship. In this paper, we propose a fast GPU implementation of the scan matching method, focusing on the parallelism of the main computing process of the NDT algorithm. For evaluation, we measured the execution time of our GPU implementation while varying the downsampling rate and the GPU architecture, and compared it with the execution time of the original CPU implementation. This comparison indicated that our GPU implementation accelerates scan matching for data of low compressibility.
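The main computing process of NDT scores each input point against the Gaussian distribution of the map voxel it falls into; because the per-point scores are independent, this inner loop is the natural target for GPU parallelization. A hedged numpy sketch of the per-point score (a CPU illustration of the math, not the paper's GPU kernel):

```python
import numpy as np

def ndt_point_score(x, mean, cov):
    """NDT-style fitness of one point against a voxel's Gaussian
    (mean, cov). Higher score means a better fit; the score is maximal
    when the point coincides with the voxel mean."""
    d = x - mean
    return float(np.exp(-0.5 * d @ np.linalg.inv(cov) @ d))

mean = np.array([1.0, 0.0, 0.0])
cov = np.eye(3) * 0.1
at_mean = ndt_point_score(np.array([1.0, 0.0, 0.0]), mean, cov)
off = ndt_point_score(np.array([2.0, 0.0, 0.0]), mean, cov)
```

Registration then searches for the transform of the input cloud that maximizes the sum of these scores over all points, which is why the per-point cost dominates and benefits from parallel evaluation.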
In the world of autonomous driving, three-dimensional surface reconstruction of point clouds plays an important role, enabling cars to create urban city maps for navigational purposes. In this paper, we implement a sparse and incremental surface reconstruction method using the ball pivoting function provided in MeshLab, a point cloud processing system. Compared to traditional dense surface reconstruction methods, our surface is sparse, which means the computing cost is low. Furthermore, the reconstruction is incremental, so other applications can utilize the surface as soon as each segment has been generated. We evaluate the framework by applying it to a point cloud map of a test field obtained by a LiDAR sensor; the NDT matching algorithm was used to orient each scan of the point clouds. We compare our method with the conventional batch method that creates the surface by meshing all the points at once.
3D scan registration is an important method for localization on mobile devices. The 3D Normal Distributions Transform (3D-NDT) is an efficient algorithm for 3D registration compared to the Iterative Closest Point (ICP) algorithm. Input point data are captured by 3D laser range finders, whose resolutions are continuously getting higher. Localization for fast-moving objects such as automobiles requires short turnaround times on the order of milliseconds; at the same time, embedded systems are sensitive to power consumption. CPU and GPU implementations of the algorithm exist; however, their lack of flexibility makes it difficult to balance computing capability and usability in mobile systems. To satisfy these requirements, a hardware implementation of the algorithm on an FPGA with an appropriate datatype is presented in this dissertation. The result demonstrates a new option for 3D registration, which is also expected to be implemented as an ASIC in future work.
Scan registration of multiple range images, often referred to as point clouds, is a major function of Simultaneous Localization and Mapping (SLAM). An issue of concern in this mapping is the computation cost, which becomes more expensive as the scale of the map increases. Previous work developed an efficient method for updating map data with a reduced computational complexity of O(N), where N is the number of input scan points. This paper presents an incremental map updating method for point cloud mapping, which extends the idea of the previous work to produce 3D maps more efficiently in O(N) time. Using our method, each voxel constituting the map is updated non-uniformly according to the information in the voxel. We thus obtain a non-uniform distribution of points, so the calculation time required for updating the voxels is reduced according to the features of the mapping environment. We conduct several experiments using the Point Cloud Library (PCL), a widely used library for point cloud processing: we replicate the incremental method, demonstrate its efficiency, and examine the effect of the proposed non-uniform method on the map generation time.
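The O(N) complexity comes from keeping per-voxel running statistics that each new point updates in O(1), so inserting a scan of N points costs O(N) overall. A minimal sketch of such an incremental per-voxel update, using a Welford-style running mean and covariance (the class and its fields are illustrative, not PCL's implementation):

```python
import numpy as np

class Voxel:
    """Running statistics of the points that fell into one map voxel."""
    def __init__(self):
        self.n = 0
        self.mean = np.zeros(3)
        self.m2 = np.zeros((3, 3))   # sum of outer products of deviations

    def add_point(self, p):
        """O(1) incremental update; inserting a scan of N points
        therefore updates the map in O(N) overall."""
        self.n += 1
        delta = p - self.mean
        self.mean += delta / self.n
        self.m2 += np.outer(delta, p - self.mean)

    def covariance(self):
        """Sample covariance of the accumulated points."""
        return self.m2 / (self.n - 1) if self.n > 1 else np.zeros((3, 3))

v = Voxel()
for p in [np.array([0.0, 0.0, 0.0]), np.array([2.0, 0.0, 0.0])]:
    v.add_point(p)
```

A non-uniform policy such as the one proposed can then decide per voxel, based on these statistics, how often or how densely it needs to be updated.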
Real-time 3D object detection and tracking is required for autonomous driving cars and other real-time systems. By detecting and tracking obstacles in 3D space, robots can perform trajectory exploration and action judgement safely. The purpose of this work is to create high-performance, high-accuracy algorithms using end-to-end deep neural networks and GPUs.
Parallelization of Convolutional Neural Networks (CNNs) has been studied considerably in recent years. A case study of parallelized CNNs using general-purpose computing on GPUs (GPGPU) and the Message Passing Interface (MPI) has been published. On the other hand, little effort has been expended on studying the scalability of parallelized CNNs on multi-core CPUs. We explore the performance of the CNN training process as the number of computing cores and threads increases. Detailed experiments were conducted on state-of-the-art multi-core processors using the OpenMP and MPI frameworks, demonstrating that Caffe-based CNNs are successfully accelerated by well-designed multi-threaded programs. We also discuss a better way to exhibit the performance of multi-threaded CNNs by comparing three different implementations.