CVPR 2019 | Baidu Unmanned Vehicles Achieve the World’s First Deep Learning-Based Laser Point Cloud Self-Localization Technology
Researchers and engineers from Baidu’s Intelligent Driving Group (Baidu IDG) have developed the world’s first deep learning-based laser point cloud self-localization technology. The learning network achieves high-precision, centimeter-level self-localization, an important technological breakthrough. The paper describing the system has been accepted to CVPR 2019, held this June in Long Beach, California, USA.
As one of the most “hard-core” application directions of artificial intelligence, autonomous driving has attracted widespread attention in both academia and industry. Realizing autonomous driving will not only fundamentally change the form of automobile products and upend the traditional automotive technology stack and industrial structure, but will also reshape how consumers travel and live, how information is communicated, and how information and transportation infrastructure are built. The difficulty of realization, however, is proportional to the potential return. A complete autonomous driving system includes modules such as localization, autonomous driving maps, perception, decision and planning, and control.
High-precision self-localization is one of the enabling technologies of autonomous driving. Today’s mainstream L4 and L5 autonomous vehicles depend heavily on a self-localization system with high precision, high reliability, and strong adaptability across scenes. This system provides centimeter-level positioning for the autonomous driving system. Combined with an autonomous driving map, the vehicle can read environmental information from a pre-built map according to its position in the environment. The data in these pre-built maps contains the environmental information the vehicle needs, including traffic lights, crosswalks, lane lines, road boundaries, parking spaces, and so on. With this information, the vehicle neatly sidesteps the need for a highly accurate online environment-perception system, which greatly reduces the technical difficulty of autonomous driving and makes the impossible possible.
At the same time, localization is one of the core modules of the autonomous vehicle: a failure of the positioning system can cause catastrophic accidents such as the vehicle running off the road or hitting the shoulder. In this paper, Baidu’s technical experts propose a new deep learning-based laser point cloud self-localization solution that achieves centimeter-level accuracy and adapts better to environmental change.
Abstract: Baidu proposes a learning-based point cloud localization technology that decomposes the traditional localization pipeline and replaces its stages with deep learning networks. This is the industry’s first solution to the self-localization problem of autonomous driving that uses a deep learning network acting directly on laser point clouds.
To verify the effectiveness of the algorithm, Baidu plans to release a dataset totaling about 380 km on the Apollo platform. The dataset includes three subsets, for mapping, training, and testing, covering urban roads, park roads, and highways. The maximum time span between map collection and test data is up to one year. On this dataset, Baidu verified the advantages of its algorithm over traditional methods. The paper has been accepted to CVPR 2019, the top conference in the field of computer vision.
As is well known, deep learning has achieved remarkable results across artificial intelligence in recent years. AlphaGo, which defeated the human Go champion, gave the general public a vivid sense of the power of breakthroughs in AI. At the same time, however, the problems deep learning currently solves well are concentrated in tasks that humans handle through experience-based understanding, analysis, and judgment. In computer vision, for example, deep learning has achieved very good results on image segmentation, image classification, and object detection. For another major category of important problems, such as geometric problems involving measurement, ranging, and 3D reconstruction, some work has made progress, but overall deep learning has not yet made a decisive breakthrough. The self-localization problem of autonomous vehicles is a typical representative of this category: from universities to industry giants, no player’s self-localization technology has yet successfully applied deep learning. Historical experience tells us, however, that once a learning-based technique achieves a breakthrough on an artificial intelligence problem, the torrent of that technological evolution usually overtakes traditional hand-designed algorithms on every performance metric with unstoppable momentum, and the learning-based method becomes the new industry standard.
As one of the basic modules of autonomous driving, localization has always been a hot research topic. The existing traditional laser point cloud localization pipeline is shown in the upper part of Figure 1; it includes modules such as feature extraction, feature matching, and temporal optimization. Its inputs are the real-time online laser point cloud, the localization map, and an initial predicted pose (position and attitude) from inertial sensors; its final output is the pose after optimization by the localization algorithm. The overall idea is highly similar to how humans find their way: we usually judge our own position by typical landmark buildings. The difference is that the localization result of an autonomous vehicle requires centimeter-level position accuracy and sub-degree attitude accuracy, so that the vehicle can accurately extract the necessary information from the autonomous driving map. Although this approach currently achieves the best localization results, such hand-designed algorithms are very sensitive to environmental change during feature extraction and matching. In a constantly changing environment, they cannot intelligently capture the invariant information in the scene (for example, landmarks and street signs) to achieve high-precision, highly robust estimation of the vehicle’s own position. Depending on the severity of environmental change, the localization map must be updated frequently in practice, which increases cost.
The solution Baidu proposes is shown in the lower part of Figure 1. By replacing each stage of the traditional pipeline with a different type of network structure, it realizes the pioneering deep learning-based laser self-localization technology for autonomous vehicles: L3-Net.
Figure 1. Comparison of the traditional pipeline and the L3-Net pipeline. L3-Net uses a PointNet network for feature extraction, 3D CNNs for feature matching and optimization, and finally RNNs for temporal smoothing.
According to the paper, the advances of Baidu’s technical solution are concentrated in the following aspects:
The industry’s first deep learning-based self-localization solution for autonomous driving is proposed, which accurately estimates the position and attitude of the vehicle and achieves centimeter-level localization.
Different network structures decompose and replace the stages of the traditional laser point cloud localization pipeline and are concatenated for unified training, enabling the network to complete the online laser point cloud localization task end to end.
A general dataset with a total length of 380 km, covering complex scenes such as urban roads, park roads, and highways, will be released for testing similar algorithms, further enriching the open content of Baidu’s Apollo platform.
Baidu’s deep learning-based laser localization system takes as input a pre-built laser point cloud localization map, an online laser point cloud, and a predicted pose from an inertial sensor. The pre-built localization map is obtained by fusing point cloud data collected over the same area multiple times with an offline mapping algorithm, with dynamic objects removed by a point cloud recognition algorithm. The online point cloud is collected by the lidar mounted on the autonomous vehicle while driving, and the predicted pose is recursively derived from the previous frame’s localization result plus the incremental motion estimated by the inertial sensor or a vehicle motion model. In general, this localization framework optimizes the predicted pose by minimizing the matching distance between the online point cloud and the map. Generally speaking, the vehicle needs the localization module to output a six-degree-of-freedom pose: translations (∆x, ∆y, ∆z) along the three coordinate axes (x, y, z) and rotations (pitch, roll, and heading) about them. But since inertial sensors can usually provide fairly accurate pitch and roll, and since the elevation z can usually be obtained from the map once the (x, y) estimate is accurate, current mainstream self-localization systems generally estimate only the 2D horizontal translation (∆x, ∆y) and the heading angle; L3-Net adopts a similar design.
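The pose-prediction step described above can be sketched as follows. This is a minimal illustration assuming a planar motion model with the reduced (x, y, yaw) state; the function name and interfaces are illustrative, not Baidu’s implementation.

```python
import numpy as np

def predict_pose(prev_pose, imu_delta):
    """Recursively predict the next pose from the previous localization
    result plus the incremental motion estimated by an inertial sensor.

    The pose is reduced to the 3 DOF actually optimized: (x, y, yaw).
    Pitch/roll come from the IMU and elevation z is read from the map.
    """
    x, y, yaw = prev_pose
    dx, dy, dyaw = imu_delta  # motion increment in the vehicle frame
    # Rotate the body-frame increment into the world frame, then accumulate.
    xw = x + dx * np.cos(yaw) - dy * np.sin(yaw)
    yw = y + dx * np.sin(yaw) + dy * np.cos(yaw)
    return (xw, yw, yaw + dyaw)
```

This predicted pose is only a starting point; the network then refines it by matching the online point cloud against the map.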
Figure 2. The learning-based laser self-localization network architecture L3-Net proposed by Baidu. The network is trained in two stages. The first stage covers only the black-arrow part: keypoint selection, feature extraction, and feature matching based on 3D CNNs. The second stage adds the cyan-arrow part, an RNN network for temporal smoothing.
Specifically, the L3-Net algorithm flow is shown in Figure 2. For each frame of the online point cloud, a series of keypoints must be found, and local point cloud patches centered on the keypoints are collected to extract feature descriptors. Keypoint extraction must consider both local and global geometric relationships. L3-Net first uses point cloud density to find candidate points. Then, for each candidate point, the probabilities of linearity and scattering are estimated using classical 3D structural features of point clouds, and the keypoints are finally selected by jointly considering the distances between candidate points and their structural characteristics. For each keypoint, the method collects the point cloud within its local range and obtains a feature descriptor through a mini-PointNet network. PointNet, published at CVPR 2017, is a deep learning network structure that acts directly on unordered point clouds. The mini-PointNet used in L3-Net is a simplified version consisting of a multi-layer perceptron (MLP) and a max-pooling layer. This is also the first attempt to apply a network structure acting directly on unordered point clouds to the task of high-precision laser point cloud localization/matching.
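The key idea of mini-PointNet, a shared MLP applied to every point followed by max-pooling, can be sketched in a few lines of numpy. The layer sizes and weight names below are illustrative assumptions, not the paper’s actual configuration:

```python
import numpy as np

def mini_pointnet(points, W1, b1, W2, b2):
    """Sketch of a mini-PointNet descriptor: a shared two-layer MLP applied
    to each point independently, followed by max-pooling across points.
    The max-pool makes the descriptor invariant to point ordering.

    points: (N, 3) local point cloud patch around a keypoint
    returns: (D,) feature descriptor
    """
    h = np.maximum(points @ W1 + b1, 0.0)  # per-point MLP layer 1 + ReLU
    h = np.maximum(h @ W2 + b2, 0.0)       # per-point MLP layer 2 + ReLU
    return h.max(axis=0)                   # order-invariant max-pool
```

Because the MLP sees each point separately and the pooling is symmetric, permuting the input points leaves the descriptor unchanged, which is exactly why this structure suits unordered point clouds.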
After obtaining the keypoint feature descriptors, the 2D horizontal position (∆x, ∆y) and heading angle must be solved, which is equivalent to solving for the offset between the predicted pose and the ground truth in horizontal position and heading. For this, L3-Net uses a search approach: it discretizes the (∆x, ∆y, ∆yaw) three-dimensional state space centered on the predicted pose and takes the localization states within a certain range to form a set. For each keypoint in the online point cloud, a cost volume is obtained by computing how well the online point cloud matches the map at each localization state in the set. 3D CNNs then regularize the cost volume to suppress outliers and improve matching. After regularization, L3-Net sums the cost volumes of all keypoints, obtains a probability volume over the (∆x, ∆y, ∆yaw) localization space through a softmax layer, and then estimates the (∆x, ∆y, ∆yaw) localization result.
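The last step, turning a cost volume into an offset estimate, can be illustrated with a toy numpy sketch. The regularizing 3D CNN is omitted, and the grid sizes and the idea of taking the probability-weighted expectation are assumptions for illustration rather than the paper’s exact formulation:

```python
import numpy as np

def solve_offset(cost_volume, dx_grid, dy_grid, dyaw_grid):
    """Turn an (Nx, Ny, Nyaw) matching-cost volume into an offset estimate:
    softmax over the discretized (dx, dy, dyaw) states (lower cost means
    higher probability), then the expectation of each offset under the
    resulting probability volume."""
    logits = -cost_volume
    p = np.exp(logits - logits.max())  # numerically stable softmax
    p /= p.sum()
    dx = (p.sum(axis=(1, 2)) * dx_grid).sum()      # marginal over dx
    dy = (p.sum(axis=(0, 2)) * dy_grid).sum()      # marginal over dy
    dyaw = (p.sum(axis=(0, 1)) * dyaw_grid).sum()  # marginal over dyaw
    return dx, dy, dyaw
```

Taking the expectation rather than the arg-max yields a sub-grid-resolution estimate and keeps the whole step differentiable, which is what allows it to sit inside an end-to-end trained network.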
After obtaining the localization result for each frame of the point cloud, L3-Net models the vehicle’s motion with an LSTM network and uses the temporal relationship between localizations to improve the results. Experiments show that this yields smoother and more accurate localization.
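As a minimal illustration of the recurrent smoothing idea (not Baidu’s actual architecture), a single standard LSTM cell can be stepped over the per-frame estimates; the weight shapes, gate ordering, and names below are assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    """One step of a standard LSTM cell. x is the current per-frame
    localization estimate, (h, c) the recurrent state carried across
    frames; gates are stacked in (input, forget, output, candidate) order."""
    H = h.size
    z = W @ x + U @ h + b  # all gate pre-activations at once
    i, f, o = sigmoid(z[:H]), sigmoid(z[H:2 * H]), sigmoid(z[2 * H:3 * H])
    g = np.tanh(z[3 * H:])
    c_new = f * c + i * g          # update cell memory
    h_new = o * np.tanh(c_new)     # smoothed hidden output
    return h_new, c_new
```

In a trained smoother, the recurrent state lets the network penalize frame-to-frame jumps that are inconsistent with plausible vehicle motion, which is the role the LSTM plays in L3-Net.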
Figure 3. Localization results of the deep learning-based L3-Net system compared with other systems.
Figure 4. Visualization of the output of each stage of the L3-Net localization network. In the cost volume panel, each column represents the matching of one keypoint, each row represents a heading-angle state, and each image shows the cost distribution over horizontal position. After merging the cost volumes of all keypoints, the matching response is significantly enhanced. The final estimated localization result (0.538 m, 0.993 m, 1.001°) and the corresponding ground truth from the dataset (0.524 m, 0.994 m, 1.044°) are shown in the rightmost column.
To address the self-localization problem in autonomous driving, Baidu proposed a deep learning-based laser point cloud self-localization algorithm. Baidu replaced each functional module of the traditional method with a different type of network structure and verified the algorithm on a dataset covering varied road conditions and large time spans, achieving centimeter-level localization accuracy. The dataset contains a variety of challenging road conditions, including urban roads, park roads, and highways, with a total mileage of 380 km; it will be released on the Baidu Apollo platform soon.