Affordability of powerful computational hardware and advances in deep learning techniques have made vision-based autonomous driving an active research focus within the transport industry. Even though research worldwide has already taken giant leaps, there are considerable drawbacks in the techniques to overcome. The foremost drawback is the inability to explicitly model every possible scenario: driving requires responding to a large variety of complex environmental conditions and agent behaviors.
The end-to-end method and the perception-driven method are the two popular vision-based paradigms for self-driving cars. The major disadvantage of the perception-driven method is that it lacks self-learning ability: all features, including task plans, are manually hand-crafted.
End-to-end behavior cloning (off-policy imitation learning) provides an alternative to the traditional modular approach by simultaneously learning both perception and control using a deep network.
Maneuvering around each and every steady on-road obstacle, whether surpass-able or non-surpass-able, is cost intensive. Moreover, maneuvering around a steady on-road obstacle at high speed involves taking multiple decisions in split seconds, and an inaccurate decision may result in a crash. One of the key decisions that needs to be taken is whether the steady on-road obstacle can be surpassed.
Clearly, an autonomous vehicle successfully navigating through the streets should be able to follow the roadway and maneuver only when required. If the autonomous vehicle can surpass a steady on-road obstacle without becoming unstable, it must do so. Therefore, we herein propose an improved convolutional neural network model. Overall, this research work makes the following contributions:
- Provides an evaluation method for popular learning models by defining test cases to mitigate an on-road obstacle.
- Identifies, evaluates and validates the configuration producing optimum results by testing combinations of activation function and dropout.
- Improves prediction accuracy by validating statistical data with visual saliency maps.
The paper is organized as follows: Section 2 gives an overview of related work. Section 3 describes the methodology used. Section 4 presents the experimental setup, results and discussion. Finally, Section 5 concludes the paper.
The perception-driven method and the end-to-end method are the two popular vision-based paradigms for self-driving cars. Both have been reviewed extensively through a literature study and are presented here. However, a key aspect of autonomous driving is the problem of object detection itself; hence, in the later part of this section we review our study of object detection in the context of deep learning.
The traditional perception-based method has made remarkable achievements in the field of self-driving cars over past decades. Several detection methods have been proposed to generate a description of the local environment. Depending on the technique used, current detection research can be broadly classified as shown in Figure 1 below.
Object localization through enclosing bounding boxes has been adopted by researchers as part of the object detection task. A bounding box can enclose anything from steady road signs and traffic signs to moving cars and bicycles. The model is trained with labeled data of objects. The key factor for a self-driving car is the ability to identify whether there is an obstacle and at what distance; the exact location of the bounding box is immaterial.
Kwang Eun An [10] used a Deep Convolutional Neural Network (DCNN) to classify images as pothole/obstacle or non-pothole. The work compared Inception_v4, Inception_ResNet_v2, ResNet_v2_152 and MobileNet_v1 on both color and grayscale images. The method is limited in its efficiency, processing a single image frame at a time.
Many lane detection methods locate the lane position using Canny edge detection or the Hough transform. These methods require no specific geometric restrictions to identify uneven lane boundaries.
There are three major approaches to 3D reconstruction of obstacles/potholes, each with its own drawback:
1. Chang et al. [19] used a grid-based processing technique in which a surface receives laser incidents and the bounced-back pulse is digitally processed to generate a precise surface model. The output was accurate, but the equipment was expensive. Li et al. [20] used infrared laser line projectors, a digital camera and a multi-view coplanar scheme for calibrating the lasers. The method plotted more feature points in the camera's point of view and was much more cost effective.
2. Wang [21] used a series of cameras. This method generated a 3D surface model from a series of captured 2D images. Its high computation requirement was the key drawback.
3. The Xbox Kinect sensor was used by Joubert et al. [22] and Moazzam et al. [23]. Even though the equipment price was minimized, the method could not minimize the error or the computing power required.
iii. Vibration-based detection: Umang Bhatt [26] combined accelerometer, gyroscope, location and speed data to classify road condition and detect potholes/obstacles. An SVM with a radial basis function (RBF) kernel was used for road condition classification, while SVM and gradient boosting were used for pothole/non-pothole classification. Reported failures include the inability to accurately distinguish between good and bad roads due to insufficient data for all road types. The key drawback was the inability to distinguish between a bump, a manhole and a pothole.
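The RBF-kernel SVM classification step described above can be sketched as follows; the feature layout and the synthetic accelerometer-style data are invented for illustration and are not Bhatt's actual pipeline:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic features per window: [vertical-accel variance, gyro variance, speed]
smooth_road = rng.normal([0.1, 0.1, 12.0], 0.05, size=(50, 3))
pothole_hit = rng.normal([2.0, 1.5, 8.0], 0.05, size=(50, 3))

X = np.vstack([smooth_road, pothole_hit])
y = np.array([0] * 50 + [1] * 50)   # 0 = non-pothole, 1 = pothole

# RBF kernel lets the boundary curve around the two vibration-signature clusters
clf = SVC(kernel="rbf", gamma="scale", C=1.0)
clf.fit(X, y)
```

The insufficient-data failure mode reported above shows up directly in such a model: signatures absent from training (e.g. manholes) are forced into one of the two learned classes.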
One key takeaway from all the above work is that perception-based learning provides proactive data to the driver regarding the obstacle. These methods do not provide any method for routing through the obstacle, nor any classification of whether an obstacle can be surpassed.
Pomerleau [9] pioneered end-to-end training of a neural network to steer a car, building the Autonomous Land Vehicle in a Neural Network (ALVINN) system in 1989.
In autonomous driving, one of the key requirements is the ability to identify salient objects. The key differentiator of this architecture was its ability to identify salient objects by observation, without the need for hand-coded rules. However, the DAVE/PilotNet authors admit that the convolutional layers were chosen empirically, and hence the performance was not sufficiently reliable to provide a full alternative to more modular approaches to off-road driving.
ii. Deconvolution-based: Matthew [28] presented a method for mid- and high-level feature learning, such as corners, junctions and object parts. The work resolves two fundamental problems found in image descriptors such as SIFT, HOG or edge-gradient calculators followed by a histogram or pooling operation: first, invariance to orientation and scale; second, a CNN model's inefficiency in training each model with respect to the input. The visualization available is one activation per layer.
Alexander Binder [29] implemented layer-wise relevance propagation to compute scores for image pixels and image regions, denoting the impact of a particular image region on the classifier's prediction for one particular test image. The work demonstrated controlling the noisiness of the heatmap; however, an optimal trade-off between numerical stability and sparsity/meaningfulness of the heatmap was left as future work.
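The core idea of layer-wise relevance propagation can be illustrated on a tiny bias-free ReLU network in plain numpy. This is a toy sketch of the basic LRP-0 rule, not Binder et al.'s implementation; with no biases, the total relevance is conserved at every layer, which the assertion below checks:

```python
import numpy as np

def lrp_dense(x, W, R_out, eps=1e-12):
    """Redistribute output relevance R_out of a layer z = W @ x back onto its inputs (LRP-0)."""
    z = W @ x
    s = R_out / (z + eps * np.where(z >= 0, 1.0, -1.0))  # stabilized division by pre-activations
    return x * (W.T @ s)                                 # R_in_i = x_i * sum_j W[j,i] * s_j

rng = np.random.default_rng(1)
W1 = rng.uniform(0.1, 1.0, size=(4, 3))   # positive weights keep every ReLU active here
W2 = rng.uniform(0.1, 1.0, size=(1, 4))
x = np.array([0.5, 1.0, 2.0])

a1 = np.maximum(0.0, W1 @ x)              # forward pass
y = W2 @ a1                               # scalar prediction score

R_a1 = lrp_dense(a1, W2, y)               # start from the output score itself
R_x = lrp_dense(x, W1, R_a1)              # per-input relevance scores (the "heatmap")
```

The `eps` stabilizer is exactly the numerical-stability/sparsity trade-off left open above: larger values suppress noisy relevance from near-zero activations, at the cost of leaking relevance.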
Salient-object-based methods do identify the key features that impact the steering angle. However, these methods have not been explicitly developed or tested for identifying whether an obstacle can be surpassed.
To overcome the above-mentioned limitations, we propose to perform extensive training and testing of a neural network to clone an obstacle mitigation policy. Even though there is a great deal of literature on the task of steering angle prediction, our goal is not to propose yet another prediction method, but rather to provide a different perspective on a steady on-road obstacle mitigation model: a model that not only detects a steady on-road obstacle but also predicts whether the obstacle can be surpassed, avoiding unnecessary maneuvering.
This section provides details of the CNN models used for validation and the steps performed on the data for accurate prediction.
The network model continuously predicts the steering angle to clear all test cases, taking raw pixels as input and incorporating attention in an end-to-end manner. It is important that our experiments and results be independent of car geometry, hence we represent the steering command as the inverse turning radius u_t = r_t⁻¹, where r_t is the turning radius at time stamp t. We use the inverse to prevent numerical instability and singularity, and to obtain smooth transitions through zero from left turns to right turns. The relation between the steering angle θ_t and the inverse turning radius can be given as

θ_t = f_steer(u_t) = u_t · d_w · K_s · (1 + K_slip · v_t²),   (1)

where θ_t (in degrees) and v_t (in m/s) are the steering angle and velocity at time t, respectively, and d_w, K_slip and K_s are vehicle-specific parameters: K_s is the steering ratio between the turn of the steering wheel and the turn of the wheels, K_slip represents the relative motion between the front and rear wheels, and d_w is the wheelbase.
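Equation (1) can be sketched directly in Python; the vehicle parameters below are illustrative placeholders, not measured values from our simulator:

```python
# Vehicle-specific parameters (illustrative placeholder values)
K_S = 15.0      # steering ratio: steering-wheel turn / road-wheel turn
K_SLIP = 0.002  # slip coefficient (front/rear wheel relative motion)
D_W = 2.7       # wheelbase in meters

def steering_angle_deg(u_t, v_t):
    """Eq. (1): steering angle (deg) from inverse turning radius u_t (1/m) and speed v_t (m/s)."""
    return u_t * D_W * K_S * (1.0 + K_SLIP * v_t ** 2)

def inverse_turning_radius(theta_t, v_t):
    """Invert Eq. (1): recover the geometry-independent command u_t from the steering angle."""
    return theta_t / (D_W * K_S * (1.0 + K_SLIP * v_t ** 2))
```

Since u_t passes smoothly through 0 (straight driving) while r_t diverges to infinity there, the network regresses u_t and the controller converts it to a wheel angle per vehicle.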
b) Data Bias Removal: The general tendency of every driver is to drive as steadily as possible without much maneuvering. However, in behavioral cloning, if the driver drives steadily during training then all the model will learn is to maintain a zero steering angle; such training would generate results biased toward zero output. In order to avoid output bias toward a negative or positive steering angle, training is done over the complete track in both the clockwise and anticlockwise directions. Additional recovery training runs are also recorded, in which the car is taken off the center lane and then recovered back to it. The data bias is removed by trimming the number of samples per steering-angle bin, as shown in Figure 3.
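The per-bin trimming can be sketched with numpy as follows; the bin count and per-bin cap are illustrative, not the values used for Figure 3:

```python
import numpy as np

def trim_steering_bins(angles, num_bins=25, max_per_bin=200, seed=0):
    """Return indices of a subset in which no steering-angle bin exceeds max_per_bin samples."""
    rng = np.random.default_rng(seed)
    angles = np.asarray(angles)
    edges = np.linspace(angles.min(), angles.max(), num_bins + 1)
    keep = []
    for i, (lo, hi) in enumerate(zip(edges[:-1], edges[1:])):
        # last bin is closed on the right so the maximum angle is not dropped
        upper = (angles <= hi) if i == num_bins - 1 else (angles < hi)
        in_bin = np.flatnonzero((angles >= lo) & upper)
        if len(in_bin) > max_per_bin:
            in_bin = rng.choice(in_bin, size=max_per_bin, replace=False)
        keep.extend(in_bin.tolist())
    return np.sort(np.array(keep, dtype=int))
```

Capping the dominant near-zero bin flattens the steering histogram so the cloned policy cannot minimize loss simply by always predicting straight-ahead.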
This section presents a basic description of the experimental setup for data collection, training and testing. We elaborate on the configuration of the hardware and software used, then enlist the training cases, test cases and evaluation criteria for model evaluation. Finally, we present the results of our experiment.
Model training and testing are performed in the Unity 2017 virtual environment in the interest of research cost. Other software tools include Visual Studio for Unity, Atom, Jupyter Notebook and Anaconda, with GitHub as the online code repository and the Google Colab platform for online code execution. Programming is done in C# and Python. Multiple packages are used, including OpenCV, numpy, matplotlib, Keras (model, optimizer, layers), pandas and sklearn. Hardware is a laptop with an Intel i5 [email protected], 8GB RAM and Intel HD Graphics with 2GB RAM.
Virtual 3D models of non-surpass-able and surpass-able obstacles are created in Unity as shown in Figure 4 below. One track contains an array of multiple surpass-able obstacles; the model must pass through the obstacles with minimum vehicle instability and maintain a steady drive to pass this test case. Starting from the bottom left:
1st, TC06: The model must maneuver through an array of left and right non-surpass-able obstacles and retain a steady drive to pass this test case.
2nd, TC07: The track has an array of both left and right non-surpass-able and surpass-able obstacles. The model must pass through the surpass-able obstacles and maneuver around the non-surpass-able ones to pass.
3rd, TC08: An unknown bridge, which has a different track, has to be crossed to pass this case.
4th, TC09: An unknown unpaved path has to be avoided by the model; the model has not been trained for this behavior.
5th, TC10: The model is expected to clear all the obstacles and pass through the unknown bridge and unpaved path in a single stretch.
Each model is trained on 4 cases and tested on 10 cases. This paper presents two deep learning models tested with combinations of activation function and dropout on the same database. Table 2 below shows the model used, the code assigned for ease of reference, the configurations used and the val_loss achieved. Two of the configurations achieve accuracies of 50% and 30% respectively, with heavy processing times of 12,000 s and 8,280 s respectively. Model P1 achieves the highest accuracy, with a processing time of 4,478 seconds.
Model P1 (PilotNet with elu and no dropout) performed a self-recovery in test cases 6 and 7, as listed in Table 5 below.
In this paper, we presented and compared two of the most popular autonomous driving methods, DroNet and PilotNet, experimenting with combinations of different activation functions with and without dropout. The experiment demonstrated that the PilotNet model P1 is able to learn the entire task of maneuvering around non-surpass-able obstacles and passing through surpass-able ones. It also provided clear insight into the effect of each activation function and of dropout on steering angle prediction. PilotNet model P1 achieves the highest prediction accuracy, the lowest val_loss, reasonable processing time and the best visual saliency map for obstacles on the current dataset. The experiment clearly concluded that PilotNet, with the elu activation function and without dropout, outperforms all other models and configurations.
The system learned to mitigate an obstacle without the need for explicit surpass-able and non-surpass-able obstacle labeling during training.
In future work, we would like to optimize PilotNet to further improve prediction accuracy. We would also like to introduce a custom network that would outperform all current autonomous driving methods.
c) Dataset Characteristics

The right size of data set is the key to an accurately predicting solution. Initial training was started with 30,000 images; however, due to resource constraints, we settled on an optimal data size of 9,970 images that produced reliable results. Zero-steering-bias images are removed from the input images, and data augmentation is performed to increase the data size and accuracy. The final data set is split using the train_test_split functionality in the sklearn library. All models are trained using 3,904 samples and validated with 977 samples, as shown in Table 1 below.

The model evaluation criteria are defined as follows:
- P: number of test cases the model passed
- SR: score for test cases in which the model self-recovered
- SR_o: number of obstacles correctly maneuvered or surpassed
- T_o: total maneuverable and surpass-able obstacles
- T: total test cases
- MA: model accuracy

MA = ((P + SR) / T) × 100   (2)
SR = SR_o / T_o   (3)
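Equations (2) and (3) can be sketched as follows, expressing MA as a percentage of total test cases; the counts used are illustrative only:

```python
def self_recovery_score(surpassed_ok, total_obstacles):
    """Eq. (3): fraction of maneuverable/surpass-able obstacles handled correctly."""
    return surpassed_ok / total_obstacles

def model_accuracy(passed, sr, total_cases):
    """Eq. (2): model accuracy as a percentage of total test cases."""
    return (passed + sr) / total_cases * 100.0

sr = self_recovery_score(surpassed_ok=4, total_obstacles=5)   # 0.8
ma = model_accuracy(passed=7, sr=sr, total_cases=10)          # 78.0
```

Adding the fractional SR term to the integer pass count gives partial credit for runs in which the model recovered rather than cleanly passing.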
Model | Input data | Training samples | Validation samples | Trainable parameters
DroNet | 9930 | 3904 | 977 | 311,777
PilotNet | 9930 | 3904 | 977 | 252,219
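The split in Table 1 can be reproduced with scikit-learn's train_test_split; the 80/20 ratio below is an assumption inferred from the 3,904/977 sample counts, and the arrays are placeholders for the real image paths and steering labels:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays standing in for image paths and steering-angle labels
images = np.arange(4881)   # 3904 + 977 samples retained after bias trimming
angles = np.random.default_rng(0).normal(0.0, 0.2, size=4881)

X_train, X_val, y_train, y_val = train_test_split(
    images, angles, test_size=0.2, random_state=6)   # 80/20 split (assumed ratio)
```

Fixing random_state makes the split reproducible across runs, so both models train and validate on identical samples.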
Real time lane detection for autonomous vehicles. International Conference on Computer and Communication Engineering (ICCCE), 2008. IEEE.
ImageNet classification with deep convolutional neural networks. In F. Pereira, C. J. C. Burges, L. Bottou, K. Q. Weinberger (eds.), Advances in Neural Information Processing Systems 25, 2012. Curran Associates, Inc. http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
DroNet: Learning to Fly by Driving. IEEE Robotics and Automation Letters, April 2018.
Autonomous off-road vehicle control using end-to-end learning. Final technical report, July 2004. Net-Scale Technologies, Inc. http://net-scale.com/doc/net-scale-dave-report.pdf
DeepDriving: Learning affordance for direct perception in autonomous driving. Proceedings of the IEEE International Conference on Computer Vision, 2015.
Pothole tagging system. 4th Robotics and Mechatronics Conference of South Africa, 2011.
Pothole detection with image processing and spectral clustering. 2nd International Conference on Information Technology and Computer Networks, 2013.
Behavioural cloning: phenomena, results and problems. IFAC Proceedings, 1995, 28(21).
A lane detection method for lane departure warning system. 2010 International Conference on Optoelectronics and Image Processing (ICOIP), 2010. IEEE.
You only look once: Unified, real-time object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016.
Challenges and feasibility for comprehensive automated survey of pavement conditions. 8th International Conference on Applications of Advanced Technologies in Transportation Engineering, 2004.
Metrology and visualization of potholes using the Microsoft Kinect sensor. 16th International IEEE Annual Conference on Intelligent Transportation Systems, 2013.
Survey on Vision based Hand Gesture Recognition. International Journal of Computer Sciences and Engineering, 2019, 7(5):281-288. doi:10.26438/ijcse/v7i5.281288
Region-based convolutional networks for accurate object detection and segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, 38.
Towards behavioural cloning for autonomous driving. IEEE International Conference on Robotic Computing (IRC 2019), Naples, Italy.
Multiple lane boundary detection using a combination of low-level image features. IEEE 17th International Conference on Intelligent Transportation Systems (ITSC), 2014.
Lane detection and tracking using B-Snake. Image and Vision Computing, 2004, 22(4).
Object detection with deep learning: A review. IEEE Transactions on Neural Networks and Learning Systems, 2019. doi:10.1109/TNNLS.2018.2876865
Experimentation of 3D pavement imaging through stereovision. International Conference on Transportation Engineering, 2007.
Efficient lane boundary detection with spatial-temporal knowledge filtering. Sensors, 2016, 16(8):1276.