Affordability of powerful computational hardware and advances in deep learning techniques have made vision-based autonomous driving an active research focus within the transport industry. Even though research worldwide has already taken giant leaps, there are considerable drawbacks in the techniques to overcome. The foremost drawback is the inability to explicitly model every possible scenario: driving requires responding to a large variety of complex environmental conditions and agent behaviors.
The end-to-end method and the perception-driven method are the two popular vision-based paradigms for self-driving cars. The major disadvantage of the perception-driven method is that it lacks self-learning ability: all features, including task plans, are manually hand-crafted.
End-to-end behavior cloning (off-policy imitation learning) provides an alternative to the traditional modular approach by simultaneously learning both perception and control using a deep network.
Maneuvering around each and every steady on-road obstacle, whether surpass-able or non-surpass-able, is cost intensive. Moreover, maneuvering around a steady on-road obstacle at high speed involves taking multiple decisions in split seconds, and an inaccurate decision may result in a crash. One of the key decisions that needs to be taken is whether the steady on-road obstacle can be surpassed.
Clearly, an autonomous vehicle successfully navigating through the streets should be able to follow the roadway and maneuver only when required. If the autonomous vehicle can surpass a steady on-road obstacle without becoming unstable, it must do so. Therefore, we herein propose an improved convolutional neural network model. Overall, this research work makes the following contributions:
- Provides an evaluation method for popular learning models by defining test cases to mitigate an on-road obstacle.
- Identifies, evaluates and validates the configuration producing optimum results by testing combinations of activation function and dropout.
- Improves prediction accuracy by validating statistical data with visual saliency maps.
The paper is organized as follows: Section 2 gives an overview of related work. Section 3 describes the methodology used. Section 4 presents the experimental setup, results and discussion. Finally, Section 5 concludes the paper.
The perception-driven method and the end-to-end method are the two popular vision-based paradigms for self-driving cars. Both have been reviewed extensively through a literature study and are presented here. However, a key aspect of autonomous driving is the problem of object detection itself; hence, in the later part of this section we review our study of object detection in the context of deep learning.
The traditional perception-based method has made remarkable achievements in the field of self-driving cars over past decades. Several detection methods have been proposed to generate a description of the local environment. Depending on the technique used, current detection research can be broadly classified as shown in Figure 1 below.
Object localization through enclosing bounding boxes has been adopted by researchers as part of the object detection task. A bounding box can enclose anything from steady road signs and traffic signs to moving cars and bicycles. The model is trained with labeled data of objects. The key factor for a self-driving car is the ability to identify whether there is an obstacle and at what distance; the exact location of the bounding box is immaterial.
Kwang Eun An [10] used a Deep Convolutional Neural Network (DCNN) to classify images as pothole/obstacle or non-pothole. The work compared Inception_v4, Inception_ResNet_v2, ResNet_v2_152 and MobileNet_v1 on both color and grayscale images. The method is limited in its efficiency, processing a single image frame at a time.
Many lane detection methods locate the lane position using Canny edge detection or the Hough transform. These methods require no specific geometric restrictions to identify uneven lane boundaries.
There are three major approaches to 3D reconstruction of obstacles/potholes, each with its own drawback:
1. Chang et al. [19] used a grid-based processing technique in which a surface receives laser incidents and the bounced-back pulse is digitally processed to generate a precise surface model. The output was accurate, but the equipment was expensive. Li et al. [20] used infrared laser line projectors, a digital camera and a multi-view coplanar scheme for calibrating the lasers. The method plotted more feature points in the camera's point of view and was much more cost effective.
2. Wang [21] used a series of cameras. This method generated a 3D surface model from a series of captured 2D images. Its high computation requirement was the key drawback.
3. The Xbox Kinect sensor was used by Joubert et al. [22] and Moazzam et al. [23]. Even though the equipment price was minimized, the method could not minimize the error or the computing power required.
iii. Vibration-based detection: Umang Bhatt [26] combined accelerometer, gyroscope, location and speed data to classify road condition and detect potholes/obstacles. An SVM with a radial basis function (RBF) kernel was used for road condition classification, while SVM and gradient boosting were used for pothole/non-pothole classification. Reported failures include the inability to accurately distinguish between good and bad roads due to insufficient data for all road types. The key drawback was the inability to distinguish between a bump, a manhole and a pothole.
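The RBF-kernel SVM classification step described above can be sketched as follows; the feature layout and the synthetic accelerometer-style data are invented for illustration and are not Bhatt's actual pipeline:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic features per window: [vertical-accel variance, gyro variance, speed]
smooth_road = rng.normal([0.1, 0.1, 12.0], 0.05, size=(50, 3))
pothole_hit = rng.normal([2.0, 1.5, 8.0], 0.05, size=(50, 3))

X = np.vstack([smooth_road, pothole_hit])
y = np.array([0] * 50 + [1] * 50)   # 0 = non-pothole, 1 = pothole

# RBF kernel lets the boundary curve around the two vibration-signature clusters
clf = SVC(kernel="rbf", gamma="scale", C=1.0)
clf.fit(X, y)
```

The insufficient-data failure mode reported above shows up directly in such a model: signatures absent from training (e.g. manholes) are forced into one of the two learned classes.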
One key takeaway from all the above work is that perception-based learning provides proactive data to the driver regarding the obstacle. These methods do not provide any method for routing through the obstacle, nor any classification of whether an obstacle can be surpassed.
Pomerleau [9] pioneered end-to-end training of a neural network to steer a car, building the Autonomous Land Vehicle in a Neural Network (ALVINN) system in 1989.
In autonomous driving, one of the key requirements is the ability to identify salient objects. The key differentiator of this architecture was its ability to identify salient objects by observation, without the need for hand-coded rules. However, the DAVE/PilotNet authors admit that the convolutional layers were chosen empirically, and hence the performance was not sufficiently reliable to provide a full alternative to more modular approaches to off-road driving.
ii. Deconvolution-based: Matthew [28] presented a method for mid- and high-level feature learning, such as corners, junctions and object parts. The work resolves two fundamental problems found in image descriptors such as SIFT, HOG or edge-gradient calculators followed by a histogram or pooling operation: first, invariance to orientation and scale; second, a CNN model's inefficiency in training each model with respect to the input. The visualization available is one activation per layer.
Alexander Binder [29] implemented layer-wise relevance propagation to compute scores for image pixels and image regions, denoting the impact of a particular image region on the classifier's prediction for one particular test image. The work demonstrated controlling the noisiness of the heatmap; however, an optimal trade-off between numerical stability and sparsity/meaningfulness of the heatmap was left as future work.
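The core idea of layer-wise relevance propagation can be illustrated on a tiny bias-free ReLU network in plain numpy. This is a toy sketch of the basic LRP-0 rule, not Binder et al.'s implementation; with no biases, the total relevance is conserved at every layer, which the assertion below checks:

```python
import numpy as np

def lrp_dense(x, W, R_out, eps=1e-12):
    """Redistribute output relevance R_out of a layer z = W @ x back onto its inputs (LRP-0)."""
    z = W @ x
    s = R_out / (z + eps * np.where(z >= 0, 1.0, -1.0))  # stabilized division by pre-activations
    return x * (W.T @ s)                                 # R_in_i = x_i * sum_j W[j,i] * s_j

rng = np.random.default_rng(1)
W1 = rng.uniform(0.1, 1.0, size=(4, 3))   # positive weights keep every ReLU active here
W2 = rng.uniform(0.1, 1.0, size=(1, 4))
x = np.array([0.5, 1.0, 2.0])

a1 = np.maximum(0.0, W1 @ x)              # forward pass
y = W2 @ a1                               # scalar prediction score

R_a1 = lrp_dense(a1, W2, y)               # start from the output score itself
R_x = lrp_dense(x, W1, R_a1)              # per-input relevance scores (the "heatmap")
```

The `eps` stabilizer is exactly the numerical-stability/sparsity trade-off left open above: larger values suppress noisy relevance from near-zero activations, at the cost of leaking relevance.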
Salient-object-based methods do identify the key features that impact the steering angle. However, these methods have not been explicitly developed or tested for identifying whether an obstacle can be surpassed.
To overcome the above-mentioned limitations, we propose to perform extensive training and testing of a neural network to clone an obstacle mitigation policy. Even though there is a great deal of literature on the task of steering angle prediction, our goal is not to propose yet another prediction method, but rather to provide a different perspective on a steady on-road obstacle mitigation model: a model that not only detects a steady on-road obstacle but also predicts whether the obstacle can be surpassed, avoiding unnecessary maneuvering.
This section provides details of the CNN models used for validation and the steps performed on the data for accurate prediction.
The network model continuously predicts the steering angle to clear all test cases, taking raw pixels as input and incorporating attention in an end-to-end manner. It is important that our experiments and results be independent of car geometry, hence we represent the steering command as the inverse turning radius u_t = r_t⁻¹, where r_t is the turning radius at time stamp t. We use the inverse to prevent numerical instability and singularity, and to obtain smooth transitions through zero from left turns to right turns. The relation between the steering angle θ_t and the inverse turning radius can be given as

θ_t = f_steer(u_t) = u_t · d_w · K_s · (1 + K_slip · v_t²),   (1)

where θ_t (in degrees) and v_t (in m/s) are the steering angle and velocity at time t, respectively, and d_w, K_slip and K_s are vehicle-specific parameters: K_s is the steering ratio between the turn of the steering wheel and the turn of the wheels, K_slip represents the relative motion between the front and rear wheels, and d_w is the wheelbase.
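Equation (1) can be sketched directly in Python; the vehicle parameters below are illustrative placeholders, not measured values from our simulator:

```python
# Vehicle-specific parameters (illustrative placeholder values)
K_S = 15.0      # steering ratio: steering-wheel turn / road-wheel turn
K_SLIP = 0.002  # slip coefficient (front/rear wheel relative motion)
D_W = 2.7       # wheelbase in meters

def steering_angle_deg(u_t, v_t):
    """Eq. (1): steering angle (deg) from inverse turning radius u_t (1/m) and speed v_t (m/s)."""
    return u_t * D_W * K_S * (1.0 + K_SLIP * v_t ** 2)

def inverse_turning_radius(theta_t, v_t):
    """Invert Eq. (1): recover the geometry-independent command u_t from the steering angle."""
    return theta_t / (D_W * K_S * (1.0 + K_SLIP * v_t ** 2))
```

Since u_t passes smoothly through 0 (straight driving) while r_t diverges to infinity there, the network regresses u_t and the controller converts it to a wheel angle per vehicle.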
b) Data Bias Removal: The general tendency of every driver is to drive as steadily as possible without much maneuvering. However, in behavioral cloning, if the driver drives steadily during training then all the model will learn is to maintain a zero steering angle; such training would generate results biased toward zero output. In order to avoid output bias toward a negative or positive steering angle, training is done over the complete track in both the clockwise and anticlockwise directions. Additional recovery training runs are also recorded, in which the car is taken off the center lane and then recovered back to it. The data bias is removed by trimming the number of samples per steering-angle bin, as shown in Figure 3.
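The per-bin trimming can be sketched with numpy as follows; the bin count and per-bin cap are illustrative, not the values used for Figure 3:

```python
import numpy as np

def trim_steering_bins(angles, num_bins=25, max_per_bin=200, seed=0):
    """Return indices of a subset in which no steering-angle bin exceeds max_per_bin samples."""
    rng = np.random.default_rng(seed)
    angles = np.asarray(angles)
    edges = np.linspace(angles.min(), angles.max(), num_bins + 1)
    keep = []
    for i, (lo, hi) in enumerate(zip(edges[:-1], edges[1:])):
        # last bin is closed on the right so the maximum angle is not dropped
        upper = (angles <= hi) if i == num_bins - 1 else (angles < hi)
        in_bin = np.flatnonzero((angles >= lo) & upper)
        if len(in_bin) > max_per_bin:
            in_bin = rng.choice(in_bin, size=max_per_bin, replace=False)
        keep.extend(in_bin.tolist())
    return np.sort(np.array(keep, dtype=int))
```

Capping the dominant near-zero bin flattens the steering histogram so the cloned policy cannot minimize loss simply by always predicting straight-ahead.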
This section presents a basic description of the experimental setup for data collection, training and testing. We elaborate on the configuration of the hardware and software used, then enlist the training cases, test cases and evaluation criteria for model evaluation. Finally, we present the results of our experiment.
Model training and testing are performed in the Unity 2017 virtual environment in the interest of research cost. Other software tools include Visual Studio for Unity, Atom, Jupyter Notebook and Anaconda, with GitHub as the online code repository and the Google Colab platform for online code execution. Programming is done in C# and Python. Multiple packages are used, including OpenCV, numpy, matplotlib, Keras (model, optimizer, layers), pandas and sklearn. Hardware is a laptop with an Intel i5 [email protected], 8GB RAM and Intel HD Graphics with 2GB RAM.
Virtual 3D models of non-surpass-able and surpass-able obstacles are created in Unity as shown in Figure 4 below. One track contains an array of multiple surpass-able obstacles; the model must pass through the obstacles with minimum vehicle instability and maintain a steady drive to pass this test case. Starting from the bottom left:
1st, TC06: The model must maneuver through an array of left and right non-surpass-able obstacles and retain a steady drive to pass this test case.
2nd, TC07: The track has an array of both left and right non-surpass-able and surpass-able obstacles. The model must pass through the surpass-able obstacles and maneuver around the non-surpass-able ones to pass.
3rd, TC08: An unknown bridge, which has a different track, has to be crossed to pass this case.
4th, TC09: An unknown unpaved path has to be avoided by the model; the model has not been trained for this behavior.
5th, TC10: The model is expected to clear all the obstacles and pass through the unknown bridge and unpaved path in a single stretch.
Each model is trained on 4 cases and tested on 10 cases. This paper presents two deep learning models tested with combinations of activation function and dropout on the same database. Table 2 below shows the model used, the code assigned for ease of reference, the configurations used and the val_loss achieved. Two of the configurations achieve accuracies of 50% and 30% respectively, with heavy processing times of 12,000 s and 8,280 s respectively. Model P1 achieves the highest accuracy, with a processing time of 4,478 seconds.
Model P1 (PilotNet with elu and no dropout) performed a self-recovery in test cases 6 and 7, as listed in Table 5 below.
In this paper, we presented and compared two of the most popular autonomous driving methods, DroNet and PilotNet, experimenting with combinations of different activation functions with and without dropout. The experiment demonstrated that the PilotNet model P1 is able to learn the entire task of maneuvering around non-surpass-able obstacles and passing through surpass-able ones. It also provided clear insight into the effect of each activation function and of dropout on steering angle prediction. PilotNet model P1 achieves the highest prediction accuracy, the lowest val_loss, reasonable processing time and the best visual saliency map for obstacles on the current dataset. The experiment clearly concluded that PilotNet, with the elu activation function and without dropout, outperforms all other models and configurations.
The system learned to mitigate an obstacle without the need for explicit surpass-able and non-surpass-able obstacle labeling during training.
In future work, we would like to optimize PilotNet to further improve prediction accuracy. We would also like to introduce a custom network that would outperform all current autonomous driving methods.
c) Dataset Characteristics

The right size of data set is the key to an accurately predicting solution. Initial training was started with 30,000 images; however, due to resource constraints, we settled on an optimal data size of 9,970 images that produced reliable results. Zero-steering-bias images are removed from the input images, and data augmentation is performed to increase the data size and accuracy. The final data set is split using the train_test_split functionality in the sklearn library. All models are trained using 3,904 samples and validated with 977 samples, as shown in Table 1 below.

The model evaluation criteria are defined as follows:
- P: number of test cases the model passed
- SR: score for test cases in which the model self-recovered
- SR_o: number of obstacles correctly maneuvered or surpassed
- T_o: total maneuverable and surpass-able obstacles
- T: total test cases
- MA: model accuracy

MA = ((P + SR) / T) × 100   (2)
SR = SR_o / T_o   (3)
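Equations (2) and (3) can be sketched as follows, expressing MA as a percentage of total test cases; the counts used are illustrative only:

```python
def self_recovery_score(surpassed_ok, total_obstacles):
    """Eq. (3): fraction of maneuverable/surpass-able obstacles handled correctly."""
    return surpassed_ok / total_obstacles

def model_accuracy(passed, sr, total_cases):
    """Eq. (2): model accuracy as a percentage of total test cases."""
    return (passed + sr) / total_cases * 100.0

sr = self_recovery_score(surpassed_ok=4, total_obstacles=5)   # 0.8
ma = model_accuracy(passed=7, sr=sr, total_cases=10)          # 78.0
```

Adding the fractional SR term to the integer pass count gives partial credit for runs in which the model recovered rather than cleanly passing.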
Model | Input data | Training samples | Validation samples | Trainable parameters
DroNet | 9930 | 3904 | 977 | 311,777
PilotNet | 9930 | 3904 | 977 | 252,219
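The split in Table 1 can be reproduced with scikit-learn's train_test_split; the 80/20 ratio below is an assumption inferred from the 3,904/977 sample counts, and the arrays are placeholders for the real image paths and steering labels:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays standing in for image paths and steering-angle labels
images = np.arange(4881)   # 3904 + 977 samples retained after bias trimming
angles = np.random.default_rng(0).normal(0.0, 0.2, size=4881)

X_train, X_val, y_train, y_val = train_test_split(
    images, angles, test_size=0.2, random_state=6)   # 80/20 split (assumed ratio)
```

Fixing random_state makes the split reproducible across runs, so both models train and validate on identical samples.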
Real time lane detection for autonomous vehicles. International Conference on Computer and Communication Engineering (ICCCE), 2008. IEEE.
ImageNet classification with deep convolutional neural networks. In F. Pereira, C. J. C. Burges, L. Bottou, K. Q. Weinberger (eds.), Advances in Neural Information Processing Systems 25, 2012. Curran Associates, Inc. http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
DroNet: Learning to Fly by Driving. IEEE Robotics and Automation Letters, April 2018.
Autonomous off-road vehicle control using end-to-end learning. Final technical report, July 2004. Net-Scale Technologies, Inc. http://net-scale.com/doc/net-scale-dave-report.pdf
DeepDriving: Learning affordance for direct perception in autonomous driving. Proceedings of the IEEE International Conference on Computer Vision, 2015.
Pothole tagging system. 4th Robotics and Mechatronics Conference of South Africa, 2011.
Pothole detection with image processing and spectral clustering. 2nd International Conference on Information Technology and Computer Networks, 2013.
Behavioural cloning: phenomena, results and problems. IFAC Proceedings, 1995, 28(21).
A lane detection method for lane departure warning system. 2010 International Conference on Optoelectronics and Image Processing (ICOIP), 2010. IEEE.
You only look once: Unified, real-time object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016.
Challenges and feasibility for comprehensive automated survey of pavement conditions. 8th International Conference on Applications of Advanced Technologies in Transportation Engineering, 2004.
Metrology and visualization of potholes using the Microsoft Kinect sensor. 16th International IEEE Annual Conference on Intelligent Transportation Systems, 2013.
Survey on Vision based Hand Gesture Recognition. International Journal of Computer Sciences and Engineering, 2019, 7(5):281-288. doi:10.26438/ijcse/v7i5.281288
Region-based convolutional networks for accurate object detection and segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, 38.
Towards behavioural cloning for autonomous driving. IEEE International Conference on Robotic Computing (IRC 2019), Naples, Italy.
Multiple lane boundary detection using a combination of low-level image features. IEEE 17th International Conference on Intelligent Transportation Systems (ITSC), 2014.
Lane detection and tracking using B-Snake. Image and Vision Computing, 2004, 22(4).
Object detection with deep learning: A review. IEEE Transactions on Neural Networks and Learning Systems, 2019. doi:10.1109/TNNLS.2018.2876865
Experimentation of 3D pavement imaging through stereovision. International Conference on Transportation Engineering, 2007.
Efficient lane boundary detection with spatial-temporal knowledge filtering. Sensors, 2016, 16(8):1276.