Abstract: Mango is one of the most traded fruits in the world. However, mango production suffers from several pests and diseases which reduce the yield and quality of mangoes and their price in local and international markets. Several solutions for the automatic diagnosis of these pests and diseases have been proposed by researchers in the last decade. These solutions are based on Machine Learning (ML) and Deep Learning (DL) algorithms. In recent years, Convolutional Neural Networks (CNNs) have achieved impressive results in image classification and are considered the leading methods for this task. However, one of the most significant issues facing mango pest and disease classification solutions is the lack of large labeled datasets. Data augmentation is one of the solutions that has been successfully reported in the literature. This paper studies the data augmentation techniques blur, contrast, flip, noise, zoom and affine transformation in order to determine, on the one hand, the impact of each technique on the performance of a ResNet50 CNN trained on an initial small dataset and, on the other hand, the combination of techniques that gives the best performance to the DL network. Results show that the best combination for classifying mango leaf diseases is 'Contrast & Flip & Affine transformation', which gives the model a training accuracy of 98.54% and a testing accuracy of 97.80% with an f1_score > 0.9.

Keywords: data augmentation, mango, disease, classification, deep learning, resnet50.


# I. Introduction

Mango, or Mangifera indica L. (scientific name), is a lucrative fruit widely cultivated in tropical countries. It belongs to the family Anacardiaceae. Its overall consumption in 2017 was estimated at 50.65 million metric tons [1]. In 2021, in terms of quantities exported, it was the third most traded tropical fruit after pineapple and avocado [2]. Mango is highly appreciated for its richness in nutrients (vitamins A, B, C, K, ...), flavorful pulp and alluring aroma [3,4]. This fruit brings enormous economic benefits to exporting countries and mango growers.

However, mango production suffers severely from pests and diseases, which reduce both quality and quantity. This in turn influences the price of mango on the international market.

In the last decade, several solutions for the automatic diagnosis of these pests and diseases have been proposed by researchers. These solutions were first based on image processing (IP) and machine learning (ML) techniques and then, in the last five years, on deep learning (DL) algorithms. DL-based solutions have achieved state-of-the-art performance on ImageNet and other benchmark datasets [5]. In recent years, Convolutional Neural Networks (CNNs) have achieved impressive results in image classification and are considered the leading methods for object detection in computer vision [5,6].

However, one of the biggest issues facing mango pest and disease identification solutions is the lack of large labeled datasets [7,8,9,10]. Limited training data inhibits the performance of DL-based models, which need large amounts of data to train well, avoid overfitting and generalize. Overfitting is indicated when the training accuracy is markedly higher than the accuracy on the validation/test set. The generalizability of a model is measured by the difference between the performance it exhibits on training data (known data) and on test data (unknown data). Data augmentation is one of the solutions that has been successfully reported in the literature [1]. This remedy for overfitting generates a more comprehensive training set that minimizes the distance between the training and validation sets.

A data augmentation process based on image manipulation is presented in this paper for improving the quality of a small dataset of mango leaves presented in [1]. The specific contributions of the paper include:

• Generate a dataset for every data augmentation strategy except affine transformation. The DL model is trained on each generated dataset to determine the impact of each data augmentation technique on the performance of the model.

The rest of the paper is organized as follows: Section 2 reviews the related literature, Section 3 deals with data acquisition, the data augmentation techniques and the CNN model used, and Section 4 presents and discusses the results of the data augmentation techniques. The last section concludes the paper and announces the authors' future work.

# II. Related Works

The literature review presented in this paper concerns only data augmentation strategies used for mango pest or disease classification and for quality grading of mango or other fruits.

Shorten et al. [11] presented a survey of image data augmentation algorithms such as color space augmentations, geometric transformations, mixing images, kernel filters, random erasing, adversarial training, feature space augmentation, generative adversarial networks (GANs), meta-learning and neural style transfer. They also discussed the application of augmentation methods based on GANs and other characteristics of data augmentation such as curriculum learning, test-time augmentation, resolution impact and final dataset size. Dandavate et al. [12] applied rotation, scaling and image translation to a fruit dataset to avoid overfitting and obtain better performance with their simple CNN model. Agastya et al. [13] used VGG-16 and VGG-19 for automatic batik classification. By applying random rotation by a certain angle, scaling and shearing, they improved the accuracy of their models by up to 10%.

Bargoti et al. [14] presented a fruit (mangoes, apples and almonds) detection system using Faster R-CNN. They used image flipping and scaling to improve the performance of their model, achieving an F1-score > 0.9 for mangoes and apples. Wu et al. [15] investigated several deep learning-based methods for mango quality grading and found VGG-16 to be the best model for this task. During training, the authors randomly applied, at each epoch, data augmentation strategies such as horizontal or vertical image flipping, rotation, brightness, contrast and zoom in/out. Zhang et al. [16] developed a fruit category identification system using a 13-layer CNN and three data augmentation strategies, namely noise injection, image rotation and Gamma correction. The overall accuracy obtained is 94.94%, at least 5 percentage points higher than state-of-the-art approaches. Supekar et al. [17] built a mango grading system based on ripeness, size, shape and defects. They used K-means clustering for defect segmentation and Random Forest classifiers. To avoid overfitting with an initial training dataset of 69 images, the authors applied image rotation by angles of 90°, 180° and 270°. The final training dataset consists of 522 images, which allows their model to reach an overall accuracy of 88.88%.


# III. Methodology and Model


# a) Data acquisition

The dataset used in this paper is part of the 'MangoLeafBD' dataset produced by Ahmed et al. [18] and downloadable from the 'Mendeley Data' platform (https://data.mendeley.com/datasets/hxsnvwty3r).

The MangoLeafBD dataset contains eight classes, seven of which correspond to mango leaf diseases and one of which contains healthy leaves.

In this paper, four diseases, namely Anthracnose, Gall Midge, Powdery Mildew and Sooty Mold, are treated, as they are among the mango leaf diseases most frequently treated by researchers during the last five years [19] (Fig. 1 and Fig. 2). The dataset used contains four classes corresponding to these diseases and a class of healthy leaves. There are 500 RGB leaf images of 240x320 pixels in each class, making a total of 2,500. Images are in JPG format.


# b) Data augmentation

Data augmentation is a powerful solution against overfitting. It allows a model with a small dataset to become robust and generalizable. There are two categories of data augmentation: the first is based on image manipulations and the second on DL (generative adversarial networks (GANs), feature space augmentations, adversarial training, Neural Style Transfer, Meta Learning Data Augmentation) [11].

This research focuses on the first category because i) the second is generally used to generate synthetic images from a fairly large dataset, and ii) mango leaf images taken under real-world conditions suffer mainly from temperature variation, shadowing, overlapping leaves and the presence of multiple objects. The first category allows us to generate images that reproduce these conditions.


This paper deals with the following techniques:

# • Noise injection

Image noise is a random disturbance in the brightness and color of an image. Noise injection is an effective way to avoid overfitting and to improve the test performance of a machine learning model [13]. There are several ways to add noise to an image (e.g. Gaussian noise, salt-and-pepper noise, speckle noise, ...). Here, Gaussian noise is applied with the mean parameter fixed to 0 and the sigma parameter fixed to 0.05.
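A minimal sketch of the Gaussian noise injection described above, written with NumPy only. It assumes that pixel values are rescaled to [0, 1] before noise with mean 0 and sigma 0.05 is added, which the paper does not state explicitly; the function name `add_gaussian_noise` and the `seed` argument are illustrative choices.

```python
import numpy as np

def add_gaussian_noise(image, mean=0.0, sigma=0.05, seed=None):
    """Return a noisy uint8 copy of an 8-bit image of shape (H, W, C)."""
    rng = np.random.default_rng(seed)
    # Assumption: the noise parameters are defined on the [0, 1] intensity scale.
    scaled = image.astype(np.float64) / 255.0
    noise = rng.normal(mean, sigma, size=scaled.shape)
    noisy = np.clip(scaled + noise, 0.0, 1.0)  # keep values in the valid range
    return np.rint(noisy * 255.0).astype(np.uint8)

# Synthetic mid-grey image with the 240x320 size reported for the dataset.
leaf = np.full((240, 320, 3), 128, dtype=np.uint8)
noisy_leaf = add_gaussian_noise(leaf, seed=0)
```

One noisy image is produced per original image here; any other multiplicity would only require calling the function with different seeds.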


# • Blur

Blurring an image means making it less sharp. Photographic blur occurs when the subject or scene moves relative to the camera, or vice versa. To simulate this, Gaussian blur was applied using a kernel of size (5,15).
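The blur step can be sketched as a separable Gaussian filter in NumPy. This is an illustration under assumptions, not the authors' code: the kernel size (5,15) is read as (width, height), following the OpenCV convention for `cv2.GaussianBlur`, and `sigma=0.5` is taken from the label of the blurred panel in the Fig. 3 example, since the text itself gives no standard deviation. The function names are hypothetical.

```python
import numpy as np

def gaussian_kernel_1d(size, sigma):
    """Normalised 1-D Gaussian kernel with `size` taps."""
    offsets = np.arange(size) - (size - 1) / 2.0
    weights = np.exp(-(offsets ** 2) / (2.0 * sigma ** 2))
    return weights / weights.sum()

def gaussian_blur(image, ksize=(5, 15), sigma=0.5):
    """Separable Gaussian blur of an (H, W, C) uint8 image; ksize is (width, height)."""
    k_w, k_h = ksize
    k_x = gaussian_kernel_1d(k_w, sigma)
    k_y = gaussian_kernel_1d(k_h, sigma)
    h, w = image.shape[:2]
    pad_y, pad_x = k_h // 2, k_w // 2
    padded = np.pad(image.astype(np.float64),
                    ((pad_y, pad_y), (pad_x, pad_x), (0, 0)), mode="reflect")
    # Horizontal pass, then vertical pass (the 2-D Gaussian is separable).
    horizontal = sum(k_x[i] * padded[:, i:i + w] for i in range(k_w))
    blurred = sum(k_y[j] * horizontal[j:j + h] for j in range(k_h))
    return np.clip(np.rint(blurred), 0, 255).astype(np.uint8)

# A sharp vertical edge becomes a soft transition after blurring.
edge = np.zeros((240, 320, 3), dtype=np.uint8)
edge[:, 160:] = 200
blurred_edge = gaussian_blur(edge)
```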


# • Contrast and Brightness

The contrast and brightness function changes the appearance of an image. Brightness changes the overall lightness of the image and contrast adjusts the difference between the darkest and lightest colors. The contrast parameters used are {0.5; 2; 2.5} and the brightness parameters are {1; 4; 5}. For each original image, three new images are generated with the following contrast and brightness pairs {c; b}: {0.5; 1}, {2; 4}, {2.5; 5}.
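As a sketch, the three {c; b} pairs can be applied with a linear point operation, `out = c * pixel + b`, saturated to the 8-bit range. The linear model is an assumption (the paper only lists the parameter values); it is the same operation that OpenCV's `cv2.convertScaleAbs` performs for non-negative results. The helper names are illustrative.

```python
import numpy as np

# {c; b} pairs listed in the text.
CONTRAST_BRIGHTNESS_PAIRS = [(0.5, 1), (2, 4), (2.5, 5)]

def adjust_contrast_brightness(image, contrast, brightness):
    """Apply out = contrast * pixel + brightness, saturating to [0, 255]."""
    out = contrast * image.astype(np.float64) + brightness
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)

def contrast_variants(image):
    """Generate the three contrast/brightness variants of one image."""
    return [adjust_contrast_brightness(image, c, b)
            for c, b in CONTRAST_BRIGHTNESS_PAIRS]

leaf = np.full((240, 320, 3), 100, dtype=np.uint8)
variants = contrast_variants(leaf)  # three new images per original image
```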


# • Zoom

Zooming an image means enlarging it so that the details in the picture become more visible and clear. Each image is zoomed three times from the center using zoom parameters {3; 5; 7}.
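A possible reading of the zoom step is sketched below: the central region is cropped and scaled back to the original size. Treating the parameters {3; 5; 7} as magnification factors is an assumption, as is the nearest-neighbour resampling used here to keep the example NumPy-only; `center_zoom` is a hypothetical name.

```python
import numpy as np

def center_zoom(image, factor):
    """Zoom into the image centre by `factor`, keeping the original size."""
    h, w = image.shape[:2]
    crop_h = max(1, int(round(h / factor)))
    crop_w = max(1, int(round(w / factor)))
    top, left = (h - crop_h) // 2, (w - crop_w) // 2
    crop = image[top:top + crop_h, left:left + crop_w]
    # Nearest-neighbour upsampling of the crop back to (h, w).
    rows = np.arange(h) * crop_h // h
    cols = np.arange(w) * crop_w // w
    return crop[rows][:, cols]

# Synthetic 240x320 RGB image with varying pixel values.
leaf = (np.arange(240 * 320 * 3).reshape(240, 320, 3) % 256).astype(np.uint8)
zoomed = [center_zoom(leaf, z) for z in (3, 5, 7)]  # three images per original
```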


# • Image flipping

To flip or mirror an image means to reverse it horizontally (horizontal flip) or vertically (vertical flip). The flip function generates an image in which the left side becomes the right side or the top becomes the bottom. The images are vertically and horizontally flipped using flip parameters 0 and 1 respectively.
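The two flips can be sketched with array slicing. The flip parameters follow the convention stated above (0 for vertical, 1 for horizontal), which matches the `flipCode` argument of OpenCV's `cv2.flip`; the function below is a NumPy stand-in with an illustrative name.

```python
import numpy as np

def flip(image, flip_code):
    """Mirror an image: flip_code 0 = vertical flip, 1 = horizontal flip."""
    if flip_code == 0:
        return image[::-1, ...].copy()     # top becomes bottom
    if flip_code == 1:
        return image[:, ::-1, ...].copy()  # left becomes right
    raise ValueError("flip_code must be 0 or 1")

tiny = np.array([[1, 2],
                 [3, 4]], dtype=np.uint8)
vertical = flip(tiny, 0)
horizontal = flip(tiny, 1)
```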


# • Affine transformation

An affine transformation is, in general, a combination of translations, rotations, shears and dilations [12]. It is used to simulate images captured from different camera projections and positions. Affine transformation is performed using an input matrix (In) of size 2x3 and an output matrix (Out) of the same size. The input matrix corresponds to three points in the input image and the second matrix gives their corresponding locations in the output image. In the training dataset, twenty additional images are randomly generated for each image. Generated images that contain no part of a mango leaf are then removed.
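The mapping from three input points to three output points can be sketched by solving a small linear system, which is what OpenCV's `cv2.getAffineTransform` computes before `cv2.warpAffine` resamples the image. The source points below are those shown in the Fig. 3 example; the destination points are hypothetical values chosen only for illustration, as are the function names.

```python
import numpy as np

def affine_from_points(src_pts, dst_pts):
    """Return the 2x3 affine matrix M with M @ [x, y, 1] = [x', y'] for 3 point pairs."""
    src = np.asarray(src_pts, dtype=np.float64)      # shape (3, 2)
    dst = np.asarray(dst_pts, dtype=np.float64)      # shape (3, 2)
    homogeneous = np.hstack([src, np.ones((3, 1))])  # rows [x, y, 1]
    # Solve homogeneous @ M.T = dst; the three points must not be collinear.
    return np.linalg.solve(homogeneous, dst).T

def apply_affine(matrix, points):
    """Map an (N, 2) array of points through a 2x3 affine matrix."""
    pts = np.asarray(points, dtype=np.float64)
    return pts @ matrix[:, :2].T + matrix[:, 2]

src_points = [(50, 70), (230, 50), (50, 220)]  # points from the Fig. 3 example
dst_points = [(60, 90), (220, 40), (70, 230)]  # hypothetical target locations
M = affine_from_points(src_points, dst_points)
```

When the input and output points coincide, as in the Fig. 3 example, the resulting matrix is the identity transform, so random perturbation of the output points is what produces distinct images.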

Fig. 3 shows an example of a diseased mango leaf (anthracnose) on which all these data augmentation techniques are applied. 



The data augmentation process (Fig. 4) is carried out as follows:

First step: For each of the above-mentioned data augmentation strategies (except affine transformation), a new dataset for training and validation is generated (Fig. 3, Table 2). Images of the original dataset are added to the generated one. This makes it possible to measure the impact of each data augmentation strategy on the overall performance of the model.


Second step: Every strategy (except affine transformation) is combined in turn with each of the 4 others to generate new datasets (Table 2).

Final step: Affine transformation is applied to the combinations that give the best performance to the DL model (Table 3).
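The first two steps can be sketched as simple bookkeeping over the five strategies. The number of new images per original image is an assumption inferred from the text and the reported dataset sizes (one for blur and noise, two for flip, three for contrast and zoom); with it, the sketch reproduces the training, validation and total counts listed in the dataset tables. The names used are illustrative.

```python
from itertools import combinations

BASE_TRAIN, BASE_VAL, TEST = 1600, 400, 500

# New images generated per original image (assumed from text and table sizes).
IMAGES_PER_ORIGINAL = {"Blur": 1, "Contrast": 3, "Flip": 2, "Noise": 1, "Zoom": 3}

def dataset_sizes(techniques):
    """Sizes of the dataset made of the originals plus the listed augmentations."""
    factor = 1 + sum(IMAGES_PER_ORIGINAL[t] for t in techniques)
    train, val = BASE_TRAIN * factor, BASE_VAL * factor
    # The test set is never augmented.
    return {"train": train, "validation": val, "test": TEST,
            "total": train + val + TEST}

# First step: one dataset per strategy. Second step: one dataset per pair.
first_step = {f"Original & {t}": dataset_sizes([t]) for t in IMAGES_PER_ORIGINAL}
second_step = {f"{a} & {b}": dataset_sizes([a, b])
               for a, b in combinations(IMAGES_PER_ORIGINAL, 2)}
```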

The augmentation techniques are implemented in Python with the Open Source Computer Vision Library (OpenCV).


# c) CNN model

ResNet [20] won first place at the ImageNet Large Scale Visual Recognition Challenge (ILSVRC 2015). To preserve knowledge, reduce losses and boost performance during the training phase, ResNet introduced residual connections between layers. A residual connection means that the output of a layer is the convolution of its input plus the input itself [21]. ResNet50 is used in this research. It consists of 50 layers, as shown in Fig. 5.
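The idea of a residual connection, output = F(input) + input, can be illustrated with a toy block. This sketch replaces the convolutions of a real ResNet block with two small dense layers, purely to keep the example short; the names and shapes are hypothetical.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """Compute relu(F(x) + x), where F is two linear layers with a ReLU between them."""
    fx = relu(x @ w1) @ w2  # F(x): stand-in for the stacked convolutions
    return relu(fx + x)     # skip connection adds the input back

rng = np.random.default_rng(0)
x = rng.random((4, 8))                    # batch of 4 feature vectors
w1 = rng.normal(scale=0.1, size=(8, 8))
w2 = rng.normal(scale=0.1, size=(8, 8))
y = residual_block(x, w1, w2)
```

If the learned transformation F contributes nothing (all weights zero), the block simply passes its non-negative input through, which is why such blocks are easy to stack deeply.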

The model is adapted by replacing the 1000 nodes of the softmax output layer with 5 nodes, corresponding to the five classes treated (four mango leaf diseases and healthy leaves).


# d) Implementation details

The data augmentation process and the ResNet50 model are implemented using the OpenCV and Keras libraries respectively. The training parameters used include the Adam optimizer with a learning rate of 0.001, binary cross-entropy as the loss function, and 8 epochs.
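The configuration above could be expressed in Keras roughly as follows. This is a sketch under assumptions: the paper does not show its code or say whether ImageNet weights were loaded (`weights=None` is used here), and the `(240, 320, 3)` input shape assumes the 240x320 images are fed without resizing. The loss follows the paper's stated choice. `TRAINING_CONFIG` and `build_model` are illustrative names.

```python
TRAINING_CONFIG = {
    "input_shape": (240, 320, 3),   # assumption: images used at their native size
    "num_classes": 5,               # four diseases + healthy leaves
    "learning_rate": 0.001,
    "loss": "binary_crossentropy",  # loss function as stated in the paper
    "epochs": 8,
}

def build_model(config=TRAINING_CONFIG):
    """Build and compile a ResNet50 classifier with a 5-way softmax output."""
    # Imported here so the configuration can be inspected without TensorFlow installed.
    import tensorflow as tf

    backbone = tf.keras.applications.ResNet50(
        include_top=False,
        weights=None,  # assumption: use of pretrained weights is not specified
        input_shape=config["input_shape"],
        pooling="avg",
    )
    outputs = tf.keras.layers.Dense(
        config["num_classes"], activation="softmax")(backbone.output)
    model = tf.keras.Model(backbone.input, outputs)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=config["learning_rate"]),
        loss=config["loss"],
        metrics=["accuracy"],
    )
    return model
```

Training would then amount to calling `build_model().fit(...)` on the augmented training and validation sets for `TRAINING_CONFIG["epochs"]` epochs; the data-loading side is left out because the paper does not describe it.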

The model is trained on a server with an NVIDIA GPU and 32 GB of RAM.

# IV. Results and Discussion

The initial small dataset is split as follows: 64% for training, 16% for validation and 20% for testing. After randomly splitting the dataset, we have 1,600 images for training, 400 images for validation and 500 images for testing. Results show that the training accuracy (87.18%) is much greater than the testing accuracy (39.34%), so the model overfits, as shown in Fig. 6. Since the dataset is not large enough to train the DL model robustly, the data augmentation process is carried out. This task concerns only the training and validation data [22]. The test data remain equal to 500 images.
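The 64% / 16% / 20% split can be sketched as a two-stage random split: 20% of the images are set aside for testing, then 20% of the remainder for validation (0.8 x 0.2 = 0.16). The two-stage procedure, the absence of per-class stratification and the seed are assumptions; the paper only reports the resulting proportions.

```python
import random

def split_indices(n_images, test_fraction=0.20, val_fraction_of_rest=0.20, seed=42):
    """Randomly split image indices into train, validation and test lists."""
    indices = list(range(n_images))
    random.Random(seed).shuffle(indices)
    n_test = round(n_images * test_fraction)
    n_val = round((n_images - n_test) * val_fraction_of_rest)
    test = indices[:n_test]
    val = indices[n_test:n_test + n_val]
    train = indices[n_test + n_val:]
    return train, val, test

train_idx, val_idx, test_idx = split_indices(2500)
```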

In the first step, after the training phase, the results show that the DL model overfits on all datasets except 'Original & Contrast', which gives training and testing accuracies of 90.56% and 86.23% respectively (Table 3, Fig. 7).
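To make the comparison concrete, the gap between training and testing accuracy can be computed for each first-step dataset from the values reported in Table 3. The paper does not state a numerical threshold for labelling a model as overfitted, so this sketch only ranks the gaps and does not reproduce the 'ok' / 'overfitting' labels.

```python
# (training accuracy %, testing accuracy %) as reported in Table 3.
FIRST_STEP_ACCURACY = {
    "Original & Blur": (98.25, 84.21),
    "Original & Contrast": (90.56, 86.23),
    "Original & Flip": (95.35, 80.60),
    "Original & Noise": (76.36, 34.84),
    "Original & Zoom": (92.76, 84.80),
}

def accuracy_gap(train_acc, test_acc):
    """Difference between training and testing accuracy, in percentage points."""
    return round(train_acc - test_acc, 2)

gaps = {name: accuracy_gap(*acc) for name, acc in FIRST_STEP_ACCURACY.items()}
smallest_gap = min(gaps, key=gaps.get)
```

With these figures, 'Original & Contrast' has the smallest gap, which is consistent with it being the only first-step dataset reported as not overfitted.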

In the second step, training the model on the combined datasets yielded the results in Table 4. The model does not overfit on the 'Contrast & Flip' (training: 95.29%; testing: 91.39%) and 'Flip & Zoom' (training: 93.15%; testing: 90.59%) datasets. These two datasets are the best ones in the second step since they give the best results to the DL model.

Finally, the affine transformation strategy is applied to the 'Contrast & Flip' and 'Flip & Zoom' datasets. Results show that 'Contrast & Flip' gives the best performance, with an accuracy of 97.80% and an f1_score > 0.9 (Table 5, Fig. 8, Fig. 9).

Following the results presented previously, in the first step the model overfitted on the generated datasets, except for the 'Original & Contrast' dataset, which resulted in an accuracy of 86.23%. Concerning the data augmentation strategies blur, contrast, flip, noise and zoom, the best combinations for classifying mango leaf diseases are 'Contrast & Flip' and 'Flip & Zoom', according to the results of the second step. These two strategies yielded accuracies of 91.39% and 90.59% respectively. In the final step, applying the 'Affine Transformation' strategy to the datasets generated by these two strategies revealed that the best combination for mango leaf disease classification is 'Contrast & Flip & Affine Transformation', since it yielded an accuracy of 97.80%.


# V. Conclusion and Future Works

This paper presented three contributions. The first made it possible to determine the impact of the data augmentation techniques blur, contrast, flip, noise and zoom on mango leaf disease classification. The second identified the combinations of these techniques that give the best performance to the deep learning model. The last one reveals that applying the 'affine transformation' technique to the combination 'Contrast & Flip' gives the best performance to the ResNet50 CNN, with an accuracy of 97.80%.

This solution can be used to improve the performance of DL models for image classification with small datasets.

Our future work is to propose a dataset of mango leaf diseases with images captured in mango orchards of a Sahelian country such as Senegal. Applying this combination as a data augmentation technique to this dataset is expected to allow us to achieve excellent results in mango leaf disease classification using a deep learning model such as ResNet50. This model will then be deployed in mobile and web applications to allow mango growers to diagnose diseases in their crops without expert intervention.

![Combination of Data Augmentation Techniques for Mango Leaf Diseases Classification. Demba Faye, Idy Diop, Nalla Mbaye & Doudou Dione](image-2.png "A")
![Fig. 1: Ranking of the most common and treated mango diseases [19]](image-3.png "Fig. 1, Fig. 2")
![Fig. 3 panel labels: noised image (mean: 0.1, std: 0.5); blurred image (std: 0.5, kernel: (5,15)); contrast, {c, b} = {2, 4}; zoomed image (param: 5); vertically flipped image; horizontally flipped image; affine transformation, In; Out = [50,70; 230,50; 50,220]; [50,70; 230,50; 50,220]](image-4.png)
![Fig. 3: An example of generated images](image-5.png "Fig. 3, Fig. 4")
![Fig. 5: The architecture of ResNet50 [21]](image-6.png "Fig. 5")
![](image-7.png "")
![Fig. 6: Training result of the original dataset](image-8.png "Fig. 6")
![Fig. 7: Training result of the dataset 'Original & Contrast'](image-9.png "Fig. 7")
![Fig. 8: Training result of the 'Contrast & Flip & Affine Transformation' dataset](image-10.png "Fig. 8")
![Fig. 9: Training result of the 'Flip & Zoom & Affine Transformation' dataset](image-11.png "Fig. 9")
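
Table 1

| | Original | Original & Blur | Original & Contrast | Original & Flip | Original & Noise | Original & Zoom |
|---|---|---|---|---|---|---|
| Train | 1600 | 3200 | 6400 | 4800 | 3200 | 6400 |
| Validation | 400 | 800 | 1600 | 1200 | 800 | 1600 |
| Test | 500 | 500 | 500 | 500 | 500 | 500 |
| Total | 2500 | 4500 | 8500 | 6500 | 4500 | 8500 |
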
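
Table 2

| | Blur & Contrast | Blur & Flip | Blur & Noise | Blur & Zoom | Contrast & Flip | Contrast & Noise | Contrast & Zoom | Flip & Noise | Flip & Zoom | Noise & Zoom |
|---|---|---|---|---|---|---|---|---|---|---|
| Train | 8000 | 6400 | 4800 | 8000 | 9600 | 8000 | 11200 | 6400 | 9600 | 8000 |
| Validation | 2000 | 1600 | 1200 | 2000 | 2400 | 2000 | 2800 | 1600 | 2400 | 2000 |
| Test | 500 | 500 | 500 | 500 | 500 | 500 | 500 | 500 | 500 | 500 |
| Total | 10500 | 8500 | 6500 | 10500 | 12500 | 10500 | 14500 | 8500 | 12500 | 10500 |
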
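
Table 3

| | Original & Blur | Original & Contrast | Original & Flip | Original & Noise | Original & Zoom |
|---|---|---|---|---|---|
| Training Accuracy (%) | 98.25 | 90.56 | 95.35 | 76.36 | 92.76 |
| Testing Accuracy (%) | 84.21 | 86.23 | 80.60 | 34.84 | 84.80 |
| Result | overfitting | ok | overfitting | overfitting | overfitting |
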
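
Table 4

| | Blur & Contrast | Blur & Flip | Blur & Noise | Blur & Zoom | Contrast & Flip | Contrast & Noise | Contrast & Zoom | Flip & Noise | Flip & Zoom | Noise & Zoom |
|---|---|---|---|---|---|---|---|---|---|---|
| Training Accuracy (%) | 87.28 | 94.14 | 78.29 | 92.08 | 95.29 | 84.25 | 88.71 | 78.80 | 93.15 | 82.48 |
| Testing Accuracy (%) | 65.45 | 85.30 | 63.32 | 65.82 | 91.39 | 31.74 | 58.23 | 645.85 | 90.59 | 78.47 |
| Result | overfitting | overfitting | overfitting | overfitting | ok | overfitting | overfitting | overfitting | ok | overfitting |
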
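
Table 5

| | Contrast & Flip & Affine Transformation | Flip & Zoom & Affine Transformation |
|---|---|---|
| Training dataset | 50 056 | 51 315 |
| Validation dataset | 12 514 | 12 828 |
| Test dataset | 500 | 500 |
| Total | 63 070 | 64 643 |
| Training Accuracy (%) | 98.54 | 97.44 |
| Testing Accuracy (%) | 97.80 | 93.98 |
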
			© 2023 Global Journals
		
		
## Acknowledgements

The authors would like to thank IRD (Institut de Recherche pour le Développement) SENEGAL for access to their server which was used in this study.

			
## References

* K. Kusrini, S. Suputa, A. Setyanto, I. M. A. Agastya, H. Priantoro, K. Chandramouli, E. Izquierdo. Data augmentation for automated pest classification in Mango farms. Computers and Electronics in Agriculture, 179, 105842, 2020. doi:10.1016/j.compag.2020.105842
* N. F. Rosman, N. A. Asli, S. Abdullah, M. Rusop. Review: Some common disease in mango. AIP Conference Proceedings, 2151, 20019, 2019. doi:10.1063/1.5124649
* Shih-Lun Wu, Hsiao-Yen Tung, Yu-Lun Hsu. Deep Learning for Automatic Quality Grading of Mangoes: Methods and Insights. Computer Vision and Pattern Recognition, 2020.
* A. Fuentes, S. Yoon, S. Kim, D. Park. A Robust Deep-Learning-Based Detector for Real-Time Tomato Plant Diseases and Pests Recognition. Sensors, 17(9), 2022, 2017.
* C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, Z. Wojna. Rethinking the Inception architecture for computer vision. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, 2016.
* Nabodip Sutrodhor, Molla Rashied Hussein, Md. Firoz Mridha, Prokash Karmokar, Tasrif Nur. Mango Leaf Ailment Detection using Neural Network Ensemble and Support Vector Machine. International Journal of Computer Applications, 181(13), August 2018. doi:10.5120/ijca2018917746
* S. Arivazhagan, S. Vineth Ligi. Mango Leaf Diseases Identification Using Convolutional Neural Network. International Journal of Pure and Applied Mathematics, 120(6), 2018.
* R. Saleem, J. H. Shah, M. Sharif, M. Yasmin, H.-S. Yong, J. Cha. Mango Leaf Disease Recognition and Classification Using Novel Segmentation and Vein Pattern Technique. Appl. Sci., 11901, 2021. doi:10.3390/app112411901
* M. R. Mia, S. Roy, S. K. Das, M. A. Rahman. Mango leaf disease recognition using neural network and support vector machine. Iran Journal of Computer Science, 3(3), 2020. doi:10.1007/s42044-020-00057-z
* K. Simonyan, A. Zisserman. Very Deep Convolutional Networks for Large-Scale Image Recognition. 2014. doi:10.48550/arXiv.1409.1556
* C. Shorten, T. M. Khoshgoftaar. A survey on Image Data Augmentation for Deep Learning. J Big Data, 6, 60, 2019. doi:10.1186/s40537-019-0197-0
* R. Dandavate, V. Patodkar. CNN and Data Augmentation Based Fruit Classification Model. 2020 Fourth International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC), 2020. doi:10.1109/I-SMAC49090.2020.9243440
* M. A. Agastya, A. Setyanto. Classification of Indonesian Batik Using Deep Learning Techniques and Data Augmentation. 3rd International Conference on Information Technology, Information System and Electrical Engineering (ICITISEE), 2018. doi:10.1109/ICITISEE.2018.8720990
* S. Bargoti, J. Underwood. Deep fruit detection in orchards. 2017 IEEE International Conference on Robotics and Automation (ICRA), 2017. doi:10.1109/ICRA.2017.7989417
* S.-L. Wu, H.-Y. Tung, Y.-L. Hsu. Deep Learning for Automatic Quality Grading of Mangoes: Methods and Insights. 2020 19th IEEE International Conference on Machine Learning and Applications (ICMLA), 2020. doi:10.1109/ICMLA51294.2020.00076
* Y. D. Zhang, Z. Dong, X. Chen. Image based fruit category classification by 13-layer deep convolutional neural network and data augmentation. Multimed Tools Appl, 78, 2019. doi:10.1007/s11042-017-5243-3
* A. D. Supekar, M. Wakode. Multi-Parameter Based Mango Grading Using Image Processing and Machine Learning Techniques. INFOCOMP Journal of Computer Science, 19(2), 2020.
* FAO. Major Tropical Fruits: Preliminary results 2021. Rome, 2022.
* D. Faye, I. Diop, D. Dione. Mango Diseases Classification Solutions Using Machine Learning or Deep Learning: A Review. Journal of Computer and Communications, 10, 2022. doi:10.4236/jcc.2022.1012002
* Kaiming He. Deep Residual Learning for Image Recognition. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
* L. Ali, F. Alnajjar, H. A. Jassmi, M. Gocho, W. Khan, M. A. Serhani. Performance Evaluation of Deep CNN-Based Crack Detection and Localization Techniques for Concrete Structures. Sensors, 21(5), 1688, 2021. doi:10.3390/s21051688
* Chiagoziem C. Ukwuoma, Qin Zhiguang, Md Belal Bin Heyat, Liaqat Ali, Zahra Almaspoor, Happy N. Monday. Recent Advancements in Fruit Detection and Classification Using Deep Learning Techniques. Mathematical Problems in Engineering, 2022, Article ID 9210947, 29 pages. doi:10.1155/2022/9210947