Abstract-Mango is one of the most traded fruits in the world. However, mango production suffers from several pests and diseases that reduce the production and quality of mangoes and their price in local and international markets. Several solutions for the automatic diagnosis of these pests and diseases have been proposed by researchers in the last decade. These solutions are based on Machine Learning (ML) and Deep Learning (DL) algorithms. In recent years, Convolutional Neural Networks (CNNs) have achieved impressive results in image classification and are considered the leading methods for this task. However, one of the most significant issues facing mango pest and disease classification solutions is the lack of large, labeled datasets. Data augmentation is one of the solutions that has been successfully reported in the literature. This paper studies six data augmentation techniques, namely blur, contrast, flip, noise, zoom and affine transformation, to determine, on the one hand, the impact of each technique on the performance of a ResNet50 CNN trained on an initial small dataset and, on the other hand, the combination of techniques that gives the DL network its best performance. Results show that the best combination for classifying mango leaf diseases is 'Contrast & Flip & Affine transformation', which gives the model a training accuracy of 98.54% and a testing accuracy of 97.80% with an f1_score > 0.9.

Keywords: data augmentation, mango, disease, classification, deep learning, resnet50.

# I. Introduction

Mango, or Mangifera indica L. (scientific name), is a lucrative fruit widely cultivated in tropical countries. It belongs to the family Anacardiaceae. Its overall consumption in 2017 was estimated at 50.65 million metric tons [1]. In 2021, it was the third most traded tropical fruit in terms of quantities exported, after pineapple and avocado [2].
Mango fruit is highly appreciated for its richness in nutrients (vitamins A, B, C, K, ...), its flavorful pulp and its alluring aroma [3,4]. It brings enormous economic benefits to exporting countries and mango growers. However, mango production suffers severely from pests and diseases, which reduce both quality and quantity. This influences the mango price in the international market. In the last decade, several solutions for the automatic diagnosis of these pests and diseases have been proposed by researchers. These solutions were first based on image processing (IP) and machine learning (ML) techniques and, in the last five years, on deep learning (DL) algorithms. DL-based solutions have achieved state-of-the-art performance on ImageNet and other benchmark datasets [5]. In recent years, Convolutional Neural Networks (CNNs) have achieved impressive results in image classification and are considered the leading methods for object detection in computer vision [5,6]. However, one of the biggest issues facing mango pest and disease identification solutions is the lack of large, labeled datasets [7,8,9,10]. Limited training data inhibits the performance of DL-based models, which need large amounts of data to train well, avoid overfitting and improve their generalization ability. Overfitting shows up as a training accuracy that is substantially higher than the accuracy on the validation or test set. The generalizability of a model is measured by the difference in performance it exhibits when evaluated on training data (known data) versus test data (unknown data). Data augmentation is one of the solutions that has been successfully reported in the literature [1]. This remedy for overfitting generates a more comprehensive set that minimizes the distance between the training and validation sets.
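The two definitions above can be condensed into a single quantity. As a sketch (the symbols $\mathrm{Acc}_{\text{train}}$, $\mathrm{Acc}_{\text{test}}$ and $\Delta$ are notation introduced here for illustration, not taken from the cited works), the generalization gap can be written as:

$$\Delta = \mathrm{Acc}_{\text{train}} - \mathrm{Acc}_{\text{test}}$$

A large positive $\Delta$ signals overfitting. For instance, with the accuracies reported in this paper for the original small dataset, $\Delta = 87.18\% - 39.34\% = 47.84$ percentage points, whereas for the best augmented dataset $\Delta = 98.54\% - 97.80\% = 0.74$ percentage points.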
A data augmentation process based on image manipulation is presented in this paper to improve the quality of a small dataset of mango leaves presented in [1]. The specific contributions of the paper include:

* Generating a dataset for every data augmentation strategy except affine transformation. The DL model is trained on each generated dataset to determine the impact of each data augmentation technique on the performance of the model.

The rest of the paper is organized as follows: Section 2 gives an overview of the literature, Section 3 deals with the data acquisition, the data augmentation techniques and the CNN model used, and Section 4 presents and discusses the results of the data augmentation techniques. The last section concludes the paper and outlines the authors' future work.

# II. Related Works

The literature review presented in this paper concerns only data augmentation strategies used for mango pest or disease classification and for quality grading of mango or other fruits. Shorten et al. [11] presented a survey of image data augmentation algorithms such as color space augmentations, geometric transformations, mixing images, kernel filters, random erasing, adversarial training, feature space augmentation, generative adversarial networks (GANs), meta-learning and neural style transfer. They also discussed the application of augmentation methods based on GANs and other characteristics of data augmentation such as curriculum learning, test-time augmentation, resolution impact, and final dataset size. Dandavate et al. [12] applied data augmentation techniques, namely rotation, scaling and image translation, to a fruit dataset to avoid overfitting and obtain better performance with their simple CNN model. Agastya et al. [13] used VGG-16 and VGG-19 for automatic batik classification. By applying random rotation within a given angle range, scaling and shearing, they improved the accuracy of their models by up to 10%. Bargoti et al.
[14] presented a fruit (mangoes, apples, and almonds) detection system using Faster R-CNN. They used image flipping and scaling to improve the performance of their model, achieving an F1-score > 0.9 for mangoes and apples. Wu et al. [15] investigated several deep learning-based methods for mango quality grading and found VGG-16 to be the best model for this task. During training, the authors randomly applied, at each epoch, data augmentation strategies such as horizontal or vertical image flipping, rotation, brightness, contrast and zoom in/out. Zhang et al. [16] developed a fruit category identification system using a 13-layer CNN and three data augmentation strategies, namely noise injection, image rotation and Gamma correction. The overall accuracy obtained is 94.94%, at least 5 percentage points higher than state-of-the-art approaches. Supekar et al. [17] developed a mango grading system based on ripeness, size, shape and defects. They used K-means clustering for defect segmentation and Random Forest classifiers. To avoid overfitting with an initial training dataset of 69 images, the authors applied image rotation by angles of 90°, 180° and 270°. The final training dataset consists of 522 images, which allows their model to reach an overall accuracy of 88.88%.

# III. Methodology and Model

# a) Data acquisition

The dataset used in this paper is a part of the 'MangoLeafBD' dataset produced by Ahmed et al. [18] and downloadable from the 'Mendeley Data' platform (https://data.mendeley.com/datasets/hxsnvwty3r). The MangoLeafBD dataset contains eight classes, seven of which correspond to mango leaf diseases and one to healthy leaves. In this paper, four diseases, namely Anthracnose, Gall Midge, Powdery Mildew and Sooty Mold, are treated, as they are among the mango leaf diseases most often treated by researchers during the last five years [19] (Fig. 1 and Fig. 2).
The dataset used contains four classes corresponding respectively to these diseases and a class of healthy leaves. There are 500 RGB leaf images of 240x320 pixels in each class, making a total of 2,500. Images are in JPG format.

# b) Data augmentation

Data augmentation is a powerful solution against overfitting. It allows a model trained on a small dataset to become robust and generalizable. There are two categories of data augmentation: the first is based on image manipulations and the second on DL (generative adversarial networks (GANs), feature space augmentations, adversarial training, Neural Style Transfer, Meta Learning Data Augmentation) [11]. This research focuses on the first category because i) the second is generally used to generate synthetic images from quite a large dataset, and ii) mango leaf images taken under real-world conditions suffer mainly from the problems of temperature variation, shadowing, overlapping of leaves, and presence of multiple objects. The first category allows us to generate images reflecting these cases.

This paper deals with the following techniques:

* Noise injection: Image noise is a random disturbance in the brightness and color of an image. Noise injection is an effective way to avoid overfitting and improve the test performance of a machine learning model [13]. There are several ways to add noise to an image (e.g. Gaussian noise, Salt and Pepper noise, Speckle noise, ...). Gaussian noise is applied here with the mean parameter set to 0 and the sigma parameter set to 0.05.

* Blur: Blurring an image means making it less sharp. Photographic blur occurs when the subject or scene moves relative to the camera, or vice versa. To reproduce this, a Gaussian blur was applied using a kernel size of (5,15).

* Contrast and Brightness: The Contrast and Brightness function improves the appearance of an image. Brightness improves the overall clarity of the image and contrast adjusts the difference between the darkest and lightest colors.
The contrast parameters used are {0.5; 2; 2.5} and the brightness parameters are {1; 4; 5}. For each original image, three new images are generated with the following contrast and brightness pairs {c; b} respectively: {0.5; 1}, {2; 4}, {2.5; 5}.

* Zoom: Zooming an image means enlarging it so that the details in the picture become more visible and clear. Each image is zoomed three times from the center using the zoom parameters {3; 5; 7}.

* Image flipping: To flip or mirror an image means to turn it horizontally (horizontal flip) or vertically (vertical flip). The flip function generates an image in which the left side becomes the right side or the top becomes the bottom. The images are vertically and horizontally flipped using the flip parameters 0 and 1 respectively.

* Affine transformation: An affine transformation is, in general, a combination of translations, rotations, shears and dilations [12]. It is used to simulate images captured from different camera projections and positions. The affine transformation is performed using an input matrix (In) of size 2x3 and an output matrix (Out) of the same size. The input matrix corresponds to three points in the input image and the output matrix gives their corresponding locations in the output image. In the training dataset, twenty additional images are randomly generated for each image. The generated images that contain no part of a mango leaf are then removed.

Fig. 3 shows an example of a diseased mango leaf (anthracnose) on which all these data augmentation techniques are applied.

The data augmentation process (Fig. 4) is carried out as follows:

First step: For each of the above-mentioned data augmentation strategies (except affine transformation), a new dataset for training and validation is generated (Fig. 3, Table 1). Images of the original dataset are added to the generated one. This makes it possible to determine the impact of each data augmentation strategy on the overall performance of the model.
Second step: Each strategy (except affine transformation) is combined in turn with each of the four others to generate new datasets (Table 2).

Final step: Affine transformation is applied to the combinations that give the DL model its best performance (Table 5).

The augmentation techniques are carried out using the Python Open Source Computer Vision Library (OpenCV).

# c) CNN model

The CNN model used is ResNet [20]. ResNet won first place at the ImageNet Large Scale Visual Recognition Challenge (ILSVRC 2015). To preserve knowledge, reduce losses and boost performance during the training phase, ResNet introduced residual connections between layers. A residual connection means that the output of a layer is the convolution of its input plus the input itself [21]. ResNet50 is used in this research. It consists of 50 layers, as shown in Fig. 5. The model is adapted by replacing the 1000 nodes of the softmax output layer with 5 (the number of classes treated: four mango leaf diseases plus healthy leaves).

# d) Implementation details

The data augmentation process and the ResNet50 model are implemented using the OpenCV and Keras libraries, respectively. The model's training parameters include the Adam optimizer with a learning rate of 0.001, binary cross-entropy as the loss function, and 8 epochs. The model is trained on a server with an NVIDIA GPU and 32 GB of RAM.

# IV. Results and Discussion

The initial small dataset is split as follows: 64% for training, 16% for validation and 20% for testing. After randomly splitting the dataset, we have 1,600 images for training, 400 images for validation and 500 images for testing. Results show that the training accuracy (87.18%) is much greater than the testing accuracy (39.34%), so the model overfits, as shown in Fig. 6. Since the dataset is not large enough to train the DL model robustly, the data augmentation process is carried out. This task concerns only the training and validation data [22]. The test data remain at 500 images.
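The split and the resulting dataset sizes can be checked with simple arithmetic. The sketch below assumes that each augmented dataset is the union of the original images and the images generated by the chosen techniques, with the number of generated images per original (one for blur and noise, two for flip, three for contrast and zoom) inferred from the technique descriptions and from the sizes in Table 1. The function and variable names are hypothetical.

```python
from itertools import combinations

# Images generated per original image by each technique (inferred, see lead-in).
GENERATED_PER_IMAGE = {"Blur": 1, "Contrast": 3, "Flip": 2, "Noise": 1, "Zoom": 3}


def split_counts(total, train_pct=64, val_pct=16):
    # Integer arithmetic avoids floating-point rounding; the remainder goes to testing.
    train = total * train_pct // 100
    val = total * val_pct // 100
    return train, val, total - train - val


def augmented_size(base, techniques):
    # Originals are kept, and every technique adds its generated images.
    return base * (1 + sum(GENERATED_PER_IMAGE[t] for t in techniques))


train, val, test = split_counts(2500)

# First step: one dataset per technique (training size, validation size).
first_step = {
    t: (augmented_size(train, [t]), augmented_size(val, [t]))
    for t in GENERATED_PER_IMAGE
}

# Second step: one dataset per pair of techniques.
second_step = {
    pair: (augmented_size(train, pair), augmented_size(val, pair))
    for pair in combinations(GENERATED_PER_IMAGE, 2)
}

print(train, val, test)                       # 1600 400 500
print(first_step["Contrast"])                 # (6400, 1600)
print(second_step[("Contrast", "Flip")])      # (9600, 2400)
print(second_step[("Contrast", "Zoom")])      # (11200, 2800)
```

Under this assumption the computed sizes agree with the training and validation counts listed in Table 1 and Table 2, while the test set stays at 500 images throughout.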
In the first step, after the training phase, results show that the DL model overfits on all datasets except 'Original & Contrast', which gives training and testing accuracies of 90.56% and 86.23% respectively (Table 3, Fig. 7).

In the second step, training the model on the combined datasets yielded the results in Table 4. The model does not overfit on the 'Contrast & Flip' (training: 95.29%; testing: 91.39%) and 'Flip & Zoom' (training: 93.15%; testing: 90.59%) datasets. These two datasets are the best ones in the second step since they give the DL model its best results.

Finally, the affine transformation strategy is applied to the 'Contrast & Flip' and 'Flip & Zoom' datasets. Results show that 'Contrast & Flip' gives the best performance, with an accuracy of 97.80% and an f1_score > 0.9 (Table 5, Fig. 8, Fig. 9).

To summarize these results: in the first step, the model overfitted on the generated datasets, except the 'Original & Contrast' dataset, which resulted in an accuracy of 86.23%. Among the data augmentation strategies blur, contrast, flip, noise and zoom, the best combinations for classifying mango leaf diseases are 'Contrast & Flip' and 'Flip & Zoom', according to the results of the second step. These two strategies yielded accuracies of 91.39% and 90.59% respectively. In the final step, applying the 'Affine Transformation' strategy to the datasets generated by these two strategies revealed that the best combination for mango leaf disease classification is 'Contrast & Flip & Affine Transformation', since it yielded an accuracy of 97.80%.

# V. Conclusion and Future Works

This paper presented three contributions. The first determined the impact of the data augmentation techniques blur, contrast, flip, noise and zoom on mango leaf disease classification. The second identified the combinations of these techniques that give the deep learning model its best performance. The last one reveals that applying the 'affine transformation' technique to the 'Contrast & Flip' combination gives the ResNet50 CNN its best performance, with an accuracy of 97.80%. This solution can be used to improve the performance of DL models for image classification with small datasets. Our future work is to propose a dataset of mango leaf diseases with images captured in mango orchards of a Sahelian country like Senegal. Applying this combination as a data augmentation technique to this dataset is expected to allow us to achieve excellent results in mango leaf disease classification using a deep learning model such as ResNet50. This model will then be deployed in mobile and web applications to allow mango growers to diagnose diseases in their crops without expert intervention.

![Combination of Data Augmentation Techniques for Mango Leaf Diseases Classification. Demba Faye, Idy Diop, Nalla Mbaye & Doudou Dione](image-2.png "A")

![Fig. 1: Ranking of the most common and treated mango diseases [19]](image-3.png "Fig. 1; Fig. 2")

![Panels of Fig. 3: original image; noised image (mean: 0.1, std: 0.5); blurred image (std: 0.5, kernel: (5,15)); contrast, {c,b} = {2, 4}; zoomed image (param: 5); vertically flipped image; horizontally flipped image; affine transformation, In; Out = [50,70; 230,50; 50,220] ; [50,70; 230,50; 50,220]](image-4.png "(")

![Fig. 3: An example of generated images](image-5.png "Fig. 3; Fig. 4")

![Fig. 5: The architecture of ResNet50 [21]](image-6.png "Fig. 5")

![](image-7.png "")

![Fig. 6: Training result of the original dataset](image-8.png "Fig. 6")

![Fig. 7: Training result of the dataset 'Original & Contrast'](image-9.png "Fig. 7")

![Fig. 8: Training result of the 'Contrast & Flip & Affine Transformation' dataset](image-10.png "Fig. 8")

![Fig. 9: Training result of the 'Flip & Zoom & Affine Transformation' dataset](image-11.png "Fig. 9")

Table 1

| | Original | Original & Blur | Original & Contrast | Original & Flip | Original & Noise | Original & Zoom |
|---|---|---|---|---|---|---|
| Train | 1600 | 3200 | 6400 | 4800 | 3200 | 6400 |
| Validation | 400 | 800 | 1600 | 1200 | 800 | 1600 |
| Test | 500 | 500 | 500 | 500 | 500 | 500 |
| Total | 2500 | 4500 | 8500 | 6500 | 4500 | 8500 |

Table 2

| | Blur & Contrast | Blur & Flip | Blur & Noise | Blur & Zoom | Contrast & Flip | Contrast & Noise | Contrast & Zoom | Flip & Noise | Flip & Zoom | Noise & Zoom |
|---|---|---|---|---|---|---|---|---|---|---|
| Train | 8000 | 6400 | 4800 | 8000 | 9600 | 8000 | 11200 | 6400 | 9600 | 8000 |
| Validation | 2000 | 1600 | 1200 | 2000 | 2400 | 2000 | 2800 | 1600 | 2400 | 2000 |
| Test | 500 | 500 | 500 | 500 | 500 | 500 | 500 | 500 | 500 | 500 |
| Total | 10500 | 8500 | 6500 | 10500 | 12500 | 10500 | 14500 | 8500 | 12500 | 10500 |

Table 3

| | Original & Blur | Original & Contrast | Original & Flip | Original & Noise | Original & Zoom |
|---|---|---|---|---|---|
| Training Accuracy (%) | 98.25 | 90.56 | 95.35 | 76.36 | 92.76 |
| Testing Accuracy (%) | 84.21 | 86.23 | 80.60 | 34.84 | 84.80 |
| Result | overfitting | ok | overfitting | overfitting | overfitting |

Table 4

| | Blur & Contrast | Blur & Flip | Blur & Noise | Blur & Zoom | Contrast & Flip | Contrast & Noise | Contrast & Zoom | Flip & Noise | Flip & Zoom | Noise & Zoom |
|---|---|---|---|---|---|---|---|---|---|---|
| Training Accuracy (%) | 87.28 | 94.14 | 78.29 | 92.08 | 95.29 | 84.25 | 88.71 | 78.80 | 93.15 | 82.48 |
| Testing Accuracy (%) | 65.45 | 85.30 | 63.32 | 65.82 | 91.39 | 31.74 | 58.23 | 645.85 | 90.59 | 78.47 |
| Result | overfitting | overfitting | overfitting | overfitting | ok | overfitting | overfitting | overfitting | ok | overfitting |

Table 5

| | Contrast & Flip & Affine Transformation | Flip & Zoom & Affine Transformation |
|---|---|---|
| Training dataset | 50 056 | 51 315 |
| Validation dataset | 12 514 | 12 828 |
| Test dataset | 500 | 500 |
| Total | 63 070 | 64 643 |
| Training Accuracy (%) | 98.54 | 97.44 |
| Testing Accuracy (%) | 97.80 | 93.98 |

© 2023 Global Journals

## Acknowledgements

The authors would like to thank IRD (Institut de Recherche pour le Développement) SENEGAL for access to their server, which was used in this study.
* K. Kusrini, S. Suputa, A. Setyanto, I. M. A. Agastya, H. Priantoro, K. Chandramouli, E. Izquierdo. Data augmentation for automated pest classification in Mango farms. Computers and Electronics in Agriculture 179, 105842, 2020. doi:10.1016/j.compag.2020.105842
* N. F. Rosman, N. A. Asli, S. Abdullah, M. Rusop. Review: Some common disease in mango. AIP Conference Proceedings 2151, 20019, 2019. doi:10.1063/1.5124649
* S.-L. Wu, H.-Y. Tung, Y.-L. Hsu. Deep Learning for Automatic Quality Grading of Mangoes: Methods and Insights. Computer Vision and Pattern Recognition, 2020.
* A. Fuentes, S. Yoon, S. Kim, D. Park. A Robust Deep-Learning-Based Detector for Real-Time Tomato Plant Diseases and Pests Recognition. Sensors 17(9), 2022, 2017.
* C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, Z. Wojna. Rethinking the inception architecture for computer vision. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, 2016.
* Nabodip Sutrodhor, Molla Rashied Hussein, Md Firoz Mridha, Prokash Karmokar, Tasrif Nur. Mango Leaf Ailment Detection using Neural Network Ensemble and Support Vector Machine. International Journal of Computer Applications 181(13), August 2018. doi:10.5120/ijca2018917746
* S. Arivazhagan, S. Vineth Ligi. Mango Leaf Diseases Identification Using Convolutional Neural Network. International Journal of Pure and Applied Mathematics 120(6), 2018.
* R. Saleem, J. H. Shah, M. Sharif, M. Yasmin, H.-S. Yong, J. Cha. Mango Leaf Disease Recognition and Classification Using Novel Segmentation and Vein Pattern Technique. Appl. Sci., 11901, 2021. doi:10.3390/app112411901
* M. R. Mia, S. Roy, S. K. Das, M. A. Rahman. Mango leaf disease recognition using neural network and support vector machine. Iran Journal of Computer Science 3(3), 2020. doi:10.1007/s42044-020-00057-z
* K. Simonyan, A. Zisserman. Very Deep Convolutional Networks for Large-Scale Image Recognition. 2014. doi:10.48550/arXiv.1409.1556
* C. Shorten, T. M. Khoshgoftaar. A survey on Image Data Augmentation for Deep Learning. J Big Data 6, 60, 2019. doi:10.1186/s40537-019-0197-0
* R. Dandavate, V. Patodkar. CNN and Data Augmentation Based Fruit Classification Model. 2020 Fourth International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC), 2020. doi:10.1109/I-SMAC49090.2020.9243440
* M. A. Agastya, A. Setyanto. Classification of Indonesian Batik Using Deep Learning Techniques and Data Augmentation. 3rd International Conference on Information Technology, Information System and Electrical Engineering (ICITISEE), 2018. doi:10.1109/ICITISEE.2018.8720990
* S. Bargoti, J. Underwood. Deep fruit detection in orchards. 2017 IEEE International Conference on Robotics and Automation (ICRA), 2017. doi:10.1109/ICRA.2017.7989417
* S.-L. Wu, H.-Y. Tung, Y.-L. Hsu. Deep Learning for Automatic Quality Grading of Mangoes: Methods and Insights. 2020 19th IEEE International Conference on Machine Learning and Applications (ICMLA), 2020. doi:10.1109/ICMLA51294.2020.00076
* Y. D. Zhang, Z. Dong, X. Chen. Image based fruit category classification by 13-layer deep convolutional neural network and data augmentation. Multimed Tools Appl 78, 2019. doi:10.1007/s11042-017-5243-3
* A. D. Supekar, M. Wakode. Multi-Parameter Based Mango Grading Using Image Processing and Machine Learning Techniques. INFOCOMP Journal of Computer Science 19(2), 2020.
* FAO. Major Tropical Fruits: Preliminary results 2021. Rome, 2022.
* D. Faye, I. Diop, D. Dione. Mango Diseases Classification Solutions Using Machine Learning or Deep Learning: A Review. Journal of Computer and Communications 10, 2022. doi:10.4236/jcc.2022.1012002
* Kaiming He. Deep Residual Learning for Image Recognition. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2016, 2015.
* L. Ali, F. Alnajjar, H. A. Jassmi, M. Gocho, W. Khan, M. A. Serhani. Performance Evaluation of Deep CNN-Based Crack Detection and Localization Techniques for Concrete Structures. Sensors 21(5), 1688, 2021. doi:10.3390/s21051688
* Chiagoziem C. Ukwuoma, Qin Zhiguang, Md Belal Bin Heyat, Liaqat Ali, Zahra Almaspoor, Happy N. Monday. Recent Advancements in Fruit Detection and Classification Using Deep Learning Techniques. Mathematical Problems in Engineering 2022, ID 9210947, 29. doi:10.1155/2022/9210947