# I. Introduction

Dengue fever (DF) is an arthropod-borne viral disease that has become common over the past three decades. According to WHO, 51-101 million new dengue infections occur every year in more than a hundred endemic countries [1]. Dengue fever is a severe viral infection with potentially fatal consequences. It was originally known as "water poison." The female Aedes aegypti mosquito, which transmits dengue, is shown in Fig. 1.

Fig. 1: A Female Aedes Aegypti Mosquito

In the 1780s, the first clinically recognized epidemics of dengue occurred at the same time in Africa, Asia, and North America. Benjamin Rush named the disease "break-bone fever" based on its features of arthralgia and myalgia. In India, a dengue epidemic was first reported in Chennai in 1780, and the first virologically proven outbreak of dengue fever appeared in Calcutta and on the East Coast of India in 1963-64. In the 1970s and 1980s, epidemic activity accelerated dramatically, resulting in the wide spread of the viruses and mosquito vectors and the consequent transmission of dengue virus (DENV) across the world [2]. The first major dengue hemorrhagic fever (DHF) epidemic occurred in the Philippines during 1953-1954, followed by a rapid global spread of DF/DHF epidemics. The first major epidemics of DHF and dengue shock syndrome (DSS) in India occurred in 1996 in Delhi and Lucknow and later extended throughout the country. Outbreaks of dengue have since become more common in many parts of India. Between 2010 and 2014, the incidence of reported dengue cases was 34.81 per million population. Dengue fever became endemic in Orissa, Uttarakhand, Bihar, Assam, and Jharkhand in 2010 [3].

# II. Background Study

Kassaye Yitbarek Yigzaw et al. [2] presented a benchmarking platform for the prediction of communicable diseases. Rathi et al. [4] studied dengue infection in Rajasthan; the study was based on 100 admitted children, and the patients were classified by their symptoms. Kalayanarooj [3] described the clinical manifestations of dengue and DHF.
Aldallal [5] explained how data mining techniques are used to predict non-communicable diseases such as heart disease and diabetes. Agrawal et al. [7] demonstrated an ensemble approach that uses multiple classifiers, AdaBoost and a decision tree, for the prediction of diabetes. Ghosh et al. [10] used multiple classifiers for the performance assessment of sentiment analysis. Gupta et al. [12] compared different ML approaches for heart disease prediction. Mesafint et al. [14] explained ML algorithms for the prediction of HIV/AIDS test results.

# III. Proposed Methodology

The ensemble models are Extreme Gradient Boost (XGB), Random Forest (RF) by majority voting, and Stacking, which is based on a combination of heterogeneous classifiers such as NB, KNN, and SVM. Ensemble techniques [6] are well suited to dengue fever diagnosis and prediction. The proposed framework is shown in Fig. 3.

The main aim of the data acquisition and data pre-processing module is to obtain the dengue fever dataset and process it into a form suitable for further analysis. The dataset has features/attributes that ultimately separate the data into sick and healthy patients. The dataset has thirty-eight features and different data types. The dataset is split into an 80% training set and a 20% testing set. The pre-processing includes feature selection and missing value imputation [8].

The proposed model combines different classifiers: Naïve Bayes, K-Nearest Neighbor, and Support Vector Machine. Each classifier produces its own prediction. Each base classifier is trained on the training data within the ensemble framework so that it can be used for the prediction of dengue. The dataset features and target values are provided to each classifier, which can then predict whether the disease is present or not.

# i. Description of the Dengue Dataset

The patient data were collected from the Department of General Medicine, PESIMSR, Kuppam, Andhra Pradesh.
Each patient is diagnosed in the laboratory using the dengue duo card test shown in Fig. 4. The dataset consists of 18 attributes and one target value. The number of patients having each symptom is listed in Table I, and the corresponding bar chart, which explains the importance of each feature [9], is shown in Fig. 7. Among the 140 dengue-infected cases, all patients had fever, 106 had headache, 97 had myalgia, 94 had arthralgia, 83 had low back pain, and smaller numbers had the remaining symptoms.

# ii. XGBoost

Boosting is a widely used and highly effective machine learning technique. XGBoost, an end-to-end tree boosting system, is widely used by data scientists. Its most important property is scalability, which it achieves without sacrificing accuracy. The system is reported to be about ten times faster than conventional methods. The scalability of XGBoost is due to several algorithmic optimizations, and parallel and distributed computing make learning faster [15].

In the stacking algorithm, the base (first-level) classifiers are trained on the same training sample that is used to prepare the inputs for the meta (second-level) classifier, which may cause overfitting. The StackingCVClassifier avoids this by using cross-validation. The dataset is split into k folds, and in k successive rounds k-1 folds are used to fit the level-1 classifiers. In every round, the level-1 classifiers are then applied to the remaining fold. The resulting predictions of the base classifiers are stacked and used as the input to the level-2 classifier.

# IV. Performance Evaluation

The clinical dengue fever dataset was used to analyse the performance of the ensemble models and to compare them with the other models. The class labels dengue infected (DF) and not dengue infected (NDF) are replaced with class 1 and class 0, respectively, to maintain uniformity [16]. The dataset is split into training and testing sets, and 10-fold cross-validation is applied.
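The cross-validated stacking procedure described above (k folds, base classifiers fitted on k-1 folds and applied to the held-out fold, stacked predictions passed to the second-level classifier) can be sketched in plain Python. This is only an illustrative sketch under stated assumptions, not the implementation used in this study: the data are synthetic binary "symptom" vectors, `OneNN` and `BernoulliNB` are toy stand-ins for the KNN and NB base classifiers (SVM is omitted for brevity), and `TableMeta`, `fit_stacking_cv`, `predict_stacking`, and `make_toy_data` are hypothetical names introduced here.

```python
import math
import random
from collections import Counter, defaultdict


def kfold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


class OneNN:
    """Toy 1-nearest-neighbour classifier (Hamming distance on 0/1 features)."""
    def fit(self, X, y):
        self.X, self.y = list(X), list(y)
        return self

    def predict(self, X):
        def nearest(x):
            d = [sum(a != b for a, b in zip(x, t)) for t in self.X]
            return self.y[d.index(min(d))]
        return [nearest(x) for x in X]


class BernoulliNB:
    """Toy Bernoulli Naive Bayes with Laplace smoothing."""
    def fit(self, X, y):
        self.params = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            probs = [(sum(col) + 1) / (len(rows) + 2) for col in zip(*rows)]
            self.params[label] = (math.log(len(rows) / len(y)), probs)
        return self

    def _score(self, x, label):
        log_prior, probs = self.params[label]
        return log_prior + sum(math.log(p if v else 1 - p) for v, p in zip(x, probs))

    def predict(self, X):
        return [max(self.params, key=lambda c: self._score(x, c)) for x in X]


class TableMeta:
    """Toy second-level model: majority label for each tuple of base predictions."""
    def fit(self, Z, y):
        votes = defaultdict(Counter)
        for z, t in zip(Z, y):
            votes[tuple(z)][t] += 1
        self.table = {z: c.most_common(1)[0][0] for z, c in votes.items()}
        self.default = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, Z):
        return [self.table.get(tuple(z), self.default) for z in Z]


def fit_stacking_cv(base_classes, meta, X, y, k=10, seed=0):
    """Train the meta model on out-of-fold base predictions, then refit the bases."""
    n = len(X)
    oof = [[None] * len(base_classes) for _ in range(n)]
    for fold in kfold_indices(n, k, seed):
        held_out = set(fold)
        train = [i for i in range(n) if i not in held_out]
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        for j, cls in enumerate(base_classes):
            preds = cls().fit(X_tr, y_tr).predict([X[i] for i in fold])
            for i, p in zip(fold, preds):
                oof[i][j] = p  # prediction made without training on sample i
    meta.fit(oof, y)
    bases = [cls().fit(X, y) for cls in base_classes]  # refit on all training data
    return bases, meta


def predict_stacking(bases, meta, X):
    Z = list(zip(*(b.predict(X) for b in bases)))
    return meta.predict(Z)


def make_toy_data(n=286, d=18, seed=1):
    """Synthetic stand-in data: positives show each 'symptom' more often."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        p = 0.6 if label else 0.3
        X.append([1 if rng.random() < p else 0 for _ in range(d)])
        y.append(label)
    return X, y


X, y = make_toy_data()
cut = int(0.8 * len(X))  # 80% training, 20% testing
X_train, y_train, X_test, y_test = X[:cut], y[:cut], X[cut:], y[cut:]
bases, meta = fit_stacking_cv([OneNN, BernoulliNB], TableMeta(), X_train, y_train, k=10)
pred = predict_stacking(bases, meta, X_test)
accuracy = sum(p == t for p, t in zip(pred, y_test)) / len(y_test)
print(f"held-out accuracy on toy data: {accuracy:.2f}")
```

The key design point is that the second-level model only ever sees base predictions made on samples the base classifiers were not trained on, which is what limits the overfitting mentioned above. In practice a library implementation would normally be used instead of these toy classes; the accuracy printed here says nothing about the clinical dataset.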
The performance measure of each base classifier, as well as of each ensemble model, is calculated using a confusion matrix. The base classifiers NB, SVM, and KNN are trained first and then tested. The proposed research work analysed the performance of the ensemble methods XGB, RF, and Stacking. The metrics are accuracy, recall, precision, and f1-score. The confusion matrix illustrates the actual and predicted classifications [15,17]. Equations (1), (2), (3), and (4) are used to calculate the metrics [17], where TP, TN, FP, and FN denote the true-positive, true-negative, false-positive, and false-negative counts of the confusion matrix.

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1}$$

$$\text{Recall} = \frac{TP}{TP + FN} \tag{2}$$

$$\text{Precision} = \frac{TP}{TP + FP} \tag{3}$$

$$\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{4}$$

The accuracies of the classifiers are given in Table III and Fig. 11. The ensemble methods XGB, RF, and Stacking achieve accuracies of 98.57%, 99.12%, and 99.56% on the training dataset and 97.80%, 94.82%, and 98.27% on the testing dataset. We observed better accuracy for the ensemble methods than for the base classifiers.

The AUC of the proposed ensemble methods on the testing and training datasets, respectively, is 97.14% and 97.81% for XGB, 98.14% and 99.14% for Random Forest, and 98.14% and 98.68% for Stacking. As shown in Table V, the AUC values of the ensemble methods lie between 0.97 and 0.99, indicating that the positive class values are correctly distinguished from the negative class values.

Table II: Confusion Matrix

Table V: AUC Comparison

# V. Conclusion

The main objective of this research work is the prediction of dengue fever using ensemble techniques. We used bagging, boosting, and stacking methods for prediction, and the results are compared with the NB, KNN, and SVM models. The experimental results indicate that ensemble techniques are the best of the evaluated models for the prediction of dengue fever. The techniques were analysed using performance metrics. Extreme gradient boosting, random forest with majority voting, and stacking with a meta-classifier give better accuracy on both the training and testing datasets than the other models. The extended analysis was done using the ROC curve and the precision-recall curve, which explain the performance of the models. The area under the curve lies between 0.97 and 0.99.
The ensemble models are the better models for the prediction of dengue-infected patients.

![Fig. 2: Pictorial Representation of Dengue Fever Symptoms](image-2.png "Fig. 2 :")

According to the World Health Organization, dengue fever is classified into four types: DENV1, DENV2, DENV3, and DENV4. The incubation period is 2 to 7 days [4]. The symptoms of dengue include high fever, joint and muscle pain, headache, vomiting, rashes, pain behind the eyes, and diarrhea. The dengue fever symptoms are shown in Fig. 2. Different ML algorithms are used for dengue fever classification, such as the NB classifier, K-Nearest Neighbour, Decision Tree, Support Vector Machine, and Neural Networks. The proposed model demonstrates the ensemble techniques of bagging, boosting, and stacking. The dengue binary classification is based on Extreme Gradient Boost (XGB), Random Forest by majority voting, and Stacking.

![Fig. 3: An Ensemble Framework for the Prediction and Evaluation of Dengue Dataset](image-3.png "Fig. 3 :")

![Fig. 4: Diagnosis-Dengue Duo Card Test](image-4.png "Fig. 4 :")

The dataset consists of 286 instances with 18 attributes and one target. The target has two levels, dengue patients and non-dengue patients. A numerical value is assigned to each level: 0 for non-dengue patients (NDF) and 1 for dengue patients (DF). A screenshot of the dataset is shown in Fig. 5.

![Fig. 5: The screenshot of the dataset](image-5.png "Fig. 5 :")

The target value consists of 140 dengue-infected cases and 146 non-dengue cases among the 286 cases. The distribution is shown in Fig. 6.

![Fig. 6: Distribution of a Target Value](image-6.png "Fig. 6 :")

Table I: Number of Patients Having Each Symptom

| Clinical Feature | No. of Patients |
|---|---|
| Fever | 140 |
| Headache | 106 |
| Myalgia | 97 |
| Arthralgia | 94 |
| Low Backache | 83 |
| Retro Orb Pain | 71 |
| Rashes | 65 |
| Vomiting | 57 |
| Pain Abdomen | 41 |
| Bleeding | 39 |
| Cough | 30 |
| Diarrhea | 25 |
| Sore Throat | 16 |
| Breathlessness | 6 |
| Seizures | 5 |

Fig. 7: Bar Chart Representation

# b) Ensemble Methods

Ensemble means combining multiple models. This approach gives better performance than a single model, so a set of models is used for prediction rather than a single model [7]. The main challenge is to obtain base models that make different kinds of errors. If the ensemble techniques of bagging, boosting, and stacking are used for classification, high accuracies can be obtained. Bagging creates different subsets of training data from the sample training dataset, and the final output depends on majority voting, e.g., Random Forest. Boosting creates sequential models that combine weak learners into a strong learner, and the finally constructed model has the highest accuracy, e.g., XGBoost and AdaBoost.

![Fig. 8: Random Forest Algorithm Procedure](image-7.png "Fig. 8 :Fig. 9 :")

# iii. Stacking

Stacking is an ensemble technique that uses a meta-classifier to learn the best way to combine the predictions of two or more base ML algorithms. The base, or level-0, classifiers consist of different ML algorithms, and therefore stacking ensembles are generally heterogeneous. The predictions of the base classifiers are used as new features to train the meta-classifier. An ensemble stacking procedure is illustrated in Fig. 9. The meta-classifier can be any classifier [13].

![](image-8.png "")

The confusion matrix and experimental scores of the NB, SVM, KNN, XGB, RF, and Stacking models on the training and testing datasets are shown in Fig. 10.

![Fig. 10: Confusion Matrix and Experimental Results of Training and Testing Dataset of the Ensemble and Other ML Models](image-9.png "Fig. 10 :Fig. 11 :")

Table III: Accuracy (%) of the Classifiers

| Classifiers | Training Dataset | Testing Dataset |
|---|---|---|
| NB | 95.40 | 93.17 |
| KNN | 96.49 | 85.66 |
| SVM | 97.51 | 89.65 |
| XGB | 98.57 | 97.80 |
| RF | 99.12 | 94.82 |
| Stacking | 99.56 | 98.27 |

The precision, recall, and f1-score for the training and testing datasets are listed in Table IV, and a comparison of the ensemble methods with the other methods is shown in Fig. 12 and Fig. 13, which shows that the ensemble methods give better performance on unseen data. The Receiver Operating Characteristic (ROC) curve and the Precision-Recall curve are graphical representations of a classifier's performance, obtained by calculating and plotting the false positive rate (FPR) vs. the true positive rate (TPR), and precision vs. recall, for each classifier at various threshold values. The precision-recall curves for the training and testing datasets are shown in Fig. 14 and Fig. 15, and the corresponding ROC curves are shown in Fig. 16 and Fig. 17.

Table IV: Precision, Recall, and f1-score (%) of the Classifiers

| Classifiers | Class | Training Precision | Training Recall | Training f1-score | Testing Precision | Testing Recall | Testing f1-score |
|---|---|---|---|---|---|---|---|
| NB | NDF | 93 | 98 | 95 | 94 | 98 | 96 |
| NB | DF | 98 | 92 | 95 | 97 | 93 | 95 |
| KNN | NDF | 93 | 100 | 97 | 86 | 97 | 91 |
| KNN | DF | 100 | 93 | 96 | 96 | 81 | 88 |
| SVM | NDF | 96 | 98 | 97 | 91 | 98 | 94 |
| SVM | DF | 99 | 100 | 100 | 98 | 89 | 93 |
| RF | NDF | 98 | 100 | 99 | 97 | 98 | 97 |
| RF | DF | 100 | 98 | 99 | 98 | 96 | 97 |
| XGB | NDF | 99 | 97 | 98 | 97 | 98 | 97 |
| XGB | DF | 97 | 99 | 98 | 98 | 96 | 97 |
| Ensemble Stacking | NDF | 100 | 99 | 100 | 97 | 99 | 98 |
| Ensemble Stacking | DF | 99 | 100 | 100 | | | |

![Fig. 13: Testing Dataset Precision, Recall and F1 Score Comparison of ML Models](image-10.png "Fig. 13 :")

![Fig. 14: The Performance Comparison of the Training Dataset by Precision Recall Curve](image-11.png "Fig. 14 :")

![Fig. 16: The Performance Comparison of the Training Dataset by ROC Curve](image-12.png "Fig. 16 :Fig. 17 :")

![](image-13.png "") ![](image-14.png "") ![](image-15.png "") ![](image-16.png "")

Table V: AUC Comparison

| Classifier | Testing Dataset | Training Dataset |
|---|---|---|
| Auc_Nb | 0.9629 | 0.9514 |
| Auc_Knn | 0.8333 | 0.9342 |
| Auc_Svc | 0.9444 | 0.9956 |
| Auc_Xgb | 0.9714 | 0.9781 |
| Auc_Rf | 0.9814 | 0.9914 |
| Auc_Scv | 0.9814 | 0.9868 |

© 2022 Global Journals. Global Journal of Computer Science and Technology, Volume XXII, Issue II, Version I, Year 2022.

## Acknowledgment

Our sincere thanks to Dr. Veerapuram Manoj Reddy, Department of General Medicine, PES Medical Sciences and Research, Kuppam, Andhra Pradesh, for his support in the collection of the dengue data.

* [1] World Health Organization, Fact sheet no, Geneva, March 2014 (16 Oct 2019).
* [2] Kassaye Yigzaw, Johan Bellika. A communicable disease prediction benchmarking platform. IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI), 2014, pp. 564-568. doi:10.1109/BHI.2014.6864427.
* [3] S. Kalayanarooj. Clinical Manifestations and Management of Dengue/DHF/DSS. Trop Med Health, vol. 39 (Suppl), Dec. 2011 (Epub 2011 Dec 22). doi:10.2149/tmh.2011-S10. PMID 22500140, PMCID PMC3317599.
* [4] Manisha Rathi, Masand, Alok Purohit. Study of Dengue Infection in Rural Rajasthan. Journal of Evolution of Medical and Dental Sciences, 2015. doi:10.14260/jemds/2015/993.
* [5] A. S. Aldallal, A. A. Al-Moosa. Using Data Mining Techniques to Predict Diabetes and Heart Diseases. 4th International Conference on Frontiers of Signal Processing (ICFSP), 2018.
* [6] Saba Bashir, Usman Qamar, Farhan Hassan Khan. IntelliHealth: A medical decision support application using a novel weighted multi-layer classifier ensemble framework. Journal of Biomedical Informatics, ISSN 1532-0464, vol. 59, 2016, doi:10.1016/j.jbi.2015.12.001,
pp. 185-200.
* [7] Agrawal, G. Bhargav, E. Spandana. Diabetes Diagnosis Prediction Using Ensemble Approach. 2021. doi:10.1007/978-981-15-5546-6_66.
* [8] Julia Miao, Kathleen Miao. Cardiotocographic Diagnosis of Fetal Health based on Multiclass Morphologic Pattern Predictions using Deep Learning Classification. International Journal of Advanced Computer Science and Applications, vol. 9, 2018. doi:10.14569/IJACSA.2018.090501.
* [9] Chandra Reddy N. S., Shue Nee S., Zhi Min L., Ying C. Classification and Feature Selection Approaches by Machine Learning Techniques: Heart Disease Prediction. International Journal of Innovative Computing, vol. 9, no. 1, 2019. doi:10.11113/ijic.v9n1.210.
* [10] Monalisa Ghosh, Goutam Sanyal. Performance Assessment of Multiple Classifiers Based on Ensemble Feature Selection Scheme for Sentiment Analysis. Applied Computational Intelligence and Soft Computing, 2018. doi:10.1155/2018/8909357.
* [11] Hanif Heidari, Gerhard Hellstern. Early heart disease prediction using hybrid quantum classification. 2022. doi:10.48550/arXiv.2208.08882.
* [12] Gunjan Gupta, U. Adarsh, N. Subba Reddy, B. Ashwath Rao. Comparison of various machine learning approaches uses in heart ailments prediction. Journal of Physics: Conference Series, vol. 2161, 012010, 2022. doi:10.1088/1742-6596/2161/1/012010.
* [13] K. Shaukat, N. Masood, S. Mehreen, U. Azmeen. Dengue Fever Prediction: A Data Mining Problem. J Data Mining Genomics Proteomics, vol. 6, 181, 2015. doi:10.4172/2153-0602.1000181.
* [14] Daniel Mesafint, D. H. Manjaiah. Grid search in hyperparameter optimization of machine learning models for prediction of HIV/AIDS test results. International Journal of Computers and Applications, 2021. doi:10.1080/1206212X.2021
* [15] A. Tate, U. Gavhane, J. Pawar, B. Rajpurohit, G. B. Deshmukh. Prediction of Dengue, Diabetes and Swine Flu Using Random Forest Classification Algorithm. 2017.
* [16] Konadala Kameswara Rao Nynalasetti, Dr. G. P. Saradhi Varma. A hybrid Algorithm for Dengue Disease Prediction with Multi Dimensional Data. International Journal of Advanced Research in Computer Science and Software Engineering, 14, 2014.
* [17] H. M. Veena, D. S. Suresh. Comparative Study of Classification algorithms used for the Prediction of Non-communicable diseases. Int. J. Emerg. Trends Eng. Res., vol. 9, no. 7, 2021. doi:10.30534/ijeter/2021/14972021.