# I. Introduction

Dengue fever (DF) is an arthropod-borne viral disease that has become common over the past three decades. According to the WHO, 51-101 million new dengue infections occur every year in more than a hundred endemic countries [1]. Dengue fever is a severe viral infection with potentially fatal consequences; it was originally known as "water poison." Dengue is transmitted by the female Aedes aegypti mosquito, shown in Fig. 1.

Fig. 1: A Female Aedes Aegypti Mosquito

In the 1780s, the first clinically recognized dengue epidemics occurred at the same time in Africa, Asia, and North America. Benjamin Rush named the disease "break-bone fever" after its characteristic arthralgia and myalgia. In India, a dengue epidemic was first reported in Chennai in 1780, and the first virologically proven outbreak of dengue fever occurred in Calcutta and on the East Coast of India in 1963-64. In the 1970s and 1980s, epidemic activity accelerated dramatically, resulting in the widespread dispersal of the viruses and their mosquito vectors and the consequent transmission of the dengue virus (DENV) across the world [2]. The first major dengue hemorrhagic fever (DHF) epidemic occurred in the Philippines in 1953-1954 and was followed by a rapid global spread of DF/DHF epidemics. The first major epidemics of DHF and dengue shock syndrome (DSS) in India occurred in 1996 in Delhi and Lucknow and later extended throughout the country. Dengue outbreaks have become more common in many parts of India. Between 2010 and 2014, the incidence of reported dengue cases was 34.81 per million population, and in 2010 dengue fever became endemic in Orissa, Uttarakhand, Bihar, Assam, and Jharkhand [3].


# II. Background Study

Kassaye Yitbarek Yigzaw et al [2] presented a benchmarking platform for the prediction of communicable diseases. Rathi et al [4] studied dengue infection in rural Rajasthan; the study was based on 100 hospitalized children, who were classified according to their symptoms. Kalayanarooj S [3] describes the clinical manifestations and management of dengue and DHF. Aldallal, A.S [5] showed that data mining techniques can be used to predict non-communicable diseases such as heart disease and diabetes. Agrawal et al [7] demonstrated an ensemble approach that combines multiple classifiers, including AdaBoost and a decision tree, for the prediction of diabetes. Ghosh et al [10] assessed the performance of multiple classifiers with an ensemble feature selection scheme for sentiment analysis. Gupta et al [12] compared different ML approaches for heart disease prediction. Mesafint et al [14] applied grid-search hyperparameter optimization to ML models that predict HIV/AIDS test results.


# III. Proposed Methodology

The ensemble models are Extreme Gradient Boost (XGB), Random Forest (RF) with majority voting, and Stacking, which combines heterogeneous classifiers such as NB, KNN, and SVM. Ensemble techniques [6] are very helpful for dengue fever diagnosis and prediction. The proposed framework is shown in Fig. 3. The data acquisition and pre-processing module obtains the dengue fever dataset and processes it into a form suitable for further analysis. The features (attributes) of the dataset are used to classify each patient as sick or healthy. The dataset has thirty-eight features of different data types. It is split into an 80% training set and a 20% testing set. Pre-processing includes feature selection and missing-value imputation [8]. The proposed model combines different classifiers, namely Naïve Bayes, K-Nearest Neighbor, and Support Vector Machine, and each classifier produces its own prediction.

Within the ensemble framework, each base classifier is trained on the training data, in which both the features and the target values are known, so that it can then predict whether dengue is present or absent.
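The 80/20 split and the three base classifiers can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' code. The data are synthetic, generated with `make_classification` to match the reported 286 patients and 18 attributes. Missing values are filled by mean imputation, which is an assumption because the paper does not name its imputation method. Feature selection is omitted, and the stratified split and hyperparameters are illustrative choices.

```python
# Sketch: 80/20 split, missing-value imputation and the NB / KNN / SVM base classifiers.
# Synthetic stand-in data (make_classification), NOT the PESIMSR dengue dataset.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# 286 hypothetical patients, 18 features, binary target (0 = NDF, 1 = DF).
X, y = make_classification(n_samples=286, n_features=18, n_informative=8, random_state=0)
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan  # blank out ~5% of values to mimic missing entries

# 80% training / 20% testing, stratified so both parts keep the class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

def base_models():
    """NB, KNN and SVM, each preceded by mean imputation (assumed, not from the paper)."""
    return {
        "NB": make_pipeline(SimpleImputer(strategy="mean"), GaussianNB()),
        "KNN": make_pipeline(SimpleImputer(strategy="mean"), StandardScaler(),
                             KNeighborsClassifier(n_neighbors=5)),
        "SVM": make_pipeline(SimpleImputer(strategy="mean"), StandardScaler(),
                             SVC(kernel="rbf", random_state=0)),
    }

scores = {}
for name, model in base_models().items():
    model.fit(X_train, y_train)                 # learn from features + known targets
    scores[name] = model.score(X_test, y_test)  # accuracy on the held-out 20%
    print(f"{name}: test accuracy = {scores[name]:.3f}")
```

Each pipeline learns its imputation statistics from the training split only, so no information from the test set reaches the classifiers.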


# i. Description of the Dengue Dataset

The patient data were collected from the Department of General Medicine, PESIMSR, Kuppam, Andhra Pradesh. Each patient was diagnosed in the laboratory using the dengue Duo card test shown in Fig. 4. The dataset consists of 18 attributes and one target value. The number of patients with each symptom is listed in Table I, and the corresponding bar chart in Fig. 7 indicates the importance of each feature [9]. Among the 140 dengue-infected cases, all patients had fever, 106 had headache, 97 had myalgia, 94 had arthralgia, and 83 had low back pain, followed by less frequent symptoms.
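As a small worked example, the symptom counts above can be converted into prevalence within the dengue-positive group. The snippet uses only the five counts quoted in this paragraph. The dictionary and helper names are illustrative.

```python
# Five most frequent symptoms among the 140 dengue-positive (DF) patients, from the text.
N_DF = 140
SYMPTOM_COUNTS = {
    "fever": 140,
    "headache": 106,
    "myalgia": 97,
    "arthralgia": 94,
    "low back pain": 83,
}

def prevalence_percent(counts, n):
    """Percentage of the n dengue patients reporting each symptom, rounded to 0.1."""
    return {symptom: round(100 * c / n, 1) for symptom, c in counts.items()}

prevalence = prevalence_percent(SYMPTOM_COUNTS, N_DF)
for symptom, pct in prevalence.items():
    print(f"{symptom:>14}: {pct:5.1f}%")
```

This gives 100.0% for fever, 75.7% for headache, 69.3% for myalgia, 67.1% for arthralgia, and 59.3% for low back pain.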


# ii. XGBoost

Boosting is a widely used and highly effective machine learning technique. XGBoost, an end-to-end tree boosting system, is widely used by data scientists. Its most important property is scalability; the system is reported to be ten times faster than existing conventional methods. This scalability comes from several algorithmic optimizations, and parallel and distributed computing make learning faster still [15].

In standard stacking, the base (first-level) classifiers are trained on the same training samples that are used to prepare the inputs for the meta (second-level) classifier, which may cause overfitting. The StackingCVClassifier avoids this by using cross-validation. The dataset is split into k folds. In each of k successive rounds, k-1 folds are used to fit the level-1 classifiers, which are then applied to the remaining fold. The predictions of the base classifiers are then stacked and used as the input to the level-2 classifier.
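The cross-validated stacking procedure above can be reproduced by hand with scikit-learn's `cross_val_predict`. For every training row, that function returns the prediction of a model fitted on the other folds. This is an illustrative sketch of the idea, not the `StackingCVClassifier` implementation itself. The following choices are all assumptions: k = 5, logistic regression as the level-2 classifier, class-1 probabilities as meta-features, and the synthetic data.

```python
# Hand-written cross-validated stacking (sketch). Synthetic data, assumed k = 5.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=286, n_features=18, random_state=1)
base = [GaussianNB(), KNeighborsClassifier(), SVC(probability=True, random_state=1)]
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)

# Level-1 step: each row's probability comes from a model fitted on the other k-1 folds,
# so no row is scored by a model that was trained on it.
meta_features = np.column_stack([
    cross_val_predict(clf, X, y, cv=cv, method="predict_proba")[:, 1] for clf in base
])

# Level-2 step: the meta-classifier learns from the stacked out-of-fold probabilities.
meta = LogisticRegression().fit(meta_features, y)

# For inference, the base classifiers are refitted on all of the training data.
fitted_base = [clf.fit(X, y) for clf in base]

def predict(X_new):
    """Stack the base classifiers' class-1 probabilities and let the meta-classifier decide."""
    z = np.column_stack([clf.predict_proba(X_new)[:, 1] for clf in fitted_base])
    return meta.predict(z)
```

`cross_val_predict` clones each estimator internally, so the `base` list is still unfitted when the final refit happens.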




# IV. Performance Evaluation

The clinical dengue fever dataset was used to analyse the performance of the ensemble models and to compare them with the other models. To maintain uniformity, the class labels are replaced as follows: dengue infected (DF) becomes class 1, and dengue not infected (NDF) becomes class 0 [16]. The dataset is split into training and testing sets, and 10-fold cross-validation is applied. The performance of each base classifier and of each ensemble model is calculated from a confusion matrix. The base classifiers NB, SVM, and KNN are trained first and then tested, after which the performance of the ensemble methods XGB, RF, and Stacking is analysed. The metrics are accuracy, recall, precision, and F1-score. The confusion matrix shows the actual versus the predicted classification [15,17]. Equations (1), (2), (3), and (4) are used to calculate the metrics [17]:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1}$$

$$\text{Recall} = \frac{TP}{TP + FN} \tag{2}$$

$$\text{Precision} = \frac{TP}{TP + FP} \tag{3}$$

$$F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \tag{4}$$

where TP, TN, FP, and FN are the numbers of true positives, true negatives, false positives, and false negatives.

The accuracies of all models are listed in Table III and compared in Fig. 11. The ensemble methods XGB, RF, and Stacking achieve 98.57%, 99.12%, and 99.56% accuracy on the training dataset and 97.80%, 94.82%, and 98.27% on the testing dataset, respectively. All three ensemble methods therefore achieve higher testing accuracy than the base classifiers.

The per-class precision, recall, and F1 scores are given in Table IV. The AUC values on the testing and training datasets, respectively, are as follows:

- XGB: 97.14% and 97.81%.
- Random forest: 98.14% and 99.14%.
- Stacking: 98.14% and 98.68%.

As shown in Table V, the AUC values of the ensemble models lie between 0.97 and 0.99, indicating that they distinguish the positive class from the negative class well.
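Equations (1) to (4) reduce to a few lines of arithmetic on the confusion-matrix counts, with DF (class 1) treated as the positive class. The counts in the usage line are hypothetical and are not taken from the paper's results.

```python
def confusion_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, recall, precision and F1 from binary confusion-matrix counts.

    DF (class 1) is the positive class; a zero denominator yields 0.0.
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total                      # equation (1)
    recall = tp / (tp + fn) if tp + fn else 0.0       # equation (2)
    precision = tp / (tp + fp) if tp + fp else 0.0    # equation (3)
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)             # equation (4)
    return {"accuracy": accuracy, "recall": recall,
            "precision": precision, "f1": f1}

# Hypothetical test-set counts for 58 patients (illustrative, not from the paper).
example = confusion_metrics(tp=27, fp=1, fn=1, tn=29)
print({k: round(v, 4) for k, v in example.items()})
```

Returning 0.0 for an empty denominator avoids a division error when a model never predicts the positive class.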


# Table II: Confusion Matrix




# Table V: AUC Comparison


# V. Conclusion

The main objective of this research work is the prediction of dengue fever using ensemble techniques. We used bagging, boosting, and stacking methods for prediction and compared the results with those of the NB, KNN, and SVM models using standard performance metrics. The experimental results show that the ensemble techniques are the best-performing models for dengue fever prediction on this dataset. Extreme gradient boosting, random forest with majority voting, and stacking with a meta-classifier all achieve higher accuracy than the other models on both the training and the testing datasets. A further analysis using ROC curves and precision-recall curves supports this comparison; the area under the ROC curve of the ensemble models lies between 0.97 and 0.99. The ensemble models are therefore better suited to predicting dengue-infected patients.
![Fig. 2: Pictorial Representation of Dengue Fever Symptoms](image-2.png "Fig. 2 :")

According to the World Health Organization, the dengue virus has four serotypes: DENV1, DENV2, DENV3, and DENV4. The incubation period is 2 to 7 days [4]. Dengue symptoms include high fever, joint and muscle pain, headache, vomiting, rashes, pain behind the eyes, and diarrhea, as shown in Fig. 2. Different ML algorithms have been used for dengue fever classification, such as the NB classifier, K-Nearest Neighbour, Decision Tree, Support Vector Machine, and Neural Networks. The proposed model demonstrates the ensemble techniques of bagging, boosting, and stacking. The binary dengue classification is based on Extreme Gradient Boost (XGB), Random Forest (RF) by majority voting, and Stacking.
![Fig. 3: An Ensemble Framework for the Prediction and Evaluation of the Dengue Dataset](image-3.png "Fig. 3 :")
![Fig. 4: Diagnosis - Dengue Duo Card Test](image-4.png "Fig. 4 :")

The dataset consists of 286 instances with 18 attributes and one target. The target has two levels: dengue patients and non-dengue patients. A numerical value is assigned to each level: 0 for non-dengue patients (NDF) and 1 for dengue patients (DF). A screenshot of the dataset is shown in Fig. 5.
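The target encoding can be illustrated with a minimal Python sketch. The label strings `"DF"` and `"NDF"` follow the paper's abbreviations. The list below is a hypothetical stand-in for the real target column, built only to match the class sizes reported for the dataset (140 DF and 146 NDF).

```python
from collections import Counter

# Mapping described in the text: non-dengue -> 0, dengue -> 1.
LABEL_CODES = {"NDF": 0, "DF": 1}

def encode_targets(labels):
    """Map string class labels to 0/1 codes; an unknown label raises KeyError."""
    return [LABEL_CODES[label] for label in labels]

# Hypothetical label column reproducing the reported class sizes.
raw_labels = ["DF"] * 140 + ["NDF"] * 146
y = encode_targets(raw_labels)
counts = Counter(y)
df_share = round(100 * counts[1] / len(y), 2)  # percentage of positive (DF) cases
print(counts, df_share)
```

About 48.95% of the cases are positive, so the two classes are close to balanced.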
![Fig. 5: Screenshot of the Dataset](image-5.png "Fig. 5 :")

Of the 286 cases, 140 are dengue-infected and 146 are non-dengue cases. The distribution is shown in Fig. 6.
![Fig. 6: Distribution of the Target Value](image-6.png "Fig. 6 :")
![Fig. 8: Random Forest Algorithm Procedure](image-7.png "Fig. 8 :Fig. 9 :")

# iii. Stacking

Stacking is an ensemble technique that uses a meta-classifier to learn how best to combine the predictions of two or more base ML algorithms. The base (level-0) classifiers are different ML algorithms, so stacking ensembles are generally heterogeneous. The predictions of the level-0 classifiers are used as new features to train the level-1 meta-classifier. The stacking procedure is illustrated in Fig. 9. The meta-classifier can be any classifier [13].
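In scikit-learn, this two-level scheme is available as `StackingClassifier`, which fits the final estimator on cross-validated predictions of the base estimators. The sketch below stacks the paper's three base classifiers (NB, KNN, SVM). Because the text only states that the meta-classifier can be any classifier, the following choices are all assumptions: logistic regression as the meta-classifier, 5-fold internal cross-validation, feature scaling, and the synthetic data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data with the reported shape (286 rows, 18 features).
X, y = make_classification(n_samples=286, n_features=18, random_state=2)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=2
)

stack = StackingClassifier(
    estimators=[                                   # level-0 (base) classifiers
        ("nb", GaussianNB()),
        ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier())),
        ("svm", make_pipeline(StandardScaler(), SVC(random_state=2))),
    ],
    final_estimator=LogisticRegression(),          # level-1 meta-classifier (assumed)
    cv=5,                                          # meta-features come from 5-fold CV
)
stack.fit(X_train, y_train)
acc = stack.score(X_test, y_test)
print(f"stacking test accuracy = {acc:.3f}")
```

With the default `stack_method="auto"`, NB and KNN contribute predicted probabilities, while the SVM (fitted without `probability=True`) contributes its decision-function values.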
The confusion matrix and experimental scores of the NB, SVM, KNN, XGB, RF, and Stacking models on the training and testing datasets are shown in Fig. 10.

![](image-8.png "")
![Fig. 10 and Fig. 11](image-9.png "Fig. 10 :Fig. 11 :")
![Fig. 13: Testing Dataset Precision, Recall and F1 Score Comparison of ML Models](image-10.png "Fig. 13 :")

The precision, recall, and F1 scores for the training and testing datasets are listed in Table IV. The ensemble methods are compared with the other methods in Fig. 12 and Fig. 13, which show that the ensemble methods perform better on unseen data. The receiver operating characteristic (ROC) curve and the precision-recall (PR) curve are graphical representations of classifier performance. For each classifier and at various threshold values, the ROC curve plots the true positive rate (TPR) against the false positive rate (FPR), and the PR curve plots precision against recall. The PR curves for the training and testing datasets are shown in Fig. 14 and Fig. 15, respectively, and the corresponding ROC curves in Fig. 16 and Fig. 17.
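The threshold sweep behind ROC and PR curves can be written out directly. This dependency-free sketch computes one point per distinct score threshold and integrates the ROC curve with the trapezoidal rule. The function names and the four-sample example (a common toy case) are illustrative. In practice, library routines such as scikit-learn's `roc_curve`, `precision_recall_curve`, and `roc_auc_score` would normally be used.

```python
from typing import List, Tuple

def _counts_at(threshold, y_true, scores):
    """True and false positives when every score >= threshold is predicted DF (class 1)."""
    tp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 1)
    fp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 0)
    return tp, fp

def roc_points(y_true, scores) -> List[Tuple[float, float]]:
    """(FPR, TPR) pairs, starting at (0, 0) and sweeping thresholds from high to low."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp, fp = _counts_at(thr, y_true, scores)
        points.append((fp / neg, tp / pos))
    return points

def pr_points(y_true, scores) -> List[Tuple[float, float]]:
    """(recall, precision) pairs for the same threshold sweep."""
    pos = sum(y_true)
    points = []
    for thr in sorted(set(scores), reverse=True):
        tp, fp = _counts_at(thr, y_true, scores)
        points.append((tp / pos, tp / (tp + fp)))  # at least one positive prediction per threshold
    return points

def trapezoid_auc(points) -> float:
    """Area under a piecewise-linear curve given as (x, y) points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))

y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
roc = roc_points(y_true, scores)
print(roc)                 # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(trapezoid_auc(roc))  # 0.75
print(pr_points(y_true, scores))
```

Sweeping over the distinct scores handles tied scores correctly, because all samples that share a score switch to the positive class at the same threshold.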
![Fig. 14: The Performance Comparison of the Training Dataset by Precision Recall Curve](image-11.png "Fig. 14 :")
![Fig. 16: The Performance Comparison of the Training Dataset by ROC Curve](image-12.png "Fig. 16 :Fig. 17 :")
![](image-13.png "")
![](image-14.png "")
![](image-15.png "")
![](image-16.png "")
Fig. 7: Bar Chart Representation (number of dengue patients per clinical feature)

| Clinical Feature | No. of Patients |
|---|---|
| Fever | 140 |
| Headache | 106 |
| Myalgia | 97 |
| Arthralgia | 94 |
| Low Backache | 83 |
| Retro Orb Pain | 71 |
| Rashes | 65 |
| Vomiting | 57 |
| Pain Abdomen | 41 |
| Bleeding | 39 |
| Cough | 30 |
| Diarrhea | 25 |
| Sore Throat | 16 |
| Breathlessness | 6 |
| Seizures | 5 |

© 2022 Global Journals

# b) Ensemble Methods

An ensemble combines multiple models, which generally gives better performance than a single model; therefore, a set of models is used for prediction instead of a single model [7]. The main challenge is to obtain base models that make different kinds of errors. When the ensemble techniques of bagging, boosting, and stacking are used for classification, high accuracies can be obtained. Bagging creates different subsets of the training data by sampling from the training dataset, and the final output is decided by majority voting, as in Random Forest. Boosting builds models sequentially, combining weak learners into a strong learner, so that the final model achieves high accuracy, as in XGBoost and AdaBoost.
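Bagging and boosting can be contrasted in a short scikit-learn sketch scored with 10-fold cross-validation. This is not the authors' configuration. The data are synthetic, and the hyperparameters are illustrative. Scikit-learn's `GradientBoostingClassifier` stands in for XGBoost; the `xgboost` package's `XGBClassifier` could be swapped in if it is installed. Note also that scikit-learn's `RandomForestClassifier` combines its trees by averaging their predicted class probabilities (soft voting) rather than by a hard majority vote.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for the dengue table: 286 rows, 18 features, binary target.
X, y = make_classification(n_samples=286, n_features=18, random_state=3)

# 10-fold stratified cross-validation, as used in the performance evaluation.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=3)

models = {
    # Bagging: each tree is fit on a bootstrap sample, using random feature subsets at splits.
    "RF (bagging)": RandomForestClassifier(n_estimators=200, random_state=3),
    # Boosting: trees are added one at a time, each fitted to the current ensemble's
    # residual errors (negative gradient of the loss). Stand-in for XGBoost.
    "GB (boosting)": GradientBoostingClassifier(random_state=3),
    # Boosting by re-weighting misclassified samples at every round.
    "AdaBoost (boosting)": AdaBoostClassifier(random_state=3),
}

cv_accuracy = {
    name: cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
    for name, model in models.items()
}
for name, acc in cv_accuracy.items():
    print(f"{name}: mean 10-fold accuracy = {acc:.3f}")
```

Depending on the installed scikit-learn version, `AdaBoostClassifier` may emit a deprecation warning about its `algorithm` parameter; the results are unaffected.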
Confusion Matrix and Experimental Results of the Training and Testing Datasets of the Ensemble and Other ML Models (classification-report panels for NB, KNN, SVM, XGB, and RF; the per-class values are given in Table IV)
# Table III: Accuracy (%) of the Classifiers

| Classifier | Training Dataset | Testing Dataset |
|---|---|---|
| NB | 95.40 | 93.17 |
| KNN | 96.49 | 85.66 |
| SVM | 97.51 | 89.65 |
| XGB | 98.57 | 97.80 |
| RF | 99.12 | 94.82 |
| Stacking | 99.56 | 98.27 |

Global Journal of Computer Science and Technology, Volume XXII Issue II Version I
# Table IV: Precision, Recall, and F1 Score (%) of the Classifiers

| Classifier | Class | Training Precision | Training Recall | Training F1 | Testing Precision | Testing Recall | Testing F1 |
|---|---|---|---|---|---|---|---|
| NB | NDF | 93 | 98 | 95 | 94 | 98 | 96 |
| NB | DF | 98 | 92 | 95 | 97 | 93 | 95 |
| KNN | NDF | 93 | 100 | 97 | 86 | 97 | 91 |
| KNN | DF | 100 | 93 | 96 | 96 | 81 | 88 |
| SVM | NDF | 96 | 98 | 97 | 91 | 98 | 94 |
| SVM | DF | 99 | 100 | 100 | 98 | 89 | 93 |
| RF | NDF | 98 | 100 | 99 | 97 | 98 | 97 |
| RF | DF | 100 | 98 | 99 | 98 | 96 | 97 |
| XGB | NDF | 99 | 97 | 98 | 97 | 98 | 97 |
| XGB | DF | 97 | 99 | 98 | 98 | 96 | 97 |
| Ensemble Stacking | NDF | 100 | 99 | 100 | 97 | 99 | 98 |
| Ensemble Stacking | DF | 99 | 100 | 100 | | | |
| Classifier | Testing Dataset | Training Dataset |
|---|---|---|
| Auc_Nb | 0.9629 | 0.9514 |
| Auc_Knn | 0.8333 | 0.9342 |
| Auc_Svc | 0.9444 | 0.9956 |
| Auc_Xgb | 0.9714 | 0.9781 |
| Auc_Rf | 0.9814 | 0.9914 |
| Auc_Scv | 0.9814 | 0.9868 |
		
		
## Acknowledgment

Our sincere thanks to Dr. Veerapuram Manoj Reddy, Department of General Medicine, PES Institute of Medical Sciences and Research, Kuppam, Andhra Pradesh, for his support in collecting the dengue data.

			
## References

* [1] World Health Organization, Fact sheet no., Geneva, March 2014. Accessed 16 Oct 2019.
* [2] K. Y. Yigzaw and J. Bellika, "A communicable disease prediction benchmarking platform," in IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI 2014), 2014, pp. 564-568. doi:10.1109/BHI.2014.6864427
* [3] S. Kalayanarooj, "Clinical Manifestations and Management of Dengue/DHF/DSS," Trop. Med. Health, vol. 39, suppl., Dec. 2011 (Epub 22 Dec 2011). doi:10.2149/tmh.2011-S10. PMID: 22500140. PMCID: PMC3317599.
* [4] M. Rathi, Masand, and A. Purohit, "Study of Dengue Infection in Rural Rajasthan," Journal of Evolution of Medical and Dental Sciences, 2015. doi:10.14260/jemds/2015/993
* [5] A. S. Aldallal and A. A. Al-Moosa, "Using Data Mining Techniques to Predict Diabetes and Heart Diseases," in 4th International Conference on Frontiers of Signal Processing (ICFSP), 2018.
* [6] S. Bashir, U. Qamar, and F. H. Khan, "IntelliHealth: A medical decision support application using a novel weighted multi-layer classifier ensemble framework," Journal of Biomedical Informatics, vol. 59, pp. 185-200, 2016. ISSN 1532-0464. doi:10.1016/j.jbi.2015.12.001
* [7] Agrawal, G. Bhargav, and E. Spandana, "Diabetes Diagnosis Prediction Using Ensemble Approach," 2021. doi:10.1007/978-981-15-5546-6_66
* [8] J. Miao and K. Miao, "Cardiotocographic Diagnosis of Fetal Health based on Multiclass Morphologic Pattern Predictions using Deep Learning Classification," International Journal of Advanced Computer Science and Applications, vol. 9, 2018. doi:10.14569/IJACSA.2018.090501
* [9] N. S. Chandra Reddy, S. Shue Nee, L. Zhi Min, and C. Ying, "Classification and Feature Selection Approaches by Machine Learning Techniques: Heart Disease Prediction," International Journal of Innovative Computing, vol. 9, no. 1, 2019. doi:10.11113/ijic.v9n1.210
* [10] M. Ghosh and G. Sanyal, "Performance Assessment of Multiple Classifiers Based on Ensemble Feature Selection Scheme for Sentiment Analysis," Applied Computational Intelligence and Soft Computing, 2018. doi:10.1155/2018/8909357
* [11] H. Heidari and G. Hellstern, "Early heart disease prediction using hybrid quantum classification," arXiv, 2022. doi:10.48550/arXiv.2208.08882
* [12] G. Gupta, U. Adarsh, N. Reddy, B. Subba Rao, and Ashwath, "Comparison of various machine learning approaches uses in heart ailments prediction," Journal of Physics: Conference Series, vol. 2161, 012010, 2022. doi:10.1088/1742-6596/2161/1/012010
* [13] K. Shaukat, N. Masood, S. Mehreen, and U. Azmeen, "Dengue Fever Prediction: A Data Mining Problem," J. Data Mining Genomics Proteomics, vol. 6, 181, 2015. doi:10.4172/2153-0602.1000181
* [14] D. Mesafint and D. H. Manjaiah, "Grid search in hyperparameter optimization of machine learning models for prediction of HIV/AIDS test results," International Journal of Computers and Applications, pp. 1-12, 2021. doi:10.1080/1206212X.2021.1974663
* [15] A. Tate, U. Gavhane, J. Pawar, B. Rajpurohit, and G. B. Deshmukh, "Prediction of Dengue, Diabetes and Swine Flu Using Random Forest Classification Algorithm," 2017.
* [16] Konadala Kameswara Rao, Nynalasetti, and G. P. Saradhi Varma, "A hybrid Algorithm for Dengue Disease Prediction with Multi Dimensional Data," International Journal of Advanced Research in Computer Science and Software Engineering, vol. 14, 2014.
* [17] H. M. Veena and D. S. Suresh, "Comparative Study of Classification algorithms used for the Prediction of Non-communicable diseases," Int. J. Emerg. Trends Eng. Res., vol. 9, no. 7, 2021. doi:10.30534/ijeter/2021/14972021