# Introduction

According to the World Health Organization (WHO), the number of people with diabetes has quadrupled since 1980. Prevalence is increasing worldwide, particularly in low- and middle-income countries. It is estimated that medical costs and lost work and wages for people diagnosed with diabetes amount to $327 billion yearly, and that medical costs for people with diabetes are twice as high as for those who do not have the disease (CDC, 2018). About 422 million people worldwide have the disease. It can lead to serious complications in any part of the body, such as kidney disease, blindness, nerve damage, and heart disease (Temurtas et al., 2009), and it increases the risk of dying prematurely: diabetes is the seventh leading cause of death worldwide. There are many factors to analyze when diagnosing diabetes in a patient, which makes the physician's job difficult. Thus, to save time and cost and to reduce the risk posed by an inexperienced physician, classification models may be built to help predict and diagnose diabetes based on previous records (Polat et al., 2008).

The use of machine learning in medicine has increased substantially. With the exponential growth of big data, manual efforts to analyze such data are impossible; therefore, automated techniques such as machine learning are used. Machine learning is defined as the ability of a system to learn on its own by extracting patterns from large raw data (Goodfellow et al., 2016). The General Regression Neural Network Oracle (GRNN Oracle), developed by Masters et al. in 1998, combines the predictions of individually trained classifiers and outputs one superior prediction by determining the error rate of each classifier from a set of observations in order to assign weights that favor classifiers with lower error rates. The final prediction for an unknown observation is calculated by summing each classifier's prediction for that observation multiplied by the classifier's weight; classifiers with lower error rates have greater influence on the final prediction. Because of the strong capabilities of the oracle, it has been enhanced to consist of two GRNN Oracles, one within the other. First proposed by Bani-Hani (2017), the first oracle is created through its own combination of algorithms and then acts as a classifier in its own right, with its own predictions and error contribution on a set of unknown observations. It is then combined with other classifiers to create a new, outer oracle, named the Recursive General Regression Neural Network Oracle (R-GRNN Oracle).

This study is applied on the Pima Indians Diabetes dataset, where a Genetic Algorithm (GA) is used for feature selection and hyperparameter optimization, and the proposed classifier, the R-GRNN Oracle, is applied along with seven other classifiers, namely Support Vector Machine (SVM), Multilayer Perceptron (MLP), Random Forest (RF), Probabilistic Neural Network (PNN), Gaussian Naïve Bayes (GNB), K-Nearest Neighbor (KNN), and the GRNN Oracle, for the prediction and diagnosis of diabetes. The R-GRNN Oracle was able to achieve the highest accuracy and AUC (area under the Receiver Operating Characteristic (ROC) curve) in comparison to the other classifiers used.

The remainder of this paper is organized as follows: Section 2 presents the related work. Section 3 explains the methodology adopted in this study. Section 4 shows the experimental analysis and results. Section 5 presents the discussion, and Section 6 presents the conclusion and future work.
# Related Work

Prediction models are vastly implemented in clinical and medical fields to support diagnostic decision-making (Zheng et al., 2015). Polat et al. (2008) proposed a cascade learning system combining Generalized Discriminant Analysis (GDA) and a Least Square Support Vector Machine (LS-SVM) for the prediction of diabetes. Park and Edington (2001) applied a sequential multi-layered perceptron (SMLP) with back propagation learning on 6,142 participants. The early detection of type-2 diabetes was conducted by Zhu et al. in 2015, in which they proposed a dynamic voting scheme ensemble. Thirugnanam et al. (2012) adopted techniques such as fuzzy logic, Neural Network (NN), and case-based reasoning as an individual approach (FNC) for the diagnosis of diabetes.

Regarding the dataset used in this study, the Pima Indians Diabetes dataset, various studies used it to create prediction models for the prediction and diagnosis of diabetes. Kayaer and Yildirim (2003) applied an MLP, a Radial Basis Function (RBF) network, and a General Regression Neural Network (GRNN) to the dataset; their highest accuracy was achieved by the GRNN at 80.21%. Carpenter and Markuzon (1998) applied several techniques, including but not limited to KNN, Logistic Regression (LR), the perceptron-like ADAP model, ARTMAP, and ARTMAP-IC (named for instance counting and inconsistent cases), in which ARTMAP-IC obtained the highest accuracy at 81%. Bradley (1997) also used various classifiers on the dataset, where the author's main purpose was to assess the use of the AUC as a performance metric; the author was able to achieve the highest accuracy of 78.4% using a two-layer MLP. A hybrid of Artificial Neural Network (ANN) and Fuzzy Neural Network (FNN) was proposed by Kahramanli and Allahverdi in 2008; their approach resulted in an accuracy of 84.2%. Lekkas and Mikhailov (2010) applied Evolving Fuzzy Classification (EFC) to two datasets, including the Pima Indians Diabetes dataset, and were able to reach an accuracy of 79.37%. Miche et al. (2010) presented the Optimally Pruned Extreme Learning Machine (OP-ELM) and compared its performance to an MLP, SVM, and Gaussian Process (GP) on several regression and classification datasets; on the dataset concerning this study, the GP had the highest accuracy among the classifiers tested at 76.3%. Huang et al. (2004) were able to achieve an accuracy of 77.31% using SVM, although their paper proposed an algorithm called Extreme Learning Machine (ELM). Kumari and Chitra (2013) used SVM and obtained an accuracy of 78.2%. Al Jarullah (2011) also found the accuracy to be 78.2% using Decision Trees (DTs). Bradley and Mangasarian (1998) applied Feature Selection via Concave minimization (FSC), SVM, and a Robust Linear Program (RLP), in which the RLP had the highest accuracy on the Pima Indians Diabetes dataset at 76.16%. Using a novel Adaptive Synthetic (ADASYN) sampling approach, He et al. (2008) achieved an accuracy of 68.37%. Şahan et al. (2005) proposed a new artificial immune system named the Attribute Weighted Artificial Immune System (AWAIS), with which they attained a classification accuracy of 75.87%. Luukka (2011) used a Similarity-Based (S-Based) classifier with fuzzy entropy measures as a feature selection method and reached an accuracy of 75.97%. Using Extreme Gradient Boosting (XGBoost), Christina et al. (2018) achieved 81% accuracy. Ramesh et al. (2017) used deep learning, more specifically a Restricted Boltzmann Machine (RBM), on the dataset, with 81% accuracy. Vaishali et al. (2017) applied GA for feature selection with a Multi Objective Evolutionary Fuzzy (MOEF) classifier and obtained an accuracy of 83.04%.

Many other studies have been carried out on the same dataset; however, they have been excluded from this literature review for several reasons, most importantly that they report training accuracies rather than testing and validation accuracies. Overfitting generates higher accuracies by fitting the model too closely to the training set, leaving it insufficiently generalized. Studies that obtained high accuracies but did not mention whether they were measured on a training set or on a testing or validation set were also excluded, as such results are questionable. It is worth noting that this study applied 4-fold cross validation to train each classifier, and the classifiers were evaluated on a validation subset that took part in neither the training nor the testing steps.

# Methodology

Six individual classifiers were used in this research: SVM, MLP, RF, PNN, GNB, and KNN, of which some were used to create the GRNN Oracle and some were combined with the first oracle to create the R-GRNN Oracle. The software and language used for this study was Python 3.6, and the hardware specifications were an Intel® Core™ i7-8750H CPU @ 2.20 GHz with 32.0 GB RAM.

# a) Individual Classifiers

Support Vector Machine: SVM is a statistical learning method proposed by Vapnik (1995). It is a widely used supervised machine learning algorithm for both classification and regression. SVM works by finding the hyperplane that maximizes the margin between the classes in the feature space, as seen in Figure 1. Support vectors are the observations that help dictate the hyperplane. SVM classifies new samples based on which side of the boundary they are located on.

Multilayer Perceptron: MLP is a feedforward artificial NN that is a modification of the standard linear perceptron. It does not require a linear relationship between the independent variables and the dependent variable, as it is able to solve problems that are not linearly separable through the use of activation functions located in each node. An MLP consists of an input layer, one or more hidden layers, and an output layer. It is a supervised machine learning algorithm that exploits back propagation to train itself and optimize the weights of each edge connecting two nodes. It is the most frequently used NN (Hossain et al., 2017) and is widely used for classification, regression, recognition, prediction, and approximation tasks. Figure 2 illustrates an example of an MLP with one hidden layer of five hidden nodes.

Gaussian Naïve Bayes: GNB is a supervised learning algorithm that is widely used for classification problems because of its simplicity and accurate results (Farid et al., 2014). It uses Bayes' theorem as its framework (Griffis et al., 2016) and makes strong independence assumptions between the independent variables. One important advantage of GNB is that it can estimate the parameters necessary for classification by training on a small training set.

K-Nearest Neighbor: KNN is a non-parametric, lazy learning method for classification and regression tasks (Zhang, 2016). $k$ is a user-set parameter that represents the number of known observations closest to the unknown observation mapped out in the feature space. For classification tasks, the class of the new observation is based on the majority class surrounding it; $k$ is typically an odd number. For regression tasks, the prediction for the new observation is taken as the average of its $k$ neighbors.
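To make the setup above concrete, the sketch below (not the authors' code) shows how scikit-learn counterparts of these classifiers could be instantiated and scored with the 4-fold cross validation used throughout this study. PNN is omitted because scikit-learn ships no PNN implementation, and the hyperparameter values are placeholders rather than the GA/GS-optimized ones reported later.

```python
# Illustrative sketch only: individual classifiers with 4-fold cross validation.
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

classifiers = {
    "SVM": SVC(kernel="rbf", probability=True),     # RBF kernel, as used in this study
    "MLP": MLPClassifier(hidden_layer_sizes=(5,)),  # one hidden layer, five nodes (cf. Figure 2)
    "RF":  RandomForestClassifier(n_estimators=100),
    "GNB": GaussianNB(),
    "KNN": KNeighborsClassifier(n_neighbors=5),     # k is typically odd
}

def mean_cv_accuracy(X, y):
    """4-fold cross-validated accuracy for each individual classifier."""
    return {name: cross_val_score(clf, X, y, cv=4).mean()
            for name, clf in classifiers.items()}
```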
# b) Optimization Algorithms

Genetic Algorithm: GA is a population-based metaheuristic developed by John Holland in the 1970s (Holland, 1992). It is a widely used optimization technique inspired by nature, more specifically by evolution and survival of the fittest. It finds solutions throughout the search space using two main operators: crossover and mutation. Every solution is represented as a chromosome with several alleles encoded with genetic material, and its quality is measured by the fitness value of the objective function. Crossover produces two somewhat different chromosomes, called offspring, from two parents, while the mutation operator is applied on the offspring at a given probability to create diversity in the population.

Grid Search: GS is an exhaustive search in which every candidate value of a hyperparameter, over a user-specified range, is evaluated and the best-performing value is selected.

# c) GRNN Oracle

The GRNN Oracle combines the predictive powers of several machine learning classifiers that were trained independently to form one superior prediction (Li, 2014). It determines the error rate of each classifier involved in the oracle in order to assign weights that favor classifiers with lower error rates. The final prediction for an unknown observation is calculated by summing each classifier's prediction for that observation multiplied by the classifier's weight. The steps involved in predicting a class (output) for a single observation are as follows.

First, each classifier $j$ is trained on a training subset of the data and tested on another subset to obtain predictions for the observations. Second, each prediction obtained from the previous step (the probability of belonging to each class) for each observation is compared to its actual class, and the Mean Squared Error (MSE) is calculated through Formula 1:

$$\mathrm{MSE}_{i,j} = \frac{1}{n_{classes}} \sum_{c=1}^{n_{classes}} \left( AP_{c} - PP_{c,j} \right)^{2} \quad (1)$$

where $\mathrm{MSE}_{i,j}$ is the mean squared error of known observation $i$ from classifier $j$, $n_{classes}$ is the total number of classes, $AP_{c}$ is the actual probability of known observation $i$ being of class $c$, and $PP_{c,j}$ is the predicted probability of being of class $c$ from classifier $j$.

Third, for a given unknown observation in the validation set (an observation that needs to be predicted), the distance between the observation and all the known samples in the testing set is calculated, and each known observation receives a particular weight for the unknown observation. The distance is calculated using Formula 2 and the weight using Formula 3:

$$D(\vec{x}, \vec{x}_{i}) = \frac{1}{f} \sum_{k=1}^{f} \left( \frac{x_{k} - x_{ik}}{\sigma_{k}} \right)^{2} \quad (2)$$

$$\mathit{weight}_{i} = e^{-D(\vec{x}, \vec{x}_{i})} \quad (3)$$

where $\vec{x}$ represents the feature vector of the unknown observation, $[x_{1}, x_{2}, \dots, x_{f}]$, $\vec{x}_{i}$ is the feature vector of known observation $i$, $x_{k}$ is the $k$-th feature of the unknown observation, $x_{ik}$ is the $k$-th feature of the known observation, $\sigma_{k}$ is an adjustable sigma parameter for the $k$-th feature, and $f$ is the total number of features.

Fourth, for the unknown observation, the predicted squared error of each classifier $j$ is obtained through the MSE and weight of each known observation (Formula 4):

$$\mathrm{PSE}_{j}(\vec{x}) = \frac{\sum_{i=1}^{n} \mathrm{MSE}_{i,j} \cdot \mathit{weight}_{i}}{\sum_{i=1}^{n} \mathit{weight}_{i}} \quad (4)$$

where $n$ is the number of known observations in the testing set.

Fifth, each classifier $j$ is assigned an amount of trust, $w_{j}$, for the final prediction of the unknown observation, where the higher the weight, the more influence the classifier has on the final prediction (Formulas 5 and 6):

$$w_{j} = \frac{1/\mathrm{PSE}_{j}}{\sum_{j=1}^{m} 1/\mathrm{PSE}_{j}} \quad (5)$$

$$\sum_{j=1}^{m} w_{j} = 1 \quad (6)$$

where $m$ is the total number of classifiers; the sum of $w_{j}$ over all classifiers equals one (Formula 6).

Lastly, each classifier's trust/weight is multiplied by its prediction for the unknown observation, and the products are summed to form the final prediction for that particular unknown observation (Formula 7):

$$\hat{y} = \sum_{j=1}^{m} w_{j} \cdot y_{j} \quad (7)$$

where $\hat{y}$ is the prediction of the unknown observation outputted by the GRNN Oracle, represented as a class membership vector, and $y_{j}$ is the predicted class membership vector for the unknown observation given by classifier $j$.
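The steps above can be condensed into a short NumPy sketch. This is a minimal reading of Formulas 1-7, not the authors' implementation: a single shared sigma is assumed in place of the per-feature adjustable $\sigma_{k}$, the members are assumed to expose a scikit-learn-style predict_proba, and all class and variable names are ours.

```python
import numpy as np

class GRNNOracle:
    """Minimal sketch of Formulas 1-7 with a fit/predict_proba interface,
    so that an inner oracle can itself be fed into an outer one."""

    def __init__(self, members, sigma=1.0):
        self.members = members  # already-trained classifiers exposing predict_proba
        self.sigma = sigma      # one shared sigma; the paper uses per-feature sigmas

    def fit(self, X_test, y_test, n_classes=2):
        # Formula 1: per-observation MSE of each member on the testing subset
        self.X_test = np.asarray(X_test, dtype=float)
        actual = np.eye(n_classes)[np.asarray(y_test)]          # one-hot actual classes
        self.mse = np.stack(
            [((actual - m.predict_proba(X_test)) ** 2).mean(axis=1)
             for m in self.members], axis=1)                    # shape (n, m)
        return self

    def predict_proba(self, X):
        out = []
        for x in np.asarray(X, dtype=float):
            # Formula 2: sigma-scaled distance to every known (testing) observation
            D = (((x - self.X_test) / self.sigma) ** 2).mean(axis=1)
            obs_w = np.exp(-D)                                  # Formula 3: observation weights
            # Formula 4: predicted squared error of each member for x
            pse = (self.mse * obs_w[:, None]).sum(axis=0) / obs_w.sum()
            w = (1.0 / pse) / (1.0 / pse).sum()                 # Formulas 5-6: trust, sums to 1
            preds = np.stack([m.predict_proba(x[None, :])[0] for m in self.members])
            out.append(w @ preds)                               # Formula 7: weighted final prediction
        return np.asarray(out)

    def predict(self, X):
        return self.predict_proba(X).argmax(axis=1)
```

Because the oracle itself exposes predict_proba, it can be fed into another oracle as a member, which is exactly the property the recursive construction below relies on.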
# d) Recursive GRNN Oracle

The best combination of classifiers that were trained and tested individually and independently is used to create the first oracle. By outputting predictions of its own, the first oracle now acts as any other machine learning classifier would. The combination of classifiers that best enhances the performance of the first GRNN Oracle is then selected, and this selected combination, including the first oracle, creates the second oracle: the R-GRNN Oracle. The accuracy, AUC, sensitivity, and specificity of its final predictions are taken, along with the same performance metrics of the inner GRNN Oracle and the individual classifiers, for the final comparison.

# Experimental Analysis and Results

# a) Dataset Description

The Pima Indians Diabetes dataset was used in this study. It originates from a study conducted by the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) on the Pima Indian population near Phoenix, Arizona, in 1965 (Smith et al., 1988). There are a total of 768 observed patients, 268 of whom have diabetes, which indicates the imbalanced property of the dataset. The dataset has eight independent variables (features) and one dependent variable (outcome: diabetes or no diabetes), as presented in Table 1. More detailed attribute distributions and statistical analyses are shown in Figure 3, where the color orange signifies patients who have diabetes. All patients recorded are females at least 21 years old of Pima Indian heritage.

# b) Data Preprocessing

The first step taken in the data preprocessing phase was excluding outliers, as they can drastically affect the model's predictive ability. Any point that was three standard deviations (3σ) away from the mean of any given feature was excluded. The original dataset had 768 patients; after outlier removal, the new dataset contained 709 patients, 243 of whom had diabetes. The next step was to correct the imbalanced property of the data. Since only 243 patients of the remaining 709 had diabetes, this is a class imbalance problem where those with diabetes make up only 34% of the data. Thus, an oversampling approach was applied to the minority class. Oversampling was favored over undersampling because the dataset was already small in terms of the number of observations, and, concerning how the R-GRNN Oracle works, it would not be wise to remove observations, as the recursive oracle requires the dataset to be relatively large for the data subsets to be drawn. After this step, a normalization technique was applied to each independent variable, scaling it to a range between 0 and 1, with 0 indicating the lowest value in a particular feature and 1 the highest. This is performed to ensure each feature has an equal weight so that no single feature outweighs another before the creation of the prediction model. The normalization is given in Formula 8:

$$\hat{x}_{k} = \frac{x_{k} - \min X_{k}}{\max X_{k} - \min X_{k}} \quad (8)$$

where $\min X_{k}$ is the minimum value in the set of values of feature $X_{k}$, and $\max X_{k}$ is the maximum value in feature $X_{k}$.
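A compact sketch of this preprocessing pipeline is given below, assuming a pandas DataFrame with an "Outcome" target column as in the public version of the dataset. The paper does not name its oversampling technique, so plain random oversampling of the minority class is shown as one plausible choice.

```python
import pandas as pd

def preprocess(df: pd.DataFrame, target: str = "Outcome", seed: int = 0) -> pd.DataFrame:
    features = df.columns.drop(target)
    # 1) Remove outliers: drop any observation more than 3 standard
    #    deviations from the mean of any feature (768 -> 709 patients).
    z = (df[features] - df[features].mean()) / df[features].std()
    df = df[(z.abs() <= 3).all(axis=1)].copy()
    # 2) Balance the classes: randomly oversample the minority class
    #    (diabetic patients, ~34% of the data after outlier removal).
    counts = df[target].value_counts()
    minority = df[df[target] == counts.idxmin()]
    extra = minority.sample(counts.max() - counts.min(), replace=True,
                            random_state=seed)
    df = pd.concat([df, extra], ignore_index=True)
    # 3) Min-max normalization (Formula 8): scale each feature to [0, 1].
    mn, mx = df[features].min(), df[features].max()
    df[features] = (df[features] - mn) / (mx - mn)
    return df
```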
# c) Hyperparameter Optimization

The hyperparameters of any algorithm contribute greatly to the output of the model; therefore, determining the optimal (or near-optimal) combination of hyperparameters yields the best result. Hyperparameters are the properties of a model whose values the user can set, for example, the number of hidden layers in a NN and the number of hidden nodes in each hidden layer. They differ from parameters, as parameters are changed internally by the model itself during training rather than set by the user before the training process. An example of a parameter is the weights of a NN, as they are adjusted through back propagation using Gradient Descent (or any other optimizer) rather than by the user.

GA was utilized to optimize the performances of SVM and MLP, while GS was applied on KNN and RF. The reason GS was used instead of GA is that both KNN and RF have only one hyperparameter of interest: the number of neighbors and the number of DTs, respectively. Therefore, no combinations of hyperparameters are needed, which makes it a straightforward exhaustive search. SVM and MLP, however, have more than one hyperparameter that need to be optimized simultaneously, including continuous values, which is why GA was used. Formula 9 shows the fitness function ($FF$) used to evaluate each chromosome (each solution); chromosomes were evaluated based on their prediction accuracies:

$$FF = \frac{1}{K} \sum_{k=1}^{K} \frac{TP_{k} + TN_{k}}{TP_{k} + TN_{k} + FP_{k} + FN_{k}} \quad (9)$$

where $TP$ is the number of true positives (those who actually have diabetes and were predicted to have diabetes), $TN$ is the number of true negatives (those who do not have diabetes and were predicted not to have diabetes), $FP$ is the number of false positives (those without diabetes who were falsely predicted to have diabetes), $FN$ is the number of false negatives (patients who have diabetes but were falsely predicted not to), and $K$ is the number of folds required for the K-fold cross validation, which was set to four for this study.

The hyperparameters included in this study for SVM were $C$ and gamma ($\gamma$), where both take on continuous values, while MLP's hyperparameters included the learning rate ($\eta$), momentum, the number of hidden layers, the number of hidden nodes in each hidden layer, and the solver, where $\eta$ and momentum are continuous, the numbers of hidden layers and nodes are integers, and the solver is categorical. Figure 4 and Figure 5 show the encoding (genotype) of the SVM and MLP hyperparameters, respectively, where each continuous hyperparameter was encoded as a binary chromosome segment with a length of 15 alleles.

SVMs can handle nonlinear classification by transforming inputs into feature vectors with the use of kernels. The SVM kernel set for this study is the Radial Basis Function (RBF), a popular Gaussian kernel function. Some of RBF's greatest advantages are its high accuracy, its fast convergence, and its applicability in almost any dimension. $C$ is a regularization hyperparameter that determines how correctly the hyperplane between the classes separates the data; it controls the trade-off between model complexity and training error (Joachims, 2002). $\gamma$ has a serious impact on the classification accuracy, as it defines the influence of each training observation (Tuba and Stanimirovic, 2017), with lower values meaning "far" and higher values meaning "close". It can be thought of as the inverse of the radius of influence of observations selected by the model as support vectors.
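As a hedged illustration of how one chromosome is evaluated, the sketch below decodes two 15-allele binary segments (cf. Figure 4) into $C$ and $\gamma$ and scores the resulting SVM with the 4-fold cross-validated accuracy of Formula 9; the search ranges are assumptions made for the example, not values taken from the paper.

```python
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def decode(bits, low, high):
    """Map a binary chromosome segment to a real value in [low, high]."""
    value = int("".join(map(str, bits)), 2)
    return low + value * (high - low) / (2 ** len(bits) - 1)

def fitness(chromosome, X, y):
    """Formula 9: mean accuracy over K=4 folds for one SVM chromosome."""
    C = decode(chromosome[:15], 0.01, 100.0)     # first 15 alleles -> C (assumed range)
    gamma = decode(chromosome[15:], 1e-4, 10.0)  # next 15 alleles -> gamma (assumed range)
    clf = SVC(kernel="rbf", C=C, gamma=gamma)
    return cross_val_score(clf, X, y, cv=4).mean()
```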
Figure 6-A shows SVM's accuracy achieved by GA in each of the 100 generations run.

With MLP, the activation function set in this study was the Rectified Linear Unit (ReLU). Activation functions are operations that map an output to a set of inputs; they are used to impart non-linearity to the network structure (Acharya et al., 2017). Because ReLU returns a positive number, i.e., $\max(x, 0)$, its two major advantages are sparsity and a reduced likelihood of the "vanishing gradient" problem: adding as many hidden layers as one would like will not cause the gradient multiplication to shrink toward a vanishingly small number as more layers are added. Solvers in NNs train and optimize the weights connecting the nodes between two adjacent layers. The two solvers considered for this study are Stochastic Gradient Descent (SGD) and Adam, a variant of SGD. The other two important hyperparameters are $\eta$ and momentum: $\eta$ controls how fast the network learns during training, and momentum helps the training converge (Acharya et al., 2017). They can be thought of as the step size and direction in the search space. Figure 6-B shows MLP's accuracy achieved by GA in each of the 100 generations run.

# d) Feature Selection

Feature selection plays an important role in classification for several reasons (Luukka, 2011). First, it can reduce the model's complexity, which helps lower computational cost, and when the model is taken for practical use, fewer inputs are needed. Second, by removing redundant features from the dataset, one can make the model more transparent and more comprehensible, providing a better explanation of the suggested diagnosis, which is an important requirement in medical applications. Feature selection can also reduce noise and thereby enhance classification accuracy.

GA was applied for feature selection through SVM and its optimized hyperparameters from the previous step. The solution representation for feature selection was embodied by a chromosome of eight binary values (i.e., 0's and 1's). An allele of value 0 indicates that feature $i$ was not included, while an allele of 1 indicates that it was; $i$ is the $i$-th feature in the dataset. To explain further, Figure 7 illustrates an example showing the encoding of the selected features #1, #2, #3, and #6 out of eight features. Chromosomes with a subset of features selected are then evaluated based on their accuracy, and the subset of features that attained the highest accuracy was selected for further analysis. Formula 9 was also used as the fitness function for chromosome evaluation.

# e) Recursive GRNN Oracle

For the first GRNN Oracle (the inner oracle), the classifiers fed into it were SVM, GNB, and RF. The accuracy and AUC for SVM were 79.72% and 85.79%, respectively; GNB had 79.09% and 85.56%, respectively; and RF had 77.50% and 81.15%. The performance of the first oracle was an accuracy of 79.54%, an AUC of 85.16%, a sensitivity of 59.60%, and a specificity of 88.51%. MLP, PNN, and KNN were not chosen because of their inferior performances compared to the other models. All models were run 15 times and the averages of the performance metrics were taken. For the R-GRNN Oracle, the first GRNN Oracle, which now acts as a classifier with its own predictions, was combined with SVM. Since SVM had a better performance than the others, it was chosen as the match with the first oracle to create the second oracle. Figure 8 illustrates the classifiers being fed into each one of the two oracles.
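In code, the recursive construction amounts to treating the fitted inner oracle as one more member of an outer oracle. The sketch below reuses the hypothetical GRNNOracle class from Section 3(c) above; svm, gnb, and rf stand for already-trained classifiers, and the subset names mirror the paper's training/testing/validation split.

```python
# The first (inner) oracle, built from SVM, GNB, and RF:
inner = GRNNOracle([svm, gnb, rf]).fit(X_test, y_test)

# The inner oracle exposes predict_proba, so it can be fed into an outer
# oracle as if it were any other classifier; this is the recursive step:
r_grnn = GRNNOracle([inner, svm]).fit(X_test, y_test)   # the R-GRNN Oracle

proba = r_grnn.predict_proba(X_val)   # class membership vectors (Formula 7)
y_pred = proba.argmax(axis=1)         # final diagnoses on the validation subset
```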
The first oracle achieved an accuracy of 79.54% and was surpassed by SVM (79.72%), but the recursive model had the highest accuracy at 81.14% and the highest AUC at 86.03%. It also reached the highest sensitivity (63.80%) in comparison to the rest; for specificity, MLP was the highest (89.71%), with the recursive model coming in third at 89.14% after MLP and SVM. However, since detecting true positives (those who have diabetes) is of great importance, the sensitivity metric, where the R-GRNN was the highest, has a higher significance than specificity. Table 2 shows the accuracy, AUC, sensitivity, and specificity of all the classifiers: the six individual classifiers (performing on their own), the GRNN Oracle, and the R-GRNN Oracle. The performances can also be seen in Figure 9 and Figure 10. Figure 9 shows the recursive model's 15 runs, where the best, average, and worst performances recorded were 86.47%, 81.14%, and 76.15%, respectively. It is worth mentioning that the dataset was shuffled each time the classifiers were run to ensure the robustness of the model: no matter how the data was shuffled, the recursive model always yielded better performance than the rest of the classifiers. Shuffling the data is the reason behind the high variation seen in Figure 9. Also, as a reminder of what was mentioned earlier, 4-fold cross validation was applied to train and test the models, but the actual validation of each model was performed on a validation subset that was involved in neither the training nor the testing steps.

# Discussion

While the accuracy of the proposed model was not the highest in the literature, it still came in third when compared to all the publications studied (Table 3). It also bested the traditional oracle, SVM, MLP, RF, PNN, GNB, and KNN. As a slight remark, however, the other studies did not confirm whether their accuracies came from conducting several runs and taking the average. As in this study, the highest accuracy achieved by the recursive model was 86.47%, and one could simply report that as the accuracy achieved; it is therefore wise to conduct several runs and take the average.

# Conclusion and Future Work

This study presented the R-GRNN Oracle, which was applied on the Pima Indians Diabetes dataset along with seven other classifiers whose final performances were compared: the traditional GRNN Oracle, SVM, MLP, RF, PNN, GNB, and KNN. GA was used to optimize the hyperparameters of SVM and MLP, and GS was used on RF and KNN. The models were run 15 times and the dataset was shuffled each run to ensure robustness; 4-fold cross validation was adopted as the validation method. Compared to the other models, the recursive oracle achieved the highest accuracy, AUC, and sensitivity at 81.14%, 86.03%, and 63.80%, respectively. It, however, came in third for specificity at 89.14%, where the optimized MLP had the highest at 89.71%. Future research may include applying feature selection and hyperparameter optimization simultaneously rather than applying feature selection based on the hyperparameters optimized on all the features. It may also include using other metaheuristics, such as Particle Swarm Optimization (PSO), for hyperparameter optimization.

![Figure 1: A simple linear SVM](image-2.png)
![Figure 2: An MLP NN with one hidden layer (Mohamed et al., 2015)](image-3.png)
![Figure 3: Attributes distributions and statistical analysis of the Pima Indians Diabetes dataset (printed in color)](image-4.png)
![Figure 4: Chromosome encoding of the SVM parameters; Figure 5: Chromosome encoding of the MLP parameters](image-5.png)
![Figure 6: Prediction accuracy throughout 100 generations, A-SVM, B-MLP](image-7.png)
![Figure 7: Chromosome encoding of a selected subset of features](image-8.png)
![Figure 8: Overview of the Recursive GRNN Oracle](image-9.png)
![Figure 9: The best, worst, and average performances of the R-GRNN Oracle in 15 runs](image-10.png)
![Figure 10: Graphical representation for the performance metrics for the classifiers](image-11.png)
Table 1: Attributes of the Pima Indians Diabetes dataset

| Attribute | Description | Type |
|---|---|---|
| X1 | No. of Pregnancies | Discrete |
| X2 | Plasma Glucose Concentration | Continuous |
| X3 | Diastolic Blood Pressure | Continuous |
| X4 | Skin Thickness | Continuous |
| X5 | 2-hr Serum Insulin | Continuous |
| X6 | BMI | Continuous |
| X7 | Diabetes Pedigree Function | Continuous |
| X8 | Age | Continuous |
| Y | Outcome: Diabetes/No Diabetes | Discrete |

Table 2: Performance metrics (%) of the individual classifiers, the GRNN Oracle, and the R-GRNN Oracle

| Classifier | Accuracy | AUC | Sensitivity | Specificity |
|---|---|---|---|---|
| SVM | 79.72 | 85.79 | 58.43 | 89.31 |
| MLP | 76.88 | 80.75 | 49.11 | 89.71 |
| RF | 77.50 | 81.15 | 57.11 | 86.65 |
| PNN | 71.03 | 75.54 | 61.43 | 75.24 |
| GNB | 79.09 | 84.56 | 60.44 | 87.58 |
| KNN | 76.59 | 80.77 | 58.72 | 84.53 |
| GRNN O. | 79.54 | 85.16 | 59.60 | 88.51 |
| R-GRNN O. | 81.14 | 86.03 | 63.80 | 89.14 |

Table 3: Accuracy comparison with the prior studies on the Pima Indians Diabetes dataset (values as reported in Section 2)

| Method | Accuracy (%) |
|---|---|
| ANN + FNN hybrid (Kahramanli and Allahverdi, 2008) | 84.2 |
| MOEF with GA feature selection (Vaishali et al., 2017) | 83.04 |
| R-GRNN Oracle (this study) | 81.14 |
| ARTMAP-IC (Carpenter and Markuzon, 1998) | 81 |
| XGBoost (Christina et al., 2018) | 81 |
| RBM (Ramesh et al., 2017) | 81 |
| GRNN (Kayaer and Yildirim, 2003) | 80.21 |
| EFC (Lekkas and Mikhailov, 2010) | 79.37 |
| Two-layer MLP (Bradley, 1997) | 78.4 |
| SVM (Kumari and Chitra, 2013) | 78.2 |
| DT (Al Jarullah, 2011) | 78.2 |
| SVM (Huang et al., 2004) | 77.31 |
| GP (Miche et al., 2010) | 76.3 |
| RLP (Bradley and Mangasarian, 1998) | 76.16 |
| S-Based classifier (Luukka, 2011) | 75.97 |
| AWAIS (Şahan et al., 2005) | 75.87 |
| ADASYN (He et al., 2008) | 68.37 |

# References

* Alzheimer's Disease Neuroimaging Initiative. (2017). Classification of Alzheimer's disease and prediction of mild cognitive impairment-to-Alzheimer's conversion from structural magnetic resonance imaging using feature ranking and a genetic algorithm. Computers in Biology and Medicine, 83, 109-119.
* Belgiu, M., & Drăguţ, L. (2016). Random forest in remote sensing: A review of applications and future directions. ISPRS Journal of Photogrammetry and Remote Sensing, 114, 24-31.
* Bhardwaj, A., & Tiwari, A. (2015). Breast cancer diagnosis using genetically optimized neural network model. Expert Systems with Applications, 42(10), 4611-4620.
* Breiman, L. (2001). Random forests. Machine Learning, 45(1).
* Carpenter, G. A., & Markuzon, N. (1998). ARTMAP-IC and medical diagnosis: Instance counting and inconsistent cases. Neural Networks, 11(2).
* Centers for Disease Control and Prevention. (2018). Diabetes quick facts.
* Christina, S. S., & Santiago, N. (2018). Decision Support System for a Chronic Disease-Diabetes.
* Fang, J., & Grzymala-Busse, J. W. (2006, June). Leukemia prediction from gene expression data-a rough set approach. In International Conference on Artificial Intelligence and Soft Computing. Springer, Berlin, Heidelberg.
* Farid, D. M., Zhang, L., Rahman, C. M., Hossain, M. A., & Strachan, R. (2014). Hybrid decision tree and naïve Bayes classifiers for multi-class classification tasks. Expert Systems with Applications, 41(4).
* Gil, D., & Manuel, D. J. (2009). Diagnosing Parkinson by using artificial neural networks and support vector machines. Global Journal of Computer Science and Technology, 9(4).
* Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press, Cambridge.
* Griffis, J. C., Allendorfer, J. B., & Szaflarski, J. P. (2016). Voxel-based Gaussian naïve Bayes classification of ischemic stroke lesions in individual T1-weighted MRI scans. Journal of Neuroscience Methods, 257.
* Haller, S., Badoud, S., Nguyen, D., Garibotto, V., Lovblad, K. O., & Burkhard, P. R. (2012). Individual detection of patients with Parkinson disease using support vector machine analysis of diffusion tensor imaging data: initial results. American Journal of Neuroradiology.
* He, H., Bai, Y., Garcia, E. A., & Li, S. (2008, June). ADASYN: Adaptive synthetic sampling approach for imbalanced learning. In 2008 IEEE International Joint Conference on Neural Networks (IJCNN 2008).
* Holland, J. H. (1992). Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence. MIT Press.
* Hossain, M. S., Ong, Z. C., Ismail, Z., & Khoo, S. Y. (2017). A comparative study of vibrational response based impact force localization and quantification using radial basis function network and multilayer perceptron. Expert Systems with Applications, 85.
* Hsu, C. W., Chang, C. C., & Lin, C. J. (2003). A practical guide to support vector classification.
* Huang, G. B., Zhu, Q. Y., & Siew, C. K. (2004, July). Extreme learning machine: a new learning scheme of feedforward neural networks. In 2004 IEEE International Joint Conference on Neural Networks, Vol. 2.
* Jayasurya, K., Fung, G., Yu, S., Dehing-Oberije, C., De Ruysscher, D., Hope, A., De Neve, W., Lievens, Y., Lambin, P., & Dekker, A. L. A. J. (2010). Comparison of Bayesian network and support vector machine models for two-year survival prediction in lung cancer patients treated with radiotherapy. Medical Physics, 37(4).
* Joachims, T. (2002). Learning to Classify Text Using Support Vector Machines: Methods, Theory and Algorithms.
* Manninen, T., Huttunen, H., Ruusuvuori, P., & Nykter, M. (2013). Leukemia prediction using sparse logistic regression. PLoS ONE, 8(8), e72932.
* Masters, T., Land, W. H., & Maniccam, S. (1998, October). An oracle based on the general regression neural network. In 1998 IEEE International Conference on Systems, Man, and Cybernetics, Vol. 2.
* Miche, Y., Sorjamaa, A., Bas, P., Simula, O., Jutten, C., & Lendasse, A. (2010). OP-ELM: optimally pruned extreme learning machine. IEEE Transactions on Neural Networks, 21(1).
* Mohamed, H., Negm, A., Zahran, M., & Saavedra, O. C. (2015). Assessment of artificial neural network for bathymetry estimation using High Resolution Satellite imagery in Shallow Lakes: case study El Burullus Lake. In International Water Technology Conference.
* Park, J., & Edington, D. W. (2001).
* Polat, K., Güneş, S., & Arslan, A. (2008). A cascade learning system for classification of diabetes disease: Generalized discriminant analysis and least square support vector machine. Expert Systems with Applications, 34.
* Ramesh, S., Balaji, H., Iyengar, N. C. S., & Caytiles, R. D. (2017). Optimal predictive analytics of pima diabetics using deep learning.
* Ramírez, J., Górriz, J. M., Salas-Gonzalez, D., Romero, A., López, M., Álvarez, I., & Gómez-Río, M. (2013). Computer-aided diagnosis of Alzheimer's type dementia combining support vector machines and discriminant set of features. Information Sciences, 237.
* Şahan, S., Polat, K., Kodaz, H., & Güneş, S. (2005, August). The medical applications of attribute weighted artificial immune system (AWAIS): diagnosis of heart and diabetes diseases. In International Conference on Artificial Immune Systems. Springer, Berlin, Heidelberg.
* Sakumura, Y., Koyama, Y., Tokutake, H., Hida, T., Sato, K., Itoh, T., Akamatsu, T., & Shin, W. (2017). Diagnosis by volatile organic compounds in exhaled breath from lung cancer patients using support vector machine algorithm. Sensors, 17(2), 287.
* Smith, J. W., Everhart, J. E., Dickson, W. C., Knowler, W. C., & Johannes, R. S. (1988, November). Using the ADAP learning algorithm to forecast the onset of diabetes mellitus. In Proceedings of the Annual Symposium on Computer Application in Medical Care, 261. American Medical Informatics Association.
* Sun, T., Wang, J., Li, X., Lv, P., Liu, F., Luo, Y., Gao, Y., Zhu, Q., & Guo, X. (2013). Comparative evaluation of support vector machines for computer aided diagnosis of lung cancer in CT based on a multi-dimensional data set. Computer Methods and Programs in Biomedicine, 111(2).
* Temurtas, H., Yumusak, N., & Temurtas, F. (2009). A comparative study on diabetes disease diagnosis using neural networks. Expert Systems with Applications, 36(4).
* Thirugnanam, M., Kumar, P., Srivatsan, S. V., & Nerlesh, C. R. (2012). Improving the prediction rate of diabetes diagnosis using fuzzy, neural network, case based (FNC) approach. Procedia Engineering, 38.
* Tuba, E., & Stanimirovic, Z. (2017, June). Elephant herding optimization algorithm for support vector machine parameters tuning. In 2017 9th International Conference on Electronics, Computers and Artificial Intelligence (ECAI). IEEE.
* Vaishali, R., Sasikala, R., Ramasubbareddy, S., Remya, S., & Nalluri, S. (2017, October). Genetic algorithm based feature selection and MOE Fuzzy classification algorithm on Pima Indians Diabetes dataset. In 2017 International Conference on Computing Networking and Informatics (ICCNI). IEEE.
* Vapnik, V. (2013). The Nature of Statistical Learning Theory. Springer Science & Business Media.
* World Health Organization. 10 facts on diabetes.
* Zhang, Y., Lu, S., Zhou, X., Yang, M., Wu, L., Liu, B., Phillips, P., & Wang, S. (2016). Comparison of machine learning methods for stationary wavelet entropy-based multiple sclerosis detection: decision tree, k-nearest neighbors, and support vector machine. Simulation, 92(9).
* Zheng, B., Yoon, S. W., & Lam, S. S. (2014). Breast cancer diagnosis based on feature extraction using a hybrid of K-means and support vector machine algorithms. Expert Systems with Applications, 41(4).
* Zheng, B., Zhang, J., Yoon, S. W., Lam, S. S., Khasawneh, M., & Poranki, S. (2015). Predictive modeling of hospital readmissions using metaheuristics and data mining. Expert Systems with Applications, 42(20).
* Zhu, J., Xie, Q., & Zheng, K. (2015). An improved early detection method of type-2 diabetes mellitus using multiple classifier system. Information Sciences, 292.