# I. Introduction

Elections are important because they allow the electorate to decide who will make decisions for their country over the next several years, and election outcomes can be forecast with reasonable accuracy. Forecasting elections from small polling samples is a common approach, but it often fails to deliver reasonable accuracy. Data mining is a process that examines large preexisting databases in order to generate new information. Many works apply data mining to prediction problems such as weather forecasting, sports result prediction, and future buying decisions, but very few use data mining to predict voting patterns in elections. In this work, we use data mining approaches to predict voting patterns in the US election. We preprocess the data by handling missing values, selecting the best attributes, and removing duplicates, and we split the dataset into training and test sets. We then apply four algorithms, Trees J48, Naïve Bayes Classifier, Trees Random Forest, and Rules ZeroR Classifier, to predict voting patterns, compare the results of the resulting models, and identify the best of them.

# II. Related Works

Gregg R. Murray and Anthony Scime used data mining approaches to predict individual voting behavior, including abstention, with the intent of segmenting the electorate in useful and meaningful ways [1]. In another study, Gregg R. Murray, Chris Riley, and Anthony Scime used iterative expert data mining to build a likely-voter model for presidential elections in the USA [2]. Jung-Hwan Bae, Ji-Eun Song, and Min Song used Twitter data and text mining techniques to predict trends in the South Korean presidential election [3]. Tariq Mahmood, Tasmiyah Iqbal, Farnaz Amin, Waheeda Lohanna, and Atika Mustafa used Twitter data to predict the winner of the 2013 Pakistan election [4].

# III.
# Data Preprocessing

I. Handling Missing Attributes: We replaced missing values with the mean, median, or mode of the attribute. We chose this approach because it works well when the dataset is small and it prevents the loss of instances.

II. Removing Duplicates: We used WEKA's RemoveDuplicates filter to remove duplicate instances from the dataset.

III. Best Attribute Selection: We used GainRatioAttributeEval, which evaluates the worth of an attribute by measuring its gain ratio with respect to the class, together with the Ranker search method, which ranks attributes by their individual evaluations.

# IV. Experimental Methodology

We used 4 algorithms and 8 models (2 models for each algorithm) to predict voting patterns in the US election. We then analysed and compared the results of those models and identified the most accurate one. The algorithms applied to generate the models are:

i. Trees J48
ii. Naive Bayes Classifier
iii. Trees Random Forest
iv. Rules ZeroR Classifier

From the comparison table in Section V, the best model was identified based on accuracy, precision, recall, sensitivity, and specificity: the higher the accuracy, precision, recall, and sensitivity, and the lower the false-positive rate (1 − specificity), the higher the rank.

# VI. Conclusion

Although there are many techniques and methods for predicting voting patterns, data mining is among the most efficient and effective methods in this field. In our study, we found that among the data mining algorithms evaluated, Trees Random Forest performed best, with 98.17% accuracy. In future work, we will repeat the study on more recent datasets to validate our findings.
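As a rough illustration of the three preprocessing steps described in Section III (which the paper performs with WEKA's RemoveDuplicates filter and GainRatioAttributeEval), the following plain-Python sketch imputes missing values with the mode, drops duplicate instances, and ranks attributes by gain ratio. The toy rows, column layout, and function names here are hypothetical, and WEKA's actual implementation differs in details (for example, its handling of numeric attributes).

```python
from collections import Counter
import math

def impute_mode(rows, col):
    """Replace missing values (None) in a column with the column's mode."""
    observed = [r[col] for r in rows if r[col] is not None]
    mode = Counter(observed).most_common(1)[0][0]
    for r in rows:
        if r[col] is None:
            r[col] = mode
    return rows

def remove_duplicates(rows):
    """Drop exact duplicate instances, keeping the first occurrence."""
    seen, unique = set(), []
    for r in rows:
        key = tuple(r)
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique

def entropy(values):
    """Shannon entropy of a list of nominal values."""
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())

def gain_ratio(rows, col, cls):
    """Information gain of attribute `col` w.r.t. class `cls`, divided by split info."""
    base = entropy([r[cls] for r in rows])
    n = len(rows)
    cond = 0.0
    for v in set(r[col] for r in rows):
        subset = [r[cls] for r in rows if r[col] == v]
        cond += len(subset) / n * entropy(subset)
    split_info = entropy([r[col] for r in rows])
    return (base - cond) / split_info if split_info else 0.0

# Toy example (hypothetical rows): [vote_on_bill_A, vote_on_bill_B, party]
data = [
    ["y", "n", "democrat"], ["y", "y", "democrat"], ["y", None, "democrat"],
    ["n", "y", "republican"], ["n", "y", "republican"], ["n", "n", "republican"],
]
data = impute_mode(data, 1)          # fill the missing vote with the mode
data = remove_duplicates(data)       # drop exact duplicate instances
# Rank the two attribute columns by gain ratio, best first (Ranker-style)
ranking = sorted(range(2), key=lambda c: gain_ratio(data, c, 2), reverse=True)
```

In this toy data, attribute 0 separates the classes perfectly while attribute 1 carries no information, so the ranking places attribute 0 first.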
The top 12 attributes of the whole dataset, ranked by gain ratio, are presented in Figure 1.

*Analyzing Political Opinions and Prediction of Voting Patterns in the US Election with Data Mining Approaches. Global Journal of Computer Science and Technology, Volume XIX, Issue II, Version I, © 2019 Global Journals.*

a) Trees J48

We used Model 1 for evaluation on the training dataset and Model 2 for evaluation on the test dataset.

Evaluation of Model 1 on the training dataset:

| Measure | Value |
| --- | --- |
| Correctly Classified Instances | 421 (96.7816%) |
| Incorrectly Classified Instances | 14 (3.2184%) |
| Kappa statistic | 0.9324 |
| Mean Absolute Error | 0.0582 |
| Root Mean Squared Error | 0.1706 |
| Relative Absolute Error | 12.2709% |
| Root Relative Squared Error | 35.0341% |
| Total Number of Instances | 435 |

| TP Rate | FP Rate | Precision | Recall | F-Measure | MCC | ROC Area | PRC Area | Class |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0.966 | 0.030 | 0.981 | 0.966 | 0.974 | 0.933 | 0.975 | 0.973 | democrat |

Sensitivity and specificity for the training data (Model 1), using Sensitivity = TP / (TP + FN) and Specificity = TN / (TN + FP): Sensitivity = TP Rate = 0.966; since the FP Rate is 0.030, Specificity = 1 − 0.030 = 0.970.

Evaluation of Model 2 on the test dataset:

| Measure | Value |
| --- | --- |
| Correctly Classified Instances | 105 (96.3303%) |
| Incorrectly Classified Instances | 4 (3.6697%) |
| Kappa statistic | 0.921 |
| Mean Absolute Error | 0.0619 |
| Root Mean Squared Error | 0.1894 |
| Relative Absolute Error | 13.2259% |
| Root Relative Squared Error | 39.4312% |
| Total Number of Instances | 109 |

b) Naive Bayes Classifier

Evaluation of Model 3 on the training dataset:

| Measure | Value |
| --- | --- |
| Correctly Classified Instances | 395 (90.8046%) |
| Incorrectly Classified Instances | 40 (9.1954%) |
| Kappa statistic | 0.8094 |
| Mean Absolute Error | 0.0965 |
| Root Mean Squared Error | 0.2921 |
| Relative Absolute Error | 20.34% |
| Root Relative Squared Error | 59.9863% |
| Total Number of Instances | 435 |

| TP Rate | FP Rate | Precision | Recall | F-Measure | MCC | ROC Area | PRC Area | Class |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0.895 | 0.071 | 0.952 | 0.895 | 0.923 | 0.812 | 0.972 | 0.983 | democrat |

Sensitivity and specificity for the training data (Model 3): Sensitivity = TP Rate = 0.895; since the FP Rate is 0.071, Specificity = 1 − 0.071 = 0.929.

Evaluation of Model 4 on the test dataset:

| Measure | Value |
| --- | --- |
| Correctly Classified Instances | 99 (90.8257%) |
| Incorrectly Classified Instances | 10 (9.1743%) |
| Kappa statistic | 0.8069 |
| Mean Absolute Error | 0.0978 |
| Root Mean Squared Error | 0.2934 |
| Relative Absolute Error | 20.9083% |
| Root Relative Squared Error | 61.0861% |
| Total Number of Instances | 109 |

| TP Rate | FP Rate | Precision | Recall | F-Measure | MCC | ROC Area | PRC Area | Class |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0.886 | 0.051 | 0.969 | 0.886 | 0.925 | 0.812 | 0.969 | 0.984 | democrat |

Sensitivity and specificity for Model 4: Sensitivity = TP Rate = 0.886; since the FP Rate is 0.051, Specificity = 1 − 0.051 = 0.949.

c) Trees Random Forest

Evaluation of Model 5 (Trees Random Forest) on the training dataset:

| Measure | Value |
| --- | --- |
| Correctly Classified Instances | 427 (98.1609%) |
| Incorrectly Classified Instances | 8 (1.8391%) |
| Kappa statistic | 0.9613 |
| Mean Absolute Error | 0.0376 |
| Root Mean Squared Error | 0.1222 |
| Relative Absolute Error | 7.9365% |
| Root Relative Squared Error | 25.0915% |
| Total Number of Instances | 435 |

| TP Rate | FP Rate | Precision | Recall | F-Measure | MCC | ROC Area | PRC Area | Class |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0.981 | 0.018 | 0.989 | 0.981 | 0.985 | 0.961 | 0.998 | 0.999 | democrat |

Sensitivity and specificity for the training data (Model 5): Sensitivity = TP Rate = 0.981; since the FP Rate is 0.018, Specificity = 1 − 0.018 = 0.982.

Evaluation of Model 6 on the test dataset:

| Measure | Value |
| --- | --- |
| Correctly Classified Instances | 106 (97.2477%) |
| Incorrectly Classified Instances | 3 (2.7523%) |
| Kappa statistic | 0.9404 |
| Mean Absolute Error | 0.0432 |
| Root Mean Squared Error | 0.1508 |
| Relative Absolute Error | 9.2437% |
| Root Relative Squared Error | 31.408% |
| Total Number of Instances | 109 |

| TP Rate | FP Rate | Precision | Recall | F-Measure | MCC | ROC Area | PRC Area | Class |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0.971 | 0.026 | 0.986 | 0.971 | 0.978 | 0.941 | 0.996 | 0.997 | democrat |

Sensitivity and specificity for Model 6: Sensitivity = TP Rate = 0.971; since the FP Rate is 0.026, Specificity = 1 − 0.026 = 0.974.

d) Rules ZeroR Classifier

Evaluation of Model 7 (Rules ZeroR Classifier) on the training dataset:

| Measure | Value |
| --- | --- |
| Correctly Classified Instances | 267 (61.3793%) |
| Incorrectly Classified Instances | 168 (38.6207%) |
| Kappa statistic | 0 |
| Mean Absolute Error | 0.4742 |
| Root Mean Squared Error | 0.4869 |
| Relative Absolute Error | 100% |
| Root Relative Squared Error | 100% |
| Total Number of Instances | 435 |

| TP Rate | FP Rate | Precision | Recall | F-Measure | MCC | ROC Area | PRC Area | Class |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1.000 | 1.000 | 0.614 | 1.000 | 0.761 | ? | 0.500 | 0.614 | democrat |

Sensitivity and specificity for the training data (Model 7): Sensitivity = TP Rate = 1.000; since the FP Rate is also 1.000, Specificity = 1 − 1.000 = 0.000, as expected for a classifier that always predicts the majority class.

Evaluation of Model 8 on the test dataset:

| Measure | Value |
| --- | --- |
| Correctly Classified Instances | 70 (64.2202%) |
| Incorrectly Classified Instances | 39 (35.7798%) |
| Kappa statistic | 0 |
| Mean Absolute Error | 0.4678 |
| Root Mean Squared Error | 0.4802 |
| Relative Absolute Error | 100% |
| Root Relative Squared Error | 100% |
| Total Number of Instances | 109 |

| TP Rate | FP Rate | Precision | Recall | F-Measure | MCC | ROC Area | PRC Area | Class |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1.000 | 1.000 | 0.642 | 1.000 | 0.782 | ? | 0.500 | 0.642 | democrat |

Table 17: Model 8 precision, recall, and F-measure for the democrat class.

Sensitivity and specificity for Model 8: Sensitivity = TP Rate = 1.000; Specificity = 1 − 1.000 = 0.000.

# V. Evaluation of the Best, Second-Best, and Third-Best Models

| Model | Accuracy | Precision | Recall | Sensitivity | FP Rate | Rank |
| --- | --- | --- | --- | --- | --- | --- |
| Model 1 | 96.7816% | 0.981 | 0.966 | 0.966 | 0.030 | 2nd best |
| Model 2 | 96.3303% | 0.985 | 0.957 | 0.957 | 0.026 | 3rd best |
| Model 3 | 90.8046% | 0.952 | 0.895 | 0.895 | 0.071 | |
| Model 4 | 90.8257% | 0.969 | 0.886 | 0.886 | 0.051 | |
| Model 5 | 98.1609% | 0.989 | 0.985 | 0.981 | 0.018 | Best |
| Model 6 | 97.2477% | 0.986 | 0.978 | 0.971 | 0.026 | |
| Model 7 | 61.3793% | 0.614 | 1.00 | 1.00 | 1.00 | |
| Model 8 | 64.2202% | 0.642 | 1.00 | 1.00 | 1.00 | |

Table 18: Comparison of the eight models. The last column gives the false-positive rate; specificity = 1 − FP Rate.

# References

1. Gregg R. Murray and Anthony Scime, "Microtargeting and Electorate Segmentation: Data Mining the American National Election Studies," Journal of Political Marketing, 9(3), 2010.
2. Gregg R. Murray, Chris Riley, and Anthony Scime, "Pre-Election Polling: Identifying Likely Voters Using Iterative Expert Data Mining," Public Opinion Quarterly, 73(1), 2009.
3. Jung-Hwan Bae, Ji-Eun Song, and Min Song, "Analysis of Twitter for 2012 South Korea Presidential Election by Text Mining Techniques," Journal of Intelligence and Information Systems, 19(3), 2013.
4. Tariq Mahmood, Tasmiyah Iqbal, Farnaz Amin, Waheeda Lohanna, and Atika Mustafa, "Mining Twitter ..."
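The per-model sensitivity and specificity calculations above follow directly from the confusion-matrix definitions Sensitivity = TP / (TP + FN) and Specificity = TN / (TN + FP). The short plain-Python sketch below recomputes these together with accuracy, precision, FP rate, and Cohen's kappa; the function name is our own, and the confusion-matrix counts are hypothetical, not taken from the paper's dataset.

```python
def binary_metrics(tp, fn, fp, tn):
    """Standard binary-classification metrics from confusion-matrix counts.
    Note that specificity = TN / (TN + FP) = 1 - FP rate."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)           # = sensitivity = TP rate
    fp_rate = fp / (fp + tn)
    specificity = tn / (tn + fp)      # = 1 - fp_rate
    # Cohen's kappa: agreement beyond what class priors alone would produce
    p_pos = ((tp + fp) / total) * ((tp + fn) / total)
    p_neg = ((fn + tn) / total) * ((fp + tn) / total)
    p_e = p_pos + p_neg               # expected chance agreement
    kappa = (accuracy - p_e) / (1 - p_e)
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "sensitivity": recall, "specificity": specificity,
            "fp_rate": fp_rate, "kappa": kappa}

# Illustrative counts (hypothetical, chosen only to sum to 435 instances):
m = binary_metrics(tp=260, fn=7, fp=7, tn=161)
```

A classifier that always predicts the majority class has FN = TN = 0 for that class, so sensitivity is 1.0 and specificity is 0.0, which is exactly the behaviour of the ZeroR models (Models 7 and 8) above.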