# Introduction

n recent years, internet has been created several platforms for making human life become more secure. Among these; e-mail is a substantial platform for user communication. Email is nothing; simply it's called an electronic messaging framework which transmits the message from one user to another [1]. Nowadays, e-mail has turned into a typical medium [2] because of its several branches like Yahoo mail [3], Gmail [4], Outlook [5] etc, which are completely free for all web user by following some administration [6,7]. At present, Email called a secure worldwide communication medium for its several functions. But sometimes email becomes more hazardous for some "Spam Email".

Generally, Spam email called as junk email or unsolicited message which sent by spammer through Email. The process is, collected the address on the web and sends the message through domain's username. Actually, it has been produced for financial profits using I the assortment of procedures [8] and instruments that incorporate spoofing, bonnets, open intermediaries, mail transfers, bulk mail instruments called mailers, and so forth. Spam filtering is a challenging undertaking for an assortment of reasons. For spam email, users are facing several problems like abuse of traffic, limit the storage space, computational power, become a barrier for finding the additional email, waste users time and also threat for user security [9,10]. So, becoming email more secure and effective, appropriate Email filtering is essential.

Several types of researches have been performed on email filtering, some acquired good accuracy and some are still going on. According to researcher's overview, Email filtering is a process to sort email according to some criteria. As there are various methods exist for email filtering, among them, inbound and outbound filtering is well known. Inbound filtering is the process to read a message from internet address and outbound filtering is to read the message from the local user. Moreover, the most effective and useful email filtering is Spam filtering which performs through antispam technique. As spammers are proactive natures and using dynamic spam structures which have been changing continuously for preventing the anti-spam procedures and thus making spam filtering is a challenging task [9,10].

Spam filtering is a process to detect unsolicited massage and prevent from entering into user's inbox. Now days, various systems have been existed to generate anti-spam technique for preventing unsolicited bulk email. Most of the anti-spam methods have some inconsistency between false negatives (missed spam) and false positives (rejecting good emails) which act as a barrier for most of the system to make successful antispam system. Therefore, an intelligent and effective spam-filtering system is the prime demand for web users.

Among various approach, Fiaidhi et al. [11] and Arora et al. [12] proposed method evaluate that, 70% today's business email's are spam [13]. Spam filtering has two major section; "Knowledge engineering" and "Machine learning". Knowledge engineering is an arrangement of guidelines to determine the spam a) Standard Spam Filtering Method Email Spam filtering process works through a set of protocols to determine either the message is spam or not. At present, a large number of spam filtering process have existed. Among them, Standard spam filtering process follows some rules and acts as a classifier with sets of protocols. Figure .1 shows that, a standard spam filtering process performed the analysis by following some steps [14]. First one is content filters which determine the spam message by applying several Machines learning techniques [8,10,[15][16][17][18]. Second, header filters act by extracting information from email header. Then, backlist filters determine the spam message and stop all emails which come from backlist file. Afterward, "Rules-based filters" recognize sender through subject line by using user defined criteria [19]. Next, "Permission filters" send the message by getting recipients pre-approvement. Finally, "Challengeresponse filter" performed by applying an algorithm for getting the permission from the sender to send the mail.


# Global Journal of Computer Science and Technology

Volume XVIII Issue II Version I 


# II. Several Email Spam Filtering Methods

At present, number of spam email has increased for several criteria such as an advertisement, multi-level marketing, chain letter, political email, stock market advice and so forth. For restricting spam email, several methods or spam filtering system has been constructed by using various concept and algorithms. This section concluded by describing few of spam filtering methods to understand the process of spam filtering and its effectiveness. Enterprise level spam filtering is a process where provided frameworks are installing on mail server which interacts with the MTA for classifying the received messages or mail in order to categorize the spam message on the network. By this system, a user on that network can filter the spam by installing appropriate system [21,22] more efficiently. By far most; current spam filtering frameworks use principle based scoring procedures. An arrangement of guidelines is connected to a message and calculate a score based principles that are valid for the message. The message will consider as spam message when it exceeds the threshold value. As spammers are using various strategies, so all functions are redesigned routinely by applying a list-based technique to automatically block the messages. Figure 2 represents the method of client side and enterprise level spam filtering [7].  At the first step, extracted all email (spam email and legitimate email) from individual users email through collection model. Then, the initial transformation starts with the pre-processing steps through client interface, highlight extraction and choice, email data classification, analyzing the process and by using vector expression classifies the data into two sets.

Finally, machine learning technique is applied on training sets and testing sets to determine email whether it is spam or legitimate. The final decision makes through two steps; through self observation and classifier's result to make decision whether the email is spam or legitimate.


# III. Overview of Several Existing Email Spam Filtering Systems for Machine Learning Technique

Mohammed et al. [2] [2013] proposed an approach for Classifying Unsolicited Bulk Email (UBE) using Python Machine Learning Techniques with the help of spam filtering which performs the work by creating a spam-ham dictionary from the given training data and applying data mining algorithm to filter the training and testing data.

After applying various classifier on1431 dataset, the approach predicts that, Naïve Bays and SVM classifiers are the prominent classifier for spam filtering or classification.

Subramaniam et al. [23] [2012] implemented Naïve Bayesian Anti-spam Filtering Technique on Malay Language to investigate the utilization of Naïve Bayesian procedure to combat spam issue. An experiment conducted through Naïve Bayesian method for filtering Malay language spam and the result depicts that, propose approach has gained 69% accuracy. They realized that by reducing false positive and expanding training corpus the result would much better for classifying Malay language spam. Banday et al. [25] [2008] discuss the procedures of statistical spam filters design by incorporating Naïve Bayes, KNN, SVM, and Bayes Additive Regression Tree. Here evaluates these procedures in terms of accuracy, recall, precision, etc. Though all machine learning classifiers are effective but according to this approach, CBART and NB classifiers has better capability to spam filtering. This approach estimates that during spam filtering calculations of false positive are more costly than false negative.

Awad et al. [1] [2011] proposed an ML-based approach on for Spam E-mail Classification. In this article present the most prominent machine learning strategies and its effectiveness regarding spam email classification. Here introduced Portrayals algorithms and the performance of Spam Assassin corpus. The result shows that, Naïve bays and rough sets methods are the promising algorithms for email classification. They perform their future research to improve the Nave Bays and Artificial immune system by hybrid system or by resolution the feature reliance issue .

Chhabra et al. [26] [2010] developed Spam Filtering using Support Vector Machine by considering Nonlinear SVM classifier with different kernel functions over Enron Dataset. Here considered six datasets and perform the analysis of datasets having diverse spam: ham ratio and makes satisfactory Recall and Precision Value.

Tretyakov et al. [27] [2004] discussed Machine Learning Techniques through Spam Filtering. In this article compared the precision between before eliminating false positive and after eliminating false positive. They represent the result that the result becomes more reliable considering both precision results (before eliminating and after eliminating false positive) either taking one.

Shahi et al. [28] [2013] developed Mobile SMS Spam Filtering for Nepali Text Using Naïve Bayesian and Support Vector Machine. The fundamental concern of this study was to look at the effectiveness of Naïve Bayesian and SVM Spam filters. The correlation of productivity between these Spam filters was done based Suganya et al. [30] [2014] worked on short message and misspelling of data on online Social Networks (OSNs) user post. They used machine learning technique with content-based features for short message and Filtered Wall (FW) [31] to evaluate a system for filtering spam massage. They categorized the classification process into two levels; first-level classifier performs on Neutral and Non-neutral through hard binary categorization and second level classifier performs through RBFN model [32].

Rathi et al. [33] [2013] proposed an approach using Data mining technique for finding the best classifier for email classification. They analyzed various data mining technique for measuring the performance of several classifiers through "with feature selection algorithm" and "without feature selection algorithm". After selecting the Best feature selection algorithm, they considered the selected algorithm for their feature selection purpose. They experiment their data by using several algorithms such as Naïve Bayes, Bayes Net, Support vector machine, and Function tree, J48, Random Forest and Random Tree. The whole dataset consists of 58 attributes and 4601 instances. Considering Random Tree algorithm highest accuracy was 99.72% and the lowest accuracy was 78.94% for Naïve Bayes algorithm.

Mohammed et al. [11] [2013] presents an approach for filtering spam email using machine learning algorithms. At first, they filter Spam and Ham word from the training datasets by applying tokenization method based on these token create the testing and training table using various data mining algorithm. Then find the frequency of spam and ham tokens for measuring the probability which is suggested by Paul Graham [34]. For ham token, the probability value was 0 and for spam token probability value was 1. They used Nielson Email-1431 [35] dataset and emphasized that the Naïve Bayes and Support Vector Machine are the most effective classifier.

Singh et al. [36] [2018] discussed the solution and classification process of spam filtering and presented a combining classification technique to get better spam filtering result. With the help of Data mining, they collected all the information of previous failures, success and current problems of spam filtering. In this method, researchers used binary value where 1 for spam email and 0 for not spam emails. But its success rate was very poor. So they apply NB, KNN, SVM, Artificial Neural Network classification method and find their accuracy. Based on these two techniques (machine learning and knowledge engineering) effectiveness, they adopt a classification technique for spam filtering. Moreover, here first collect data from user training set, compared and find the spam email and then use a global training set to optimize the classification technique. Using this technique increases the precision rate at least 2%.

Abdulhamid et al. [37] [2018] introduced a performance analysis based approach by using some classification techniques such as Bayesian Logistic Regression, Hidden Naïve Bayes, Logit Boost, Rotation Forest, NNge, Logistic Model Tree, REP Tree, Naïve Bayes, Radial Basis Function (RBF) Network, Voted Perceptron, Lazy Bayesian Rule, Multilayer Perceptron, Random Tree and J48. The competence of these techniques classified through Accuracy, Precision, Recall, F-Measure, Root Mean Squared Error, Receiver Operator Characteristics Area and Root Relative Squared Error using Spam base dataset and WEKA data mining tool. For conducting the performance and comparison, datasets are considered from UCI Machine Learning Repository. Considering Rotation Forest algorithm acquired the highest accuracy was 0.942 and the REP Tree algorithm showed the lowest accuracy was 0.891. They applied the F-measure method for finding precision and recall. The highest F-measure considered from Rotation forest algorithm and lowest Fmeasure considered from Naïve Bayes algorithm. For finding the probability use ROC curves on randomly selected positive and negative instance and for Rotation forest algorithm the ROC curves carried the highest score was 0.98. In contrast, Random Tree having the lowest score which was 0.905. For finding the statistics result, they use kappa Statistics and the result was much better for Rotation Forest algorithm which approximately 0.879. This paper showed that, Rotation Forest classifier gained the best result with 0.942 accuracies, then J48 with 0.923, Naïve Bayes with 0.885 and Multilayer Perception with 0.932.

Sah et al. [38] [2017] proposed a method for detecting of malicious spam through feature selection and improve the training time and accuracy of malicious spam detection system. They also showed the comparison of difference classifier as Naïve Bayes (NB) and Support Vector Machine (SVM) based on accuracy and computation time. to the approach, Naïve Bayes selected as good classifiers among others.

Rusland et al. [40] [2017] perform the analysis using Naïve Bayes algorithm for email spam filtering on two datasets which are evaluated based on the accuracy, recall, precision and F-measure. Naïve Bayes algorithm is a probability-based classifier and the probability is counting the frequency and combination of values in a dataset. This research performed through three phases such as pre-processing, Feature Selection, and implementation through Naïve Bayes Classifier. First they remove all conjunction words, articles from the email body in pre-processing section. Made two datasets through WEKA tool; one is a Spam Data and another is the SpamBase dataset. The average accuracy was 8.59% by considering two datasets where Spam data get 91.13% and the SpamBase data get 82.54% accuracy. The average precision for SpamBase was 88% and for Spam data was 83%. They proposed that, Naïve Bayes classifier performs better on SpamBase data compared with Spam Data.

Yuksel et al. [41] [2017] use Support Vector Machine and Decision tree for spam filtering. The Decision tree used in data mining and the support vector machines as a supervised learning model which can analyze the data for spam classification. First data was divided into two sections; one is training and other is test data, then the algorithm was trained and evaluated through Microsoft Azure platform which provides tools for machine learning and compared results with decision tree and support vector machine algorithm. The result of SVM method was 97.6% and for Decision tree the result was 82.6%. The result estimate that, SVM classifier performed better than DT.

Choudhary et al. [42] [2017] presented a novel approach using machine learning classification algorithm for finding and classifying SMS spam by using Short Message Service (SMS). The first step in this approach is feature selection and for that, they work on presence of mathematical symbols: UGLs, Dots, special symbols, emotions, Lowercased words and Uppercased words, mobile number, keyword specific and the message length in the SMS. After that they created a system design and collected a dataset which contained 2608 emails out of 2408 collected SNS Spam Corpus. The SMS Spam Corpus v.0.1 consists two sets of messages as SMS Spam Corpus v.0.1 Small and SMS Spam Corpus v.0.1 Big. Using "WEKA tools" for five machine learning approaches; such as Naive Bayes, Logistic Regression, J48, Decision Tree and Random Forest. Evaluating result uses with True Positive Rate (TP) and True Negative Rate (TN). False Positive Rate (FP), False Negative Rate (FN), Precision, Recall, Fmeasure and Receiver Operating Characteristics (ROC) area achieved 96.5% true positive rate and 1.02% false positive rate with Random Forest machine learning algorithm and it performs better algorithm with high rate accuracy.

DeBarr et al. [43] [2009] use Random Forest algorithms for classification of spam email then refining the classification model using active learning. They take data from RFC 822(Internet) email message and divided each email into two sections and converted each message to term frequency and inverse document frequency (TF/IDF) features. Here select an initial set of email message using clustering technique to label as training examples and for clustering used Partitioning Around Medoids (PAM) algorithm. After considering the cluster prototype messages for training they experiment with some algorithm Random Forest, Naive Bayes, SVM and kNN. Here Random Forest algorithm performs the best classifier with 95.2% accuracy.


# IV. Summary of Existing E-mail Spam Classification Approaches

Since last few decades, researchers are trying to make email as a secure medium. Spam filtering is one of the core features to secure email platform. Regarding this several types of research have been progressed reportedly but still there are some untapped potentials. Over time, still now e-mail spam classification is one of the major areas of research to bridge the gaps. Therefore, a large number of researches already have been performed on email spam classification using several techniques to make email more efficient to the users. That's why, this paper tried to arrange the summarized version of various existing Machine Learning approaches. In addition, in order to evaluates the most of the approaches like Random Forest, Naive Bayes [11,23,43], SVM [8,10,18], kNN [27,36], and Random Forest [15,16] used reliable and well known dataset for benchmarking performance such as SpamData [16], The Spam Assassin [44], The Spambase, Ecml-pkdd 2006 challenge dataset [45], PU corpora dataset [15], Enron dataset [46],Trec 2005 dataset [47]. Some of these dataset are in a prepared structure e.g. ECML and data accessible in Spambase UCI archive [20]. Among them, some of the classifiers also used novel methods applied in the feature selection for improving classification such as [1,11].

Verma et al. [39] [2017] proposed a method for spam detection using Support Vector Machine algorithm and feature extraction. This methodology works through several steps such as Email collections, preprocessing, feature extraction, SVM training, test classifier, top word predictors, test email and result. First they take a dataset from Apache Public corpus. In preprocessing section, they remove all special symbol, URL and HTML tags and also unnecessary alphabet. Then they mapped all word from the dictionary using Vocab file. SVM classifier applied on the training dataset. The Accuracy of the system was 98%. 


# Discussion

From the observation, it seems that, the majority of email spam filtering process performed through Machine learning technique using Naïve Bayes and SVM algorithm. Most of the approaches adopt different dataset such as "ECML" data and Spam base UCI archive [20]. Among several papers, Mohammad et al. introduce a classifier for feature selection which regarded as the most novel classifier for feature selection [1,11]. Rathi et al proposed an approach considering "Naïve Bayes", "Bayes Net", "SVM" and "Random forest" algorithm and obtain the higher accuracy than others which approximately crossed 99.72% accuracy [32]. Another one is, Awad et al. which proposed an approach considering "Naïve Bayes", "SVM", "K-Nearest Neighbor", "Artificial neural Networks", "Rough sets" algorithm and obtain 99.46% accuracy which seems good on their effectiveness [1]. After the analysis it should predict that, "Naïve Bayes" and "SVM" algorithm is the most effective algorithm in machine learning technique and have the ability to better classification of email spam.


# VI.


# Conclusion

This survey paper elaborates different Existing Spam Filtering system through Machine learning techniques by exploring several methods, concluding the overview of several Spam Filtering techniques and summarizing the accuracy of different proposed approach regarding several parameters. Moreover, all the existing methods are effective for email spam filtering. Some have effective outcome and some are trying to implement another process for increasing their accuracy rate. Though all are effective but still now spam filtering system have some lacking which are the major concern for researchers and they are trying to generate next generation spam filtering process which have the ability to consider large number of multimedia data and filter the spam email more prominently.
1![Figure 1 : A standard process of Email spam filtering system b) Client Side and Enterprise Level Spam Filtering Methods A client can send or receive an email by just one clicking through an ISP. Client level spam filtering provides some frameworks for the individual client to secure mail transmission. A client can easily filter spam through these several existing frameworks by installing on PC. This framework can interact with MUA (Mail user agent) and filtering the client inbox by composing, accepting and managing the messages [2].Enterprise level spam filtering is a process where provided frameworks are installing on mail server which interacts with the MTA for classifying the received messages or mail in order to categorize the spam message on the network. By this system, a user on that network can filter the spam by installing appropriate system[21,22] more efficiently. By far most; current spam filtering frameworks use principle based scoring procedures. An arrangement of guidelines is connected to a message and calculate a score based principles that are valid for the message. The message will](image-2.png "Figure 1 :")
2![Figure 2: Client Side and Enterprise level Email spam filtering system](image-3.png "Figure 2 :A")
3![Figure 3: Case Base Spam Filtering SystemHere, describes a sample of case base spam filtering architecture by applying Machine learning techniques [Fig.3] in detail. The full process perform through several steps which followed by the figure3.At the first step, extracted all email (spam email and legitimate email) from individual users email through collection model. Then, the initial transformation starts with the pre-processing steps through client interface, highlight extraction and choice, email data classification, analyzing the process and by using vector expression classifies the data into two sets.Finally, machine learning technique is applied on training sets and testing sets to determine email whether it is spam or legitimate. The final decision makes through two steps; through self observation and classifier's result to make decision whether the email is spam or legitimate.](image-4.png "Figure 3 :")

( ) C© 2018 Global Journals 1
IA Survey of Existing E-Mail Spam Filtering Methods Considering Machine Learning TechniquesV.Sr. No.AuthorAlgorithmsCorpus or DatasetsAccuracy/ Performance1Mohammed et al.Naive KNN,Decision Tree, Rules Bayes, SVM,Email-143185.96% Accuracy Achieved2Subramaniam et al.Naive BayesianCollection emails from Google's of spam Gmail Account96.00% Accuracy Achieved3Sharma et al.Various Machine Learning Algorithms AdaptionsSPAMBASE94.28% Accuracy AchievedYear 20184Banday et al.Naive Bayes, K-Nearest Neighbor, SVM, classification Bayes Additive Regression TreeReal life data set96.69% Accuracy Achieved26 Volume XVIII Issue II Version I C ( )5 6 7 8 9 10Awad et al. Chhabra et al. Tretyakov Shahi et al. Kaul et al Suganya et al.Naive Bayes, SVM, k-Nearest Neighbor, Artificial Neural Networks, Rough Sets Nonlinear SVM classifier. Bayesian classification, k-NN, ANNs, SVMs Naïve Bayes, SVM SVM Rule Baseed Method Naive Bayes, BayesSpam Assassin Enron dataset PU1 corpus Nepali SMS Sample emails Online Social Networks (OSNs) user post99.46% Accuracy For Dataset 3, spam: real, the ratio is 1:3, for satisfactory Recall and Precision Values Achieved 94.4% Accuracy Achieved 92.74% Accuracy Achieved 90% ~ 95%Accuracy Achieved Excellence Accuracy for Given DatasetsGlobal Journal of Computer Science and Technology11 12 13 14 15 16 17 18Rathi et al. Mohammed et al. Singh et al. Abdulhamid et al. Sah et al. Verma et al. Rusland et al. ksel et al.Net,SVM, and Random Forest Word Filterization by Tokenization, Appling Naive Bayes, k-Nearest Neighbor, SVM, Artificial Neural Network. Various Machine Learning Algorithms Naïve Bayes, SVM Customised SVM Modified Naive Bayes withselective features Microsoft Azure platform defined decision tree andCustom Collection Nielson Email-1431 Custom Collection UCI Machine Learning Repository & Custom Collection Apache Public Corpus SpamBase, SpamData Custom Collection99.72% Accuracy Rate Reported Satisfactory Accuracy for Proposed Method Reported Improvement of precision rate at least 2% 94.2% Accuracy Achieved Reported good Accuracy overall 98% Accuracy Rate Reported SpamBase get 88%Precision Rate and SpamData get 83% SVM Accuracy 97.6% Decision TreeSVMAccuracy 82.6%19Choudhary et al.Feature Engineered Naive BayesThe SMS Spam Corpus v.0.196.5% True Positive Rate Accuracy20DeBarr et al.Random Forest algorithmCustom Collection95.2% Accuracy© 2018 Global Journals 1
		
		
* 
	
		Machine Learning methods for E-mail Classification
		
			WAAwad
		
		
			SMElseuofi
		
	
		International Journal of Computer Applications
		
			1
			16
			2011
		
	
* 
	
		A survey of machine learning techniques for Spam filtering
		
			OSaad
		
		
			ADarwish
		
		
			RFaraj
		
	
		International Journal of Computer Science and Network Security (IJCSNS)
		
			12
			2
			66
			2012
		
	
* 
	
		A first look at inter-data center traffic characteristics via yahoo!datasets
		
			YChen
		
		
			SJain
		
		
			VKAdhikari
		
		
			ZLZhang
		
		
			KXu
		
	
		INFOCOM, 2011 Proceedings IEEE
				
			IEEE
			2011. April
			
		
* 
	
		Like technology from an advanced alien culture: Google apps for education at ASU
		
			KBarlow
		
		
			JLane
		
	
		Proceedings of the 35th annual ACM SIGUCCS fall conference
				the 35th annual ACM SIGUCCS fall conference
		
			ACM
			2007
			
		
	October)


* 
	
		Revisiting Whittaker & Sidner's email overload ten years later
		
			DFisher
		
		
			AJBrush
		
		
			EGleave
		
		
			MASmith
		
	
		Proceedings of the 2006 20th anniversary conference on Computer supported cooperative work
				the 2006 20th anniversary conference on Computer supported cooperative work
		
			ACM
			2006. November
			
		
* 
	
		A survey of learningbased techniques of email spam filtering
		
			EBlanzieri
		
		
			ABryl
		
	
		Artificial Intelligence Review
		
			29
			1
			
			2008
		
	
* 
	
		A case-based approach to spam filtering that can track concept drift
		
			PCunningham
		
		
			NNowlan
		
		
			SJDelany
		
		
			MHaahr
		
	
		The ICCBR
				
			2003. May
			3
			
		
* 
	
		Support vector machines for spam categorization
		
			HDrucker
		
		
			DWu
		
		
			VNVapnik
		
	
		IEEE Transactions on Neural networks
		
			10
			5
			
			1999
		
	
* 
	
		A Bayesian approach to filtering junk e-mail
		
			MSahami
		
		
			SDumais
		
		
			DHeckerman
		
		
			EHorvitz
		
	
		Learning for Text Categorization: Papers from the 1998 workshop
				
			1998. July
			62
			
		
* 
	
		SVM-Based Spam Filter with Active and Online Learning
		
			QWang
		
		
			YGuan
		
		
			XWang
		
	
		TREC
				
			2006
		
	
* 
	
		
			SMohammed
		
		
			OMohammed
		
		
			JFiaidhi
		
		
			SJFong
		
		
			THKim
		
		Classifying Unsolicited Bulk Email (UBE) using Python Machine Learning Techniques
				
			2013
		
	
* 
	
		Text and image based spam email classification using KNN, Naïve Bayes and Reverse DBSCAN algorithm
		
			AHarisinghaney
		
		
			ADixit
		
		
			SGupta
		
		
			AArora
		
	
		Optimization, Reliabilty, and Information Technology (ICROIT), 2014 International Conference on
				
			IEEE
			2014. February
			
		
* 
	
		Supervised learning approach for spam classification analysis using data mining tools. organization
		
			MScholar
		
		
			2010
			2
			
		
* 
	
		A study on email spam filtering techniques
		
			VChristina
		
		
			SKarpagavalli
		
		
			GSuganya
		
	
		International Journal of Computer Applications
		
			12
			1
			
			2010
		
	
* 
	
		
			VMetsis
		
		
			IAndroutsopoulos
		
		
			GPaliouras
		
		Spam filtering with naive bayes-which naive bayes? In CEAS
				
			2006. July
			17
			
		
* 
	
		An evaluation of naive bayesian anti-spam filtering
		
			IAndroutsopoulos
		
		
			JKoutsias
		
		
			KVChandrinos
		
		
			GPaliouras
		
		
			CDSpyropoulos
		
		cs/0006013
		
			2000
		
	
	arXiv preprint


* 
	
		Naive Bayes Spam Filtering Using Word-Position-Based Attributes
		
			JHovold
		
	
		CEAS
				
			2005. July
			
		
* 
	
		Evaluating costsensitive unsolicited bulk email categorization
		
			JM GHidalgo
		
	
		References In Proceedings of the 2002 ACM symposium on Applied computing
				
			ACM
			2002. March
			
		
* 
	
		Learning to filter spam e-mail: A comparison of a naive bayesian and a memory-based approach
		
			IAndroutsopoulos
		
		
			GPaliouras
		
		
			VKarkaletsis
		
		
			GSakkis
		
		
			CDSpyropoulos
		
		
			PStamatopoulos
		
		cs/0009009
		
			2000
		
	
	arXiv preprint


* 
	
		In vivo spam filtering: a challenge problem for KDD
		
			TFawcett
		
	
		ACM SIGKDD Explorations Newsletter
		
			5
			2
			
			2003
		
	
* 
	
		Using visual features for anti-spam filtering
		
			CTWu
		
		
			KTCheng
		
		
			QZhu
		
		
			YLWu
		
	
		ICIP 2005. IEEE International Conference on
				
			IEEE
			2005. September. 2005
			3
			509
		
	
	Image Processing


* 
	
		Spam filtering for short messages
		
			GVCormack
		
		
			JMHidalgo
		
		
			EPSánz
		
	
		Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
				the sixteenth ACM conference on Conference on information and knowledge management
		
			ACM
			2007. November
			
		
* 
	
		Overview of textual anti-spam filtering techniques
		
			TSubramaniam
		
		
			HAJalab
		
		
			AYTaqa
		
	
		International Journal of Physical Sciences
		
			5
			12
			
			2010
		
	
* 
	
		Adaptive approach for spam detection
		
			SSharma
		
		
			AArora
		
	
		International Journal of Computer Science Issues
		
			10
			4
			
			2013
		
	
* 
	
		Effectiveness and limitations of statistical spam filters
		
			MTBanday
		
		
			TRJan
		
		arXiv:0910.2540
		
			2009
		
	
	arXiv preprint


* 
	
		Spam filtering using support vector machine
		
			PChhabra
		
		
			RWadhvani
		
		
			SShukla
		
	
		Special Issue IJCCT
		
			1
			2
			3
			2010
		
	
* 
	
		Machine learning techniques in spam filtering
		
			KTretyakov
		
	
		Data Mining Problem-oriented Seminar, MTAT
				
			2004. May
			3
			
		
* 
	
		Mobile SMS spam filtering for Nepali text using naïve bayesian and support vector machine
		
			TBShahi
		
		
			AYadav
		
	
		International Journal of Intelligence Science
		
			4
			01
			24
			2013
		
	
* 
	
		Support vector machines for spam categorization
		
			HDrucker
		
		
			DWu
		
		
			VNVapnik
		
	
		IEEE Transactions on Neural networks
		
			10
			5
			
			1999
		
	
* 
	
		Detection of Spam in Online Social Networks (OSN) Through Rule-based System
		
			TSuganya
		
		
			KSridevi
		
		
			MArulprakash
		
		
* 
	
		
			URahane
		
		
			ALande
		
		
			OBavikar
		
		
			SChavan
		
		
			KNShedge
		
	
		International Journal of Engineering Sciences & Research Technology Advanced Filtering System to Protect OSN user Wall From Unwanted Messages
		
	
* 
	
		Fast learning in networks of locally-tuned processing units
		
			JMoody
		
		
			CJDarken
		
	
		Neural computation
		
			1
			2
			
			1989
		
	
* 
	
		Spam mail detection through data mining-A comparative performance analysis
		
			MRathi
		
		
			VPareek
		
	
		International Journal of Modern Education and Computer Science
		
			5
			12
			31
			2013
		
	
* 
	
		
			PGraham
		
		
		A plan for spam
				
			2002
		
	
* 
	
		Categorization and keyword identification of unlabeled documents
		
			NKang
		
		
			CDomeniconi
		
		
			DBarbará
		
	
		Data Mining, Fifth IEEE International Conference on
				
			IEEE
			2005. November
			4
		
	
* 
	
		Spam Mail Detection Using Classification Techniques and Global Training Set
		
			VKSingh
		
		
			SBhardwaj
		
	
		Intelligent Computing and Information and Communication
				
			2018
			
		
* 
	
		
			MSShafi'i Muhammad Abdulhamid
		
		
			OOsho
		
		
			IIsmaila
		
		
			JKAlhassan
		
		
			2018
		
	
* 
	
		Comparative Analysis of Classification Algorithms for Email Spam Detection
		
	
* 
	
		An approach for Malicious Spam Detection in Email with comparison of different classifiers
		
			UKSah
		
		
			NParmar
		
		
			2017
		
	
* 
	
		E-Mail Spam Detection and Classification Using SVM and Feature Extraction
		
			TVerma
		
		
			2017
		
	
* 
	
		Analysis of Naïve Bayes Algorithm for Email Spam Filtering across Multiple Datasets
		
			NFRusland
		
		
			NWahid
		
		
			SKasim
		
		
			HHafit
		
	
		IOP Conference Series: Materials Science and Engineering
				
			IOP Publishing
			2017. August
			226
			12091
		
	
* 
	
		Design of a Machine Learning Based Predictive Analytics System for Spam Problem
		
			ASYüksel
		
		
			SFCankaya
		
		
			?SÜncü
		
	
		Acta Physica Polonica, A
		
			3
			132
			2017
		
	
* 
	
		Towards Filtering of SMS Spam Messages Using Machine Learning Based Technique
		
			NChoudhary
		
		
			AKJain
		
	
		Advanced Informatics for Computing Research
				Singapore
		
			Springer
			2017
			
		
* 
	
		Spam detection using clustering, random forests, and active learning
		
			DDebarr
		
		
			HWechsler
		
	
		Sixth Conference on Email and Anti-Spam
				Mountain View, California
		
			2009. July
			
		
* 
	
		A review of machine learning approaches to spam filtering
		
			TSGuzella
		
		
			WMCaminhas
		
	
		Expert Systems with Applications
		
			36
			7
			
			2009
		
	
* 
	
		Using tri-training and support vector machines for addressing the ECML/PKDD 2006 discovery challenge
		
			DMavroeidis
		
		
			KChaidos
		
		
			SPirillos
		
		
			DChristopoulos
		
		
			MVazirgiannis
		
	
		Proceedings of ECMLPKDD 2006 Discovery Challenge Workshop
				ECMLPKDD 2006 Discovery Challenge Workshop
		
			2006
			
		
* 
	
		Introducing the Enron Corpus
		
			BKlimt
		
		
			YYang
		
	
		CEAS
				
			2004. July