# Introduction

The Internet is a global public network. With the rapid growth of its potential, the business models adopted by organizations have changed accordingly, and the number of people connecting to the Internet increases every day. E-business is now a widely used and business-critical model: through the Internet, organizations can reach end users on an unprecedented scale. However, the Internet carries both harmless and harmful users, which exposes organizations to considerable risk. Making information available to end users is a core service of almost every organization, but the same information also becomes reachable by malicious users. Malicious users, or hackers, apply various techniques against an organization's internal systems to exploit vulnerabilities, compromise the systems, and access the sensitive information they hold [1]. Every organization therefore needs security measures that prevent hackers from accessing its data.

Many organizations deploy firewalls to separate their private network from the public network. A firewall protects internal systems by controlling incoming and outgoing network traffic according to a rule set. Because organizations must grant Internet users some access to internal systems, these permissions can introduce vulnerabilities in the private network through which malicious users may gain entry. Firewalls alone therefore cannot guarantee that the sensitive data held by an organization is secure. One remedy for defending against network attacks is the intrusion detection system (IDS) [2]. An IDS monitors network traffic for suspicious activity and alerts the system or network administrator. In some cases an IDS not only detects anomalous or malicious traffic but also takes action, such as blocking the offending user or source IP address from accessing the network.

Initially, intrusion detection systems [3,4] ran on individual hosts or network devices, monitoring the inbound and outbound packets of that device and alerting the user or administrator about suspicious activity. This approach is called host-based intrusion detection (HIDS). The gradual evolution of networks shifted the focus to network-based intrusion detection systems (NIDS), which monitor traffic to and from all devices in the network by scanning all inbound and outbound traffic, at the cost of some impact on overall network speed. Depending on the type of analysis used to detect anomalies, IDS are classified as signature-based or anomaly-based detection systems [5]. A signature-based system, also called misuse detection, inspects network packets and checks them against signatures stored in a database; if a pattern matches, the packet is flagged as an attack. This is similar to how most antivirus software works. Its main limitation is that it only detects attacks whose patterns are already present in the database, i.e., known malicious threats, and it cannot recognize new attacks. An anomaly-based system, by contrast, analyses the behaviour of the network and establishes a baseline; activities that deviate from the baseline are treated as malicious.
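To make the distinction concrete, the following minimal sketch contrasts the two detection styles. It is an illustration only, not the mechanism of any particular IDS: the signature set, the traffic statistic, and the thresholds are all hypothetical.

```python
# Illustrative only: contrasts signature-based and anomaly-based detection.
# The signatures, payload fields, and baseline values below are hypothetical.

KNOWN_SIGNATURES = {b"\x90\x90\x90\x90", b"' OR '1'='1"}  # known attack patterns

def signature_based(payload: bytes) -> bool:
    """Flag a packet only if it contains a pattern already in the signature database."""
    return any(sig in payload for sig in KNOWN_SIGNATURES)

def anomaly_based(bytes_per_second: float, baseline_mean: float,
                  baseline_std: float, k: float = 3.0) -> bool:
    """Flag traffic that deviates more than k standard deviations from the baseline."""
    return abs(bytes_per_second - baseline_mean) > k * baseline_std

# A new attack with an unseen payload escapes the signature check ...
print(signature_based(b"brand-new exploit payload"))  # False
# ... but its unusual traffic volume is caught by the anomaly check.
print(anomaly_based(bytes_per_second=9_500, baseline_mean=1_200, baseline_std=400))  # True
```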
The benchmark dataset usually adopted by the intrusion detection research community is KDD99 [6]. Each record in the dataset is labelled as normal or attack and consists of 41 features, which are grouped into four categories.

# Feature selection techniques

Feature selection, also called attribute selection or variable subset selection, selects the subset of relevant features needed for a model. The data set used to construct the model may contain relevant, redundant, and irrelevant features [7]. The key assumption behind feature selection is therefore that redundant and irrelevant data can be removed. A feature that provides no more information than the currently selected features is called redundant, and a feature that carries no useful information in any context is called irrelevant. Feature selection is also useful as part of the data analysis process, as it shows which features are important for prediction and how these features are related [8,9]. A feature selection technique provides the following benefits for analytical models:

* It improves the performance of the system.
* It increases the accuracy of prediction.
* It shortens the training time, which reduces the overall execution time.

The performance of the system depends on the detection rate and the false alarm rate (also called the false positive rate). The detection rate is defined as the number of malicious packets detected by the system (true positives) divided by the total number of malicious packets in the data set. The false alarm rate is defined as the number of normal packets detected as malicious (false positives) divided by the total number of normal packets. An IDS should have a high detection rate and a low false alarm rate, and training the system plays a vital role in achieving this. Not all parameters of a packet are needed to train the system and improve its performance. An appropriate feature selection technique should therefore be used to select the relevant features and remove the redundant and irrelevant ones; this improves the overall performance of the system by decreasing the training time and increasing the accuracy of attack detection [10,11].

# a) Correlation-based feature reduction (CFS)

Correlation-based Feature Selection (CFS) [12,13] is a simple filter algorithm that evaluates and ranks subsets of features using a correlation-based evaluation function. The ranks of the attributes indicate how strongly they are correlated with the class: features with high correlation are considered relevant, while features with low correlation can be ignored as irrelevant. The merit of a subset consisting of k features is given by

$$r_{zc} = \frac{k\,\bar{r}_{zi}}{\sqrt{k + k(k-1)\,\bar{r}_{ii}}}$$

where $r_{zc}$ is the correlation between the feature subset and the class, $k$ is the number of features, $\bar{r}_{zi}$ is the average correlation between the features and the class, and $\bar{r}_{ii}$ is the average inter-correlation between the features.
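As an illustration of how the merit score above could be computed, here is a minimal sketch. It assumes Pearson correlation as the correlation measure and a small pandas DataFrame with a binary class column named `label`; both are assumptions of the example, not details taken from the paper.

```python
import numpy as np
import pandas as pd

def cfs_merit(df: pd.DataFrame, features: list, label: str = "label") -> float:
    """Merit of a feature subset: k * r_zi / sqrt(k + k(k-1) * r_ii).

    Uses absolute Pearson correlation as the feature-class and
    feature-feature correlation measure (an assumption of this sketch).
    """
    k = len(features)
    # Average feature-class correlation.
    r_zi = np.mean([abs(df[f].corr(df[label])) for f in features])
    # Average feature-feature inter-correlation (off-diagonal entries only).
    if k > 1:
        corr = df[features].corr().abs().values
        r_ii = (corr.sum() - k) / (k * (k - 1))
    else:
        r_ii = 0.0
    return (k * r_zi) / np.sqrt(k + k * (k - 1) * r_ii)

# Toy example with hypothetical column names, not the real KDD99 schema.
df = pd.DataFrame({
    "duration":  [1, 9, 2, 8, 1, 9],
    "src_bytes": [10, 90, 12, 85, 11, 95],
    "label":     [0, 1, 0, 1, 0, 1],
})
print(cfs_merit(df, ["duration", "src_bytes"]))
```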
# b) Information gain (IG)

Information gain [14,15] determines the importance of an attribute in the training dataset by analysing the information content of the attributes. It is also used to decide the ordering of nodes in a decision tree, where nodes correspond to attributes: the attribute with the highest information gain is chosen as the splitting attribute for a node N. This attribute minimizes the information needed to classify the instances in the resulting partitions. By this approach, the expected number of tests needed to classify a given instance is minimized and a simple tree is guaranteed. The information gain of each attribute is calculated as

$$\mathrm{Gain}(A) = \mathrm{Info}(D) - \mathrm{Info}_A(D)$$

where $A$ is the attribute, $\mathrm{Info}(D)$ is the information content of the total dataset, and $\mathrm{Info}_A(D)$ is the information content of the dataset after partitioning on attribute $A$. The information content of the total dataset is calculated as

$$\mathrm{Info}(D) = -\sum_{i=1}^{m} P_i \log_2(P_i)$$

where $D$ is the total dataset, $m$ is the number of class labels in the dataset, and $P_i$ is the probability of class label $i$ in the dataset. The information content of attribute $A$ over the total dataset is calculated as

$$\mathrm{Info}_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|} \times \mathrm{Info}(D_j)$$

where $v$ is the number of distinct values of attribute $A$ and $D_j$ is the subset of records in $D$ that take the $j$-th value of $A$.

# c) Gain ratio (GR)

Gain ratio [16,17] is another method used to measure the importance of attributes. It is a modified version of information gain that reduces its bias towards attributes with many branches. The gain ratio is large when the data is spread evenly over the branches and small when all data belongs to one branch; it takes the number and size of branches into account when choosing an attribute. It corrects the information gain by the intrinsic information of a split, i.e., how much information is needed to tell which branch an instance belongs to. The gain ratio is calculated as

$$\mathrm{GainRatio}(A) = \frac{\mathrm{Gain}(A)}{\mathrm{SplitInfo}_A(D)}$$

where $\mathrm{Gain}(A)$ is the information gain of attribute $A$ and $\mathrm{SplitInfo}_A(D)$ is the splitting information of attribute $A$, calculated as

$$\mathrm{SplitInfo}_A(D) = -\sum_{j=1}^{v} \frac{|D_j|}{|D|} \times \log_2\!\left(\frac{|D_j|}{|D|}\right)$$

with $v$ and $D_j$ defined as above.

# d) Principal component analysis (PCA)

Principal component analysis (PCA) [18], also known as the Karhunen-Loeve or K-L method, is a statistical technique used to reduce the number of attributes or dimensions in a dataset without much loss of the information needed to analyse the data. The basic procedure is as follows [19,20]:

1. Select the dataset whose attributes or dimensions are to be reduced.
2. Normalize the dataset so that each attribute falls within the same range.
3. Calculate the covariance between each pair of attributes and derive the covariance matrix. The covariance is calculated as
   $$\mathrm{cov}(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}$$
4. Compute the eigenvectors and eigenvalues of the covariance matrix.
5. Order the eigenvalues from highest to lowest; these values give the importance of the attributes. Attributes with small eigenvalues can be ignored and those with large eigenvalues are retained. The eigenvectors remaining after discarding the small eigenvalues form the feature vector.
6. Derive the final dataset as

   Final dataset = Row feature vector × Row data adjust

   where the row feature vector is the transposed eigenvector matrix with the most important features at the top, and the row data adjust is the transposed mean-adjusted data matrix (data items in each column, with each row holding a separate dimension).
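The following is a minimal numerical sketch of the PCA steps above, using NumPy. The toy data and the choice of keeping the components that explain 95% of the variance are assumptions made for the example, not values from the paper.

```python
import numpy as np

def pca_reduce(X: np.ndarray, var_threshold: float = 0.95) -> np.ndarray:
    """Reduce the dimensionality of X following the steps described above."""
    # Step 2: mean-adjust and scale each attribute so they share a comparable range.
    X_adj = (X - X.mean(axis=0)) / X.std(axis=0)
    # Step 3: covariance matrix of the attributes (rows = observations).
    cov = np.cov(X_adj, rowvar=False)
    # Step 4: eigenvalues and eigenvectors of the covariance matrix.
    eig_vals, eig_vecs = np.linalg.eigh(cov)
    # Step 5: order eigenvalues from highest to lowest and keep the leading ones.
    order = np.argsort(eig_vals)[::-1]
    eig_vals, eig_vecs = eig_vals[order], eig_vecs[:, order]
    explained = np.cumsum(eig_vals) / eig_vals.sum()
    k = int(np.searchsorted(explained, var_threshold)) + 1
    row_feature_vector = eig_vecs[:, :k].T   # transposed eigenvector matrix
    row_data_adjust = X_adj.T                # transposed mean-adjusted data
    # Step 6: final dataset = row feature vector * row data adjust.
    return (row_feature_vector @ row_data_adjust).T

# Toy data: 6 records with 4 hypothetical numeric attributes.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))
print(pca_reduce(X).shape)   # (6, k) with k <= 4
```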
# e) Gini index (GI)

The Gini index [21] is used to extract the attributes that matter most when analysing the data set to detect attacks. It measures the impurity of the data set D. The attribute with the highest Gini index is treated as unimportant, and the attribute with the lowest Gini index is treated as important for detecting attacks. The Gini index for attribute A is calculated as

$$\mathrm{Gini}(A) = \mathrm{Gini}(D) - \mathrm{Gini}_A(D)$$

where $\mathrm{Gini}(D)$ is the impurity of the total dataset and $\mathrm{Gini}_A(D)$ is the impurity of the dataset after partitioning on attribute $A$. The impurity of the total dataset is calculated as

$$\mathrm{Gini}(D) = 1 - \sum_{i=1}^{m} P_i^{\,2}$$

where $P_i$ is the probability of class label $i$ in the dataset and $m$ is the number of class labels.

# f) Optimized Least Significant Particle based Quantitative Particle Swarm Optimization (OLSP-QPSO)

OLSP-QPSO [22] is an optimization technique used in place of standard QPSO. It determines the best swarm particles by applying a quadratic polynomial model, iterating until the best swarm particles for analysing attacks have been identified. The procedure of the optimized QPSO algorithm is as follows:

1. Initialize the swarm.
2. Calculate mbest.
3. Update the position of the attributes.
4. Estimate the fitness value of each attribute.
5. If the present fitness value is better than the best fitness value found so far, replace the stored best fitness value with the current one.
6. Update the global best.
7. Find a new attribute.
8. If the new attribute is better than the worst attribute in the swarm, replace the worst attribute with the new attribute.
9. Repeat from step 2 until the maximum number of iterations is reached.

# III. Comparison between the different feature selection techniques

Feature selection plays a major role in achieving a high-performance intrusion detection system, and many feature selection techniques have been proposed for selecting the relevant attributes from the data set; the most commonly used ones were discussed in the previous section. The standard data set used for intrusion detection experiments is KDD Cup 1999 [23], which consists of approximately 5 million training records and 3 million test records. Records are classified as normal or anomalous, and the anomalies fall into four broad categories: DoS, U2R, R2L and Probe. Only 19.86% of the training records are normal traffic and the rest are attack traffic; in the test set, 19.45% is normal traffic and the remainder is attack traffic. Each record in the data set consists of 41 features. Not all attributes are needed to analyse attacks in the network, so an appropriate technique has to be chosen to reduce the features of the data set. The selected feature reduction should not degrade the performance of the system; it should increase the detection rate and decrease the false positives [24]. From Table 1 it can be observed that, among the listed feature selection techniques, the largest number of attributes is removed by Optimized Least Significant Particle based Quantitative Particle Swarm Optimization (OLSP-QPSO).

The performance of the system depends on the detection rate and the false alarm rate [25]. The detection rate is the number of malicious packets detected by the system (true positives) divided by the total number of malicious packets in the data set. The false alarm rate, also called the false positive rate, is the number of normal packets detected as malicious (false positives) divided by the total number of normal packets. An IDS should have a high detection rate and a low false alarm rate, which can be achieved by selecting the appropriate features needed to detect attacks. Written in terms of the confusion-matrix entries, the detection rate and false alarm rate are calculated as

$$\mathrm{Detection\ Rate} = \frac{TP}{TP + FN}, \qquad \mathrm{False\ Alarm\ Rate} = \frac{FP}{FP + TN}$$
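As a small worked illustration of the two formulas above, the following sketch computes the detection rate and false alarm rate from labelled predictions; the label encoding (1 = attack, 0 = normal) is an assumption of the example.

```python
def detection_and_false_alarm_rate(y_true, y_pred):
    """Compute detection rate and false alarm rate.

    Assumes labels are encoded as 1 = attack (malicious) and 0 = normal.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    detection_rate = tp / (tp + fn) if (tp + fn) else 0.0
    false_alarm_rate = fp / (fp + tn) if (fp + tn) else 0.0
    return detection_rate, false_alarm_rate

# Toy example: 6 attack packets (5 detected) and 4 normal packets (1 misflagged).
y_true = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 0, 1, 0, 0, 0]
print(detection_and_false_alarm_rate(y_true, y_pred))  # (0.833..., 0.25)
```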
Table 1: Number of attributes selected by each feature selection method

| Feature selection method | Number of attributes selected |
|---|---|
| Correlation-based feature reduction (CFS) | 10 |
| Gain ratio (GR) | 14 |
| Information gain (IG) | 20 |
| Principal component analysis (PCA) | 12 |
| Gini index (GI) | 18 |
| Optimized Least Significant Particle based Quantitative Particle Swarm Optimization (OLSP-QPSO) | 8 |

Table 2: Statistical results

| Feature selection method | Number of attributes selected | Detection rate |
|---|---|---|
| Correlation-based feature reduction (CFS) | 10 | 97.78% |
| Gain ratio (GR) | 14 | 96.56% |
| Information gain (IG) | 20 | 96.30% |
| Principal component analysis (PCA) | 12 | 97.20% |
| Gini index (GI) | 18 | 96.42% |
| Optimized Least Significant Particle based Quantitative Particle Swarm Optimization (OLSP-QPSO) | 8 | 98.33% |

Table 3: False positive rate for each attack category

| Attack category | CFS | GR | IG | PCA | GI | OLSP-QPSO |
|---|---|---|---|---|---|---|
| DoS | 0.003 | 0.004 | 0.002 | 0.001 | 0.002 | 0.002 |
| R2L | 0.002 | 0.004 | 0.01 | 0.003 | 0.008 | 0.001 |
| U2R | 0.001 | 0.005 | 0.006 | 0.002 | 0.004 | 0.003 |
| Probe | 0.015 | 0.036 | 0.028 | 0.013 | 0.024 | 0.01 |

Figure 3: False positive rate for attack categories.

# Conclusion and future work

This paper has focused on the different feature selection techniques used to detect attacks in the network. Feature selection techniques decrease the training time of the system, and training the system with an appropriate feature selection technique increases its overall performance.

# References

1. K. Kendall, "A Database of Computer Attacks for the Evaluation of Intrusion Detection Systems," Master's Thesis, Massachusetts Institute of Technology, 1998.
2. V. Jyothsna and V. V. Rama Prasad, "A Review of Anomaly based Intrusion Detection Systems," International Journal of Computer Applications, vol. 28, no. 7, August 2011.
3. H. Debar, M. Dacier and A. Wespi, "A Revised Taxonomy for Intrusion Detection Systems," Annales des Telecommunications, vol. 55, no. 7-8, 2000.
4. S. Mukkamala and A. Sung, "A Comparative Study of Techniques for Intrusion Detection," in Proceedings of the 15th IEEE International Conference on Tools with Artificial Intelligence, IEEE Computer Society Press, 2003.
5. P. García-Teodoro, J. Díaz-Verdejo, G. Maciá-Fernández and E. Vázquez, "Anomaly-based network intrusion detection: Techniques, systems and challenges," Computers & Security, vol. 28, Elsevier, March 2009.
6. Xin Xu, "Adaptive Intrusion Detection Based on Machine Learning: Feature Extraction, Classifier Construction and Sequential Pattern Prediction," International Journal of Web Services Practices, vol. 2, no. 1, 2006.
7. I. Guyon and A. Elisseeff, "An Introduction to Variable and Feature Selection," Journal of Machine Learning Research, March 2003.
8. A. H. Sung and S. Mukkamala, "The Feature Selection and Intrusion Detection Problems," in Proceedings of the 9th Asian Computing Science Conference, Lecture Notes in Computer Science, Springer, 2004.
9. S. Zaman and F. Karray, "Features selection for intrusion detection systems based on support vector machines," in CCNC'09: Proceedings of the 6th IEEE Conference on Consumer Communications and Networking, 2009.
10. S. Chebrolu, A. Abraham and J. P. Thomas, "Feature deduction and ensemble design of intrusion detection systems," Computers & Security, vol. 24, no. 4, June 2005.
11. T. S. Chou, K. K. Yen and J. Luo, "Network Intrusion Detection Design Using Feature Selection of Soft Computing Paradigms," International Journal of Computational Intelligence, 2008.
12. H. Nguyen, K. Franke and S. Petrovic, "Improving Effectiveness of Intrusion Detection by Correlation Feature Selection," in International Conference on Availability, Reliability and Security, 2010.
13. M. A. Hall, "Correlation-based Feature Selection for Machine Learning," Dept. of Computer Science, University of Waikato.
14. J. Novakovic, "Using Information Gain Attribute Evaluation to Classify Sonar Targets," in 17th Telecommunications Forum TELFOR 2009, Belgrade, Serbia, November 24-26, 2009.
15. B. Azhagusundari and A. S. Thanamani, "Feature Selection based on Information Gain," International Journal of Innovative Technology and Exploring Engineering (IJITEE), ISSN 2278-3075, vol. 2, January 2013.
16. R. Puttini, Z. Marrakchi and L. Mé, "Bayesian classification model for Real time intrusion detection," in Proceedings of the 22nd International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Engineering, 2002.
17. H. Altwaijry and S. Algarny, "Bayesian based intrusion detection system," Journal of King Saud University - Computer and Information Sciences, 2012.
18. Xin Xu and X. N. Wang, "Adaptive network intrusion detection method based on PCA and support vector machines," in ADMA 2005, Lecture Notes in Artificial Intelligence, vol. 3584, 2005.
19. Lindsay I. Smith, "A Tutorial on Principal Components Analysis," February 26, 2002.
20. Ahmad A. Abdulah, K. S. Alghamdi, M. Alnfajan and Hussain, "Feature Subset Selection for Network Intrusion Detection Mechanism Using Genetic Eigen Vectors," in Proc. of CSIT, vol. 5, 2011.
21. J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, 2006.
22. V. Jyothsna and V. V. Rama Prasad, "HFO-ANID: Hierarchical Feature Optimization for Anomaly based Network Intrusion Detection," in Third International Conference on Computing Communication & Networking Technologies (ICCCNT), IEEE, July 2012.
23. M. Tavallaee, E. Bagheri, W. Lu and A. A. Ghorbani, "A Detailed Analysis of the KDD CUP 99 Data Set," in Proceedings of the IEEE Symposium on Computational Intelligence in Security and Defence Applications, 2009.
24. S. M. Abdulla, N. B. Al-Dabagh and O. Zakaria, "Identify Features and Parameters to Devise an Accurate Intrusion Detection System Using Artificial Neural Network," World Academy of Science, Engineering and Technology, 2010.
25. S. Mukherjee and N. Sharma, "Intrusion Detection using Naive Bayes Classifier with Feature Reduction," Procedia Technology, Elsevier, 2012.
26. NSL-KDD dataset for network-based intrusion detection systems.