# Introduction

Many organizations maintain huge data repositories, also known as data warehouses, which store data collected from various sources in different formats. One prominent source is remotely sensed data collected via satellites or geographical information systems software [1]. Such data is useful in applications including, but not restricted to, land use [2] [3], species distribution modeling [4] [5] [6] [7], mineral resource identification [8], traffic analysis [10], network analysis [9] and environmental monitoring systems [11] [12]. Data mining is used to extract information from these repositories, and the information thus mined can help stakeholders in an organization take strategic decisions. Data can be mined using methodologies such as anomaly detection, supervised classification, clustering, association rule learning, regression, characterization and summarization, and sequential pattern mining. In this paper we apply a hybrid classification technique to classify remotely sensed seed plant data.

A lot of research has been undertaken to classify plant functional groups, fish species, bird species, etc. [7][13] [14]. Classifying species helps conserve the ecosystem by facilitating the prediction of endangered species distribution [15]. It can also help identify resources such as minerals, water resources and economically useful trees. Various technologies have been developed in this regard: machine learning methods, image processing algorithms, geographical information systems tools, etc. have contributed to numerous systems that study spatial data and mine relevant information for use in various applications.
The systems developed help construct classification models that in turn facilitate weather forecasting, crop yield classification, mineral resource identification, soil composition analysis and the location of water bodies near agricultural land. Classification is the process of assigning a class label to unlabeled data vectors. It can be categorized into supervised classification and unsupervised classification, which is also known as clustering. In supervised classification, learning is done with the help of a supervisor, i.e. learning through example; the set of possible class labels is known a priori to the end user. Supervised classification can be subdivided into parametric and non-parametric classification. A parametric classifier depends on the probability distribution of each class, whereas a non-parametric classifier makes no such assumption. In unsupervised classification, learning is done without a supervisor, i.e. learning from observations; the set of possible classes is not known to the end user, and after classification one can try to assign a name to each class. Examples of unsupervised classification methods are Adaptive Resonance Theory (ART) 1, ART 2, ART 3, Iterative Self-Organizing Data Analysis Method, K-Means, Bootstrapping Local, Fuzzy C-Means, and Genetic Algorithm [17].

In this paper we discuss a hybrid classification method that makes use of support vector machine (SVM) classification, random forest and boosting methods. Its performance is then evaluated against traditional individual random forest classifiers and support vector machines. The support vector machine is a powerful statistical tool for supervised classification. The data vectors are represented in a feature space, and a geometric hyperplane is constructed in that space, dividing it into two regions so that the data items are classified under two different class labels corresponding to the two regions.
SVM can be used to solve both two-class and multi-class classification problems. The aim of the hyperplane is to maximize its distance from the adjoining data points in the two regions. Moreover, SVMs do not have an additional overhead of feature extraction, since it is part of their own architecture. Recent research has shown that SVM classifiers provide better classification results on spatial data sets than other classification algorithms such as the Bayesian method, neural networks and k-nearest neighbors [18] [19].

In the random forest (RF) classification method, many classifiers are generated from smaller subsets of the input data, and their individual results are later aggregated by a voting mechanism to generate the output for the input data set. This ensemble learning strategy has recently become very popular. Before RF, boosting and bagging were the only two ensemble learning methods used. RF can be applied to supervised classification, unsupervised learning and regression. It has been extensively applied in areas including modern drug discovery, network intrusion detection, land cover analysis, credit rating analysis, remote sensing and gene microarray data analysis [20][21].

Other popular ensemble classification methods are bagging and boosting. Here the complex data set is divided into smaller feature subsets, and an ensemble of classifiers is formed to classify the data items in each feature subset. The feature subsets are regrouped iteratively depending on a penalty factor, also known as the weight factor, which is applied based on the degree of misclassification in the feature subsets. The class label of data items in the complete data set is computed by aggregating the individual classification outcomes at each feature subset [22] [23].
This paper proposes a hybrid method that combines ensemble learning from RF classification and the boosting algorithm with the SVM classification method. The processed seed plant data is divided randomly into feature subsets. SVM classification is used to derive the output at each feature subset, and boosting is applied to improve classification performance at every feature subset. A majority voting mechanism is then applied to arrive at the final classification result for the complete data set. The next section gives background on the random forest classifier, SVM and boosting. Section 3 discusses the proposed methodology, Section 4 presents the performance analysis, and Section 5 concludes this work, followed by an acknowledgement of the data source and the references.

# II. Background Knowledge

a) Overview of SVM Classifier

Support vector machine (SVM) is a statistical tool used in data mining methodologies such as classification and regression analysis. The data can pose either a multi-class or a two-class problem. In this paper we deal with a two-class problem in which the seed plant records are categorized under two class labels, one for records from North America and the other for records from South America. SVM has been applied in areas such as species distribution modeling and locating mineral prospective areas. It has become popular for solving regression and classification problems and consists of heuristic algorithms based on statistical learning theory. An advantage of SVM is that the classification model can be built using a minimal number of attributes, which is not the case with most other classification methods [24]. In this paper we propose a hybrid classification methodology for seed plant data that aims to improve on the efficiency and accuracy of the traditional classification approach.
The seed plant data sets used in this paper have known class labels. A classification model is constructed from these data sets, validated against a test data set, and later used to predict class labels of unlabeled data. Since the class labels are known a priori, this approach is categorized as supervised classification. In unsupervised classification, also known as clustering, the class labels are not known in advance. Each data vector in the data set comprises attributes that are used to build the classification model [25] [19]. The SVM model is represented by a separating hyperplane f(x) that geometrically bisects the data space, dividing it into two regions and thereby classifying the input data into two categories.

Figure 1: The Hyperplane

The function f(x) denotes the hyperplane that separates the two regions. The two regions correspond to the two categories of data under the two class labels. A data point $x_n$ belongs to one region or the other depending on the value of $f(x_n)$: if $f(x_n) > 0$ it belongs to one region, and if $f(x_n) < 0$ it belongs to the other. Many hyperplanes can split the data into two regions, but SVM selects the hyperplane that is at the maximum distance from the nearest data points in the two regions. Only a few hyperplanes satisfy this criterion. By enforcing this condition SVM provides accurate classification results [27].

SVMs can also be represented mathematically. Assume that the input data consists of n data vectors, where each data vector is represented by $x_i \in \mathbb{R}^n$, $i = 1, 2, \ldots, n$.
Let the class label to be assigned to each data vector be denoted by $y_i$, which is +1 for one category of data vectors and -1 for the other. The data set can be geometrically separated by a hyperplane. Since the hyperplane is represented by a line, it can also be represented mathematically by [8][3] [28]:

$$m \cdot x_i + b \ge +1, \qquad m \cdot x_i + b \le -1 \tag{1}$$

where the first inequality holds for vectors with $y_i = +1$ and the second for vectors with $y_i = -1$. The hyperplane can also be represented mathematically by [31][32] [33]:

$$f(x) = \operatorname{sgn}(m \cdot x + b) = \operatorname{sgn}\left(\left(\sum_{i=1}^{n} \alpha_i y_i x_i\right) \cdot x + b\right) \tag{2}$$

where sgn() is the sign function, given by:

$$\operatorname{sgn}(x) = \begin{cases} 1 & \text{if } x > 0 \\ 0 & \text{if } x = 0 \\ -1 & \text{if } x < 0 \end{cases} \tag{3}$$

The data vectors are said to be optimally divided by the hyperplane if the distance between the hyperplane and the adjoining data vectors in the two regions is maximum. This concept is illustrated geometrically in Figure 2, which shows the distance between the hyperplane and the data points closest to it [29][30] [28]. The hyperplane with the maximum distance d from the adjoining points is computed to implement the classification. This SVM can be represented as a primal formulation given by the equation [8][5] [31]:

$$h(m) = \frac{1}{2}\|m\|^2 + \text{Training error} \tag{5}$$

subject to $y_i(m \cdot x_i + b) \ge 1,\ \forall i$.

The idea is to increase the margin and reduce the training error. The sample records in the training data set belong to the input set. Each data vector has specific attributes on which the classification model is built, and this set of attributes forms a feature space. The kernel function bridges the gap between the feature space and the input space and makes it possible to carry out classification in the input space rather than in the more complicated feature space [29]. In this paper we have used Gaussian radial basis functions (RBF). SVMs make use of the radial basis kernel function in order to work at the simpler input space level.
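The decision rule of equation (2), the sign function of equation (3) and the distance of a point from the hyperplane can be illustrated with a minimal Python sketch. The weight vector `m`, the offset `b` and the sample points are hypothetical values chosen for illustration, not values fitted to the seed plant data, and the Gaussian RBF kernel is written in its standard form with an assumed width parameter `sigma`.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def sgn(value):
    """Sign function as in equation (3)."""
    if value > 0:
        return 1
    if value < 0:
        return -1
    return 0


def classify(m, b, x):
    """Decision rule f(x) = sgn(m . x + b) as in equation (2)."""
    return sgn(dot(m, x) + b)


def distance_to_hyperplane(m, b, x):
    """Distance |m . x + b| / ||m|| of point x from the hyperplane."""
    return abs(dot(m, x) + b) / math.sqrt(dot(m, m))


def rbf_kernel(x1, x2, sigma):
    """Standard Gaussian RBF kernel exp(-||x1 - x2||^2 / (2 sigma^2))."""
    squared = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return math.exp(-squared / (2.0 * sigma ** 2))


# Hypothetical hyperplane x1 + x2 - 3 = 0 (assumed, not learned from data)
m, b = [1.0, 1.0], -3.0
points = [[3.0, 2.0], [0.5, 1.0], [1.0, 2.0]]
labels = [classify(m, b, p) for p in points]
margin_of_first = distance_to_hyperplane(m, b, points[0])
```

In a trained SVM, `m` and `b` would come from solving the primal formulation in equation (5); the sketch only shows how a given hyperplane assigns labels and measures distances.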
The RBF kernel used is represented mathematically by [3][29]:

$$K(x_1, x_2) = \exp\left(-\frac{\|x_1 - x_2\|^2}{2\sigma^2}\right) \tag{6}$$

Multi-class classification problems can be solved using various methods. One method is to move the data vectors to a different space, thereby making the problem linear. The other method is to split the multi-class problem into numerous two-class problems and then combine the solutions of the individual two-class problems with a voting mechanism to obtain the solution of the original multi-class problem [8].

The steps followed while using SVM in classifying data are mentioned in the below algorithm [16]:

In the RF classification method the input data set is first subdivided into two subsets, one containing two thirds of the data points and the other containing the remaining one third. Classification tree models are constructed using the subset comprising two thirds of the data points. The remaining one third, which is not used to construct the classification trees and is used for validation, forms the out-of-bag (OOB) data samples of the trees. No truncation is applied to any classification tree; hence every classification tree used in the RF classification method is maximal in nature. The RF classification method then follows a majority voting process in which the classification output of every classification tree casts a vote to decide the final outcome of the ensemble classifier, i.e. the class label assigned to a data item x [21]. A set of features is used to create a classification tree model at every randomly chosen subset [37]. This set of features remains constant throughout the growing of the random forest. In RF, the test set is used to validate the classification results and, once the classification model is built, to predict the class labels of unlabeled data.
It also helps in cross-validating the classification results provided by the various classification trees in the ensemble. The out-of-bag (OOB) samples are used to perform this cross-validation. The individual classification tree outcomes are aggregated with a majority vote, and the cumulative result of the whole ensemble is more accurate and less prone to classification error than the individual classification tree results [26]. Every classification tree in the random forest ensemble is formed using a randomly selected two thirds of the input variables, hence there is little correlation between different trees in the forest. One can also restrict the number of variables that split a parent node in a classification tree, which further reduces the correlation between classification trees. The random forest classification method works well even for larger data sets, which is not the case with other ensemble methods [1] [2].

In this paper we use both the boosting and random forest ensemble classification methods along with support vector machines to give a more accurate classification output. This hybrid method is expected to be more robust to noise than an individual classification method. The RF classification method works with both discrete and continuous variables, which is not the case with other statistical classification modeling methods. Furthermore, there is no limit on the total number of classification trees generated in the ensemble process, or on the total number of variables or data samples (generally two thirds are used) in every random subset used to build the classification trees [36]. RF rates variables based on the classification accuracy of each variable relative to the other variables in the data set. This rank is also known as the importance index. It reflects the relative importance of every variable in the process of classification.
The importance index of a variable is calculated by averaging the importance of the variable across the classification trees generated in the ensemble. The higher the importance index, the greater the variable's importance for classification. Another parameter, obtained by dividing the variable's importance index by its standard error, is called the z-score. Both the importance index and the z-score play a significant role in ensuring the efficiency of the classification process [25][36][39] [38]. The importance of a variable can also be assessed using two parameters, Gini index decrease and OOB error estimation. Here the relative importance of variables is calculated, which is beneficial in studies where the number of attributes is very high and relative importance therefore gains prominence [40]. The Gini index is given by:

$$\sum\sum_{j \ne i} \left(\frac{k(C_i, X)}{|X|}\right) \left(\frac{k(C_j, X)}{|X|}\right) \tag{7}$$

where $\frac{k(C_i, X)}{|X|}$ is the probability that a selected case belongs to class $C_i$.

Global Journal of Computer Science and Technology, Volume XIV, Issue I, Version I, Year 2014

The RF method provides precise results with respect to variation and bias [39]. The performance of the RF classification method is better than that of other classifiers such as support vector machines, neural networks and discriminant analysis. In this paper a hybrid classification method combining the advantages of both random forest and support vector machines, in addition to boosting, is used. The RF algorithm is becoming increasingly popular in applications such as forest classification, credit rate analysis, remote sensing image analysis and intrusion detection.

Yet another parameter that can contribute to assessing the classification is the proximity measure of two samples. The proximity measure is the number of classification trees in which two data samples end up in the same node. This parameter, when divided by the number of classification trees generated, can help detect outliers in the data sets.
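As a worked illustration of two quantities from this discussion, the following Python sketch computes the Gini index of equation (7) from class counts and the normalized proximity measure from terminal-node assignments. The function names, the class counts and the `leaf_ids` table are hypothetical; in practice the terminal node of each sample in each tree would be read from a trained forest.

```python
def gini_index(class_counts):
    """Gini index as in equation (7): sum over i != j of p_i * p_j."""
    total = sum(class_counts)
    probs = [count / total for count in class_counts]
    return sum(
        probs[i] * probs[j]
        for i in range(len(probs))
        for j in range(len(probs))
        if i != j
    )


def proximity_matrix(leaf_ids):
    """Fraction of trees in which each pair of samples shares a terminal node.

    leaf_ids[t][s] is the terminal node reached by sample s in tree t.
    """
    n_trees = len(leaf_ids)
    n_samples = len(leaf_ids[0])
    counts = [[0.0] * n_samples for _ in range(n_samples)]
    for tree in leaf_ids:
        for i in range(n_samples):
            for j in range(n_samples):
                if tree[i] == tree[j]:
                    counts[i][j] += 1.0
    return [[c / n_trees for c in row] for row in counts]


# Hypothetical node with 6 records of one class and 4 of the other
node_gini = gini_index([6, 4])

# Hypothetical terminal-node assignments: 3 trees, 4 samples
leaf_ids = [
    [0, 0, 1, 2],
    [5, 5, 5, 7],
    [3, 4, 4, 9],
]
prox = proximity_matrix(leaf_ids)
```

In this toy input the fourth sample never shares a terminal node with any other sample, so its proximity to every other sample is zero, which is the pattern that would flag it as a candidate outlier. The matrix has one entry per pair of samples, so it grows quadratically with the number of records.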
This computation requires a large amount of memory, depending on the total number of sample records and classification trees in the ensemble [1]. The pseudo code for the Random Forest algorithm is given below [42]:

Random Forest Algorithm

Input:
- D: training sample
- a: number of input instances to be used to generate each classification tree
- T: total number of classification trees in the random forest
- OT: classification output from each tree T

1. OT is empty
2. for i = 1 to T
3. Db = form a random sample subset by selecting 2/3 of the instances randomly from D (this sample is selected randomly for every tree)
4. Cb = build a classification tree using the random subset Db
5. Validate the classifier Cb using the remaining 1/3 of the instances (refer to step 3)
6. OT = store the classification outputs of the classification trees
7. next i
8. Apply the voting mechanism to derive the output ORT of the random forest (ensemble of classification trees)
9. return ORT

c) Overview of Boosting

Ensemble learning is a process in which a data set is divided into subsets. Individual learners are then used to classify and build the model for each of these subsets, and the individual learning models are later combined to determine the final classification model of the complete data set. Because the complex, large data set is divided into smaller random subsets and the classification model is applied to these smaller subsets, ensemble learning improves classification efficiency and gives more accurate results. Numerous classification methodologies such as bagging and boosting can be used in learning by constructing an ensemble [43][44] [45]. In this research paper the boosting method has been used to create the ensemble. It works by rewarding successful classifiers and by applying penalties to unsuccessful classifiers.
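Before continuing with boosting, the random forest pseudo code given earlier in this section can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the paper: each "tree" is a one-split decision stump standing in for a fully grown classification tree, the parameter `a` of the pseudo code is not used, and the latitude and longitude records and their `"NA"`/`"SA"` labels are hypothetical.

```python
import random
from collections import Counter


def train_stump(rows):
    """Fit a one-split tree on (features, label) rows (stand-in for a full tree)."""
    best = None
    for f in range(len(rows[0][0])):
        for threshold in sorted({x[f] for x, _ in rows}):
            left = [y for x, y in rows if x[f] <= threshold]
            right = [y for x, y in rows if x[f] > threshold]
            if not left or not right:
                continue
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, threshold, l_lab, r_lab)
    if best is None:  # no valid split: always predict the majority label
        label = Counter(y for _, y in rows).most_common(1)[0][0]
        return (0, float("inf"), label, label)
    return best[1:]


def predict_stump(stump, x):
    f, threshold, l_lab, r_lab = stump
    return l_lab if x[f] <= threshold else r_lab


def random_forest(data, n_trees, seed=0):
    """Steps 1 to 7: build each tree on a random 2/3 sample, validate on the rest."""
    rng = random.Random(seed)
    forest, oob_accuracy = [], []
    for _ in range(n_trees):
        shuffled = data[:]
        rng.shuffle(shuffled)
        cut = (2 * len(shuffled)) // 3
        train, oob = shuffled[:cut], shuffled[cut:]
        stump = train_stump(train)
        forest.append(stump)
        hits = sum(predict_stump(stump, x) == y for x, y in oob)
        oob_accuracy.append(hits / len(oob))
    return forest, oob_accuracy


def forest_predict(forest, x):
    """Step 8: majority vote over the outputs of all trees."""
    votes = Counter(predict_stump(stump, x) for stump in forest)
    return votes.most_common(1)[0][0]


# Hypothetical (latitude, longitude) records labelled by continent
data = [
    ([45.0, -100.0], "NA"), ([52.0, -56.0], "NA"), ([72.0, -40.0], "NA"),
    ([38.0, -90.0], "NA"), ([30.0, -85.0], "NA"), ([60.0, -110.0], "NA"),
    ([-10.0, -60.0], "SA"), ([-25.0, -55.0], "SA"), ([-33.0, -70.0], "SA"),
    ([-5.0, -45.0], "SA"), ([-15.0, -65.0], "SA"), ([-40.0, -68.0], "SA"),
]
forest, oob_accuracy = random_forest(data, n_trees=5)
north = forest_predict(forest, [50.0, -100.0])
south = forest_predict(forest, [-45.0, -70.0])
```

An odd number of trees is used so that a two-class vote cannot tie. Replacing the stump with a fully grown tree learner would bring the sketch closer to the pseudo code.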
Boosting has been used in applications such as machine translation [46], intrusion detection [47], forest tree regression, natural language processing and unknown word recognition [48], and it is applied to varied types of classification problems. It is an iterative process in which the training data set is regrouped into subsets and various classifiers are used to classify the data samples in the subsets. The data samples that were difficult for a classifier (also known as a weak learner) to classify at one stage are classified using new classifiers that are added to the ensemble at a later stage [49][50] [51]. In this way a new classifier is added to the ensemble at each stage. The difficulty of classifying a data item Xi at stage k is represented by a weight factor Wk(i), and the regrouping of training sets at each step of learning depends on this weight factor [22]. The value of the weight factor is proportional to the misclassification of the data. Forming regrouped data samples at every stage according to the weight factor is called the re-sampling version of boosting. Boosting can also be implemented by reweighting, in which a weight factor is assigned iteratively to every data item and the complete data set is used at every subsequent iteration, with the weights modified at every stage [48] [52].

The most popular boosting algorithm is Adaboost [23], which stands for Adaptive Boosting. It adapts, or updates, the weights of the data items based on the misclassification of training samples by weak learners and regroups the data subsets depending on the new weights. The steps of the Adaboost algorithm are given in the listing titled Adaboost Algorithm. In the next section the proposed hybrid methodology is discussed in detail.

# III. Proposed Methodology

In this paper we construct a hybrid classification model that predicts the class label of seed plant data from test data sets. The recommended methodology is shown as a schematic diagram in Fig 3, and the steps followed are explained in detail in the following subsections.

The data sets are randomly divided into n random subsets, each comprising two thirds of the whole data set. Classification methods are applied to each of these random subsets, and the remaining one third of the data at each subset is used as a test set. At each random subset the following attributes were used to implement the classification method discussed in the next subsection: id, continent, specificEpithet and churn. Here churn is a variable that is set to yes if the seed plant record belongs to North America and to no if it belongs to South America.

d) Selection of an appropriate classification method

In this paper seed plant data sets are classified using a hybrid classification method that makes use of random forest, the SVM classifier and the boosting ensemble learning method. In the hybrid methodology the input data set is randomly subdivided into subsets. Each data item in each subset has a weight factor associated with it. The data items in the subsets are classified by the SVM classifier. If a data item is misclassified its weight factor is increased; otherwise it is reduced. The data subsets are then rearranged and the SVM classifier is again used to perform classification at each subset. The weights are again updated depending on whether each item was classified correctly or misclassified. These steps are repeated iteratively until all the weights are updated to a very low value. The output for the input data set is computed by applying a voting mechanism to the classification outputs of all the random subsets [34].
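The reweight-and-vote loop described above can be sketched in Python as follows. This is one possible reading under stated assumptions, not the paper's Algorithm 1: it uses the standard Adaboost weight update on the complete data set (the reweighting version), a weighted threshold rule on a single feature stands in for the SVM base classifier, the random two-thirds subsets are omitted, and the final decision is an Adaboost-style weighted vote rather than a plain majority vote. The one-dimensional toy data and the labels +1 and -1 are hypothetical.

```python
import math


def train_weak(xs, ys, weights):
    """Weighted threshold rule on one feature (stand-in for the SVM classifier)."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            preds = [polarity if x > threshold else -polarity for x in xs]
            err = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best


def weak_predict(threshold, polarity, x):
    return polarity if x > threshold else -polarity


def boost(xs, ys, rounds):
    """Train an ensemble, raising the weights of misclassified items each round."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, threshold, polarity = train_weak(xs, ys, weights)
        err = min(max(err, 1e-10), 1.0 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, threshold, polarity))
        # Misclassified items (y * prediction = -1) gain weight, the rest lose it
        weights = [
            w * math.exp(-alpha * y * weak_predict(threshold, polarity, x))
            for w, x, y in zip(weights, xs, ys)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Weighted vote of all classifiers in the ensemble."""
    score = sum(a * weak_predict(t, p, x) for a, t, p in ensemble)
    return 1 if score >= 0 else -1


# Hypothetical 1-D data that no single threshold separates
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
ensemble = boost(xs, ys, rounds=3)
predictions = [predict(ensemble, x) for x in xs]
```

No single threshold classifies this toy data correctly, but the weighted vote of three reweighted rules does, which is the effect the hybrid method relies on. Swapping `train_weak` for an SVM trained with per-item weights would bring the sketch closer to the method described in the text.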
The algorithm for the proposed hybrid methodology is given in the sample code herein:

Algorithm 1: Hybrid classification using RF and SVM supplemented by boosting

The classification output obtained at each random subset is validated by testing the hybrid classifier model against the complete data set. In this paper 10 random feature subsets were used, and at every subset the SVM classifier was used to perform the classification. A voting mechanism was then applied to derive the final classification output. A total of 180 support vectors were used.

# IV. Performance Analysis

a) Environment Setting

The study area covers North and South America and includes data pertaining to localities in which seed plant species are present. A total of 599 records from the North American region and 401 records from the South American region are analyzed in order to execute the proposed method. Sample records used in this paper are shown in Table I. The most conventionally utilized evaluation metrics in classification are accuracy, specificity, positive predictive value and negative predictive value. The formulae for accuracy, specificity, prevalence and negative predictive value are provided by equations (8), (9), (10) and (11). The confusion matrix or error matrix view for the SVM classifier is given in Table V and for the RF classifier in Table VI. Performance measures using the evaluation metrics are specified in Fig 5 and are calculated using equations (8), (9), (10) and (11).

# V. Conclusion

In this paper a hybrid classifier based on random forest, SVM and boosting methods is used to classify seed plant data. The hybrid classification results are compared with the results obtained using traditional SVM and RF classifiers.
The research has established that the hybrid approach is more efficient than traditional SVM and RF classifiers, since it gives higher values of accuracy, specificity, positive predictive value and negative predictive value. The hybrid methodology performs better because it combines the advantages of the individual SVM and RF classification methods, and its results are further supplemented by the boosting ensemble classification method. In the future the proposed method can be used to classify vector and raster remotely sensed data collected via satellites and various geographical information systems.

![Figure 2: Distance of the nearest data vectors from the hyperplane](image-2.png "Figure 2")

The distance d of a data point x from the hyperplane is represented mathematically by the equation:

$$d = \frac{|m \cdot x + b|}{\|m\|}$$

SVM selects a local Gaussian function, and the global Gaussian function is later computed by aggregating all the local Gaussian functions. SVM can be used to solve either two-class or multi-class problems.

![Classification using linear kernel based SVM Classifier](image-4.png "Algorithm 2")

b) Overview of Random Forest Classifier

Ensemble learning algorithms use an ensemble, or group, of classifiers to classify data. Hence they give more accurate results than individual classifiers. The random forest classifier is an example of an ensemble classifier. Random forests make use of an ensemble of classification trees [34][35][36][37][38] [41].

![learners also called weak learners if misclassification occurs Set example weights based on predictions by ensemble of classifiers.](image-6.png "")

![Figure 3: Proposed Model](image-7.png "Figure 3")

a) Selection of Remote sensed data

The data sets collected in this paper belong to various types of seed plant family, viz. Pinopsida, Dicotyledoneae, Monocotyledoneae, Cycadopsida, Gnetopsida, Lycopodiopsida, Agaricomycetes and Marchantiopsida.

b) Data Inspection

The seed plant data sets are pre-processed, and missing or duplicate values are eliminated, either by ignoring tuples containing duplicate values or, for tuples with missing values, by manually entering values or substituting a global constant or a mean value [53].

c) Feature subset Selection

Table I: Sample records

| id | higherGeography | continent | family | scientificName | decimalLatitude | decimalLongitude | genus | specificEpithet | churn |
|---|---|---|---|---|---|---|---|---|---|
| 275986 | North America, GREENLAND | North America | Lycoperdaceae | Calvatia arctica | 72 | -40 | Calvatia | arctica | yes |
| 333301 | North America, | North America | Ericaceae | Empetrum eamesii Fernald & Wiegand | 52 | -56 | Empetrum | eamesii | yes |
| 271758 | North America, | North America | Ranunculaceae | Thalictrum terrae-novae Greene | 52 | -56 | Thalictrum | terrae-novae | yes |

Table II: Environment setting

| Item | Capacity |
|---|---|
| CPU | Intel CPU G645 @2.9 GHz processor |
| Memory | 8GB RAM |
| OS | Windows 7 64-bit |
| Tools | R, R Studio |

b) Result Analysis

Classification of the spatial data sets can be represented as a confusion or error matrix view as shown in Table III.

Table III: Confusion matrix view

| Real group | Classification result: North America | Classification result: South America |
|---|---|---|
| North America | True Negative (TN) | False Positive (FP) |
| South America | False Negative (FN) | True Positive (TP) |

Table V: Confusion matrix for the SVM classifier

| Prediction | Reference: South America | Reference: North America |
|---|---|---|
| South America | 8 | 7 |
| North America | 49 | 16 |

Table VI: Confusion matrix for the RF classifier

| Prediction | Reference: South America | Reference: North America |
|---|---|---|
| South America | 36 | 11 |
| North America | 21 | 12 |

© 2014 Global Journals Inc. (US)
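The evaluation metrics named in the Performance Analysis section can be computed from the four cells of a confusion matrix laid out as in Table III. The sketch below uses the standard textbook definitions, which are assumed here to match equations (8) to (11) since those equations are not reproduced in the text, and the counts passed in are hypothetical values, not results from this paper.

```python
def evaluation_metrics(tp, tn, fp, fn):
    """Standard confusion-matrix metrics (assumed to match equations 8 to 11)."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        "specificity": tn / (tn + fp),
        "prevalence": (tp + fn) / total,
        "positive_predictive_value": tp / (tp + fp),
        "negative_predictive_value": tn / (tn + fn),
    }


# Hypothetical counts for illustration only
metrics = evaluation_metrics(tp=40, tn=30, fp=10, fn=20)
```

Which continent counts as the positive class decides which cell is TP; Table III treats South America as the positive class.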
## Acknowledgment

We express our sincere appreciation to the Field Museum of Natural History (Botany) Seed Plant Collection (accessed through GBIF data portal, http://data.gbif.org/datasets/resource/14346, 2013-06-03) for providing us with different seed plant data sets. We also thank ANU university for providing all the support in the work conducted.

* Random Forests for land cover classification JonAtliPall Oskar Gislason JohannesRBenediktsson Sveinsson 10.1016/j.patrec.2005.08.011 Pattern Recognition Letters 0167-8655 27 4 March 2006
* An assessment of the effectiveness of a random forest classifier for land-cover classification VFRodriguez-Galiano BGhimire JRogan MChica-Olmo JPRigol-Sanchez 10.1016/j.isprsjprs.2011.11.002 ISPRS Journal of Photogrammetry and Remote Sensing 67 January 2012 Pages 93-104, ISSN 0924-2716
* Multiple support vector machines for land cover change detection: An application for mapping urban extensions HassibaNemmour YoucefChibani 10.1016/j.isprsjprs.2006.09.004 ISPRS Journal of Photogrammetry and Remote Sensing 61 2 November 2006 Pages 125-133, ISSN 0924-2716
* Hybrid Bayesian network classifiers: Application to species distribution models PAAguilera AFernández FReche RRumí 10.1016/j.envsoft.2010.04.016 Environmental Modelling & Software, Volume 25, Issue 12 December 2010
* TillRumpf ChristophRömer MartinWeis MarkusSökefeld RolandGerhards LutzPlümer Sequential Pages 89-96, ISSN 0168-1699
* XiuliSi, Fish species classification by color, texture and multi-class support vector machine using computer vision, Computers and Electronics in Agriculture JingHu DaoliangLi QinglingDuan YueqiHan GuifenChen 10.1016/j.compag.2012.07.008 October 2012 88 Pages 133-140, ISSN 0168-1699
* Classification of savanna tree species, in the Greater Kruger National Park region, by integrating hyperspectral and LiDAR data in a Random Forest data mining environment LNaidoo MACho RMathieu GAsner
10.1016/j.isprsjprs.2012.03.005 ISPRS Journal of Photogrammetry and Remote Sensing 69 April 2012
* Support vector machine for multiclassification of mineral prospectivity areas MaysamAbedi Gholam-Hossain AbbasNorouzi Bahroudi 10.1016/j.cageo.2011.12.014 Computers & Geosciences 0098-004 46 September 2012
* Effective scheduling strategies for boosting performance on rule-based spam filtering frameworks DRuano-Ordás JFdez-Glez FFdez-Riverola JRMéndez 10.1016/j.jss.2013.07.036 Journal of Systems and Software 0164-1212 Volume 86, Issue 12, December 2013, Pages 3151-3161
* N AbdulRahim MPPaulraj AHAdom 10.1016/j.proeng.2013.02.054 Adaptive Boosting with SVM Classifier for Moving Vehicle Classification, Procedia Engineering 2013 53
* Oil spill feature selection and classification using decision tree forest on SAR image data KonstantinosTopouzelis ApostolosPsyllos 10.1016/j.isprsjprs.2012.01.005 ISPRS Journal of Photogrammetry and Remote Sensing 0924-2716 68 March 2012
* Comparison of support vector machine, neural network, and CART algorithms for the land-cover classification using limited training data points, ISPRS Journal of Photogrammetry and Remote Sensing, Volume 70 YangShao RossSLunetta 10.1016/j.isprsjprs.2012.04.001 June 2012, Pages 78-87, ISSN 0924-2716
* Support vector machine under uncertainty: An application for hydroacoustic classification of fish-schools in Chile PaulBosch JulioLópez HéctorRamírez HugoRobotham 10.1016/j.eswa.2013.01.006 Expert Systems with Applications 40 10 August 2013 Pages 4029-4034, ISSN 0957-4174
* Study on Recognition of Bird Species in Minjiang River Estuary Wetland HongjiLin HanLin WeibinChen 10.1016/j.proenv.2011.09.386 Procedia Environmental Sciences 1878-0296 2011
* Predicting the potential habitat of oaks with data mining models and the R system RafaelPino-Mejías MaríaDoloresCubiles-De-La-Vega MaríaAnaya-Romero AntonioPascual-Acosta AntonioJordán-López 10.1016/j.envsoft.2010.01.004 Environmental Modelling & Software July 2010 25 Nicolás
Bellinfante-Crocci * Efficient Classification Algorithms using SVMs for Large Datasets, A Project Report Submitted in partial fulfillment of the requirements for the Degree of Master of Technology in Computational Science SNJeyanthi June 2007 IISC, BANGALORE, INDIA Supercomputer Education and Research Center * Analysis of Parametric & Non Parametric Classifiers for Classification Technique using WEKA GYugal Kumar Sahoo I.J. Information Technology and Computer Science 7 2012 * 10.5815/ijitcs.2012.07.06 July 2012 Published Online * A hybrid SVM based decision tree, Pattern Recognition MKumar M 10.1016/j.patcog.2010.06.010 December 2010 43 * Magnetic resonance brain images classification using linear kernel based Support Vector Machine NRajasekhar SJBabu TVRajinikanth 2012Nirma University International Conference on 5 68 Engineering (NUiCONE) * Dec 2012doi10.1109/NUICONE.2012.6493213 * Incorporating Spatial Variability Measures in Land-cover Classification using Random Forest VFRodríguez-Galiano FAbarca-Hernández BGhimire MChica-Olmo PMAkinson CJeganathan 10.1016/j.proenv.2011.02.009 Procedia Environmental Sciences 1878- 0296 Volume3,2011 * A hybrid network intrusion detection framework based on random forests and weighted k-means RedaMElbasiony ElsayedASallam TarekEEltobely MahmoudMFahmy 10.1016/j.asej.2013.01003 A in Shams Engineering Journal 2090-4479 Available online 7 March 2013 * Using boosting to prune bagging ensembles GonzaloMartínez-Muñoz AlbertoSuárez 10.1016/j.patrec.2006.06.018 Pattern Recognition Letters 28 1 January 2007 Pages 156-165, ISSN0167-8655 * An efficient modified boosting method for solving classification problems Chun-XiaZhang Jiang-SheZhang Gai-YingZhang 10.1016/j.cam.2007.03.003 Journal of Computational and Applied Mathematics 0377- 0427 214 2 1 May 2008 * Impact of feature selection on the accuracy and spatial uncertainty of per-field crop classification using Support Vector Machines, ISPRS Journal of Photogrammetry and Remote Sensing, 
Volume85 FLöw UMichel SDech CConrad 10.1016/j.isprsjprs.2013.08.007 November 2013 * Comparison of random forest, support vector machine and back propagation neural network for electronic tongue data classification: Application to the recognition of orange beverage and Chinese vinegar MingjunLiu JunWang DuoWang Li 10.1016/j.snb.2012.11.071 Sensors and Actuators B: Chemical 0925-4005 177 February2013 * Going-concern prediction using hybrid random forests and rough set approach Ching-ChiangYeh Der-JangChi Yi-RongLin 10.1016/j.ins.2013.07.011 Information Sciences 0020-0255 August 2013 * A support vector machine approach to CMOS-based radar signal processing for vehicle classification and speed estimation Hsun-JungCho Ming-Tetseng 10.1016/j.mcm.2012.11.003 Mathematical and Computer Modelling 58 2 July 2013 Pages 438-448,ISSN0895-7177 * Oil and gas pipeline failure prediction system using long range ultrasonic transducers and Euclidean-Support Vector Machines classification approach, Expert Systems with Applications, Volume40, Issue6 RajprasadLam Hong Lee LaiHungRajkumar ChinLo DinoHeng Wan Isa 10.1016/j.eswa.2012.10.006 May 2013, Pages1925-1934 * Support Vector Machines The Interface to lib svm in package e1071 DavidMeyer September, 2012 Austria Technische University of Wien * The one-against-all partition based binary tree support vector machine algorithms for multi-class classification XiaoweiYang QiaozhenYu LifangHe TengjiaoGuo 10.1016/ j.neucom.2012.12.048 Neurocomputing 0925-2312 113 1-7 3 August 2013 * Support Vector Machines for Classification and Regression SteveRGunn May1998 Faculty of Engineering, Science and Mathematics School of Electronics and Computer Science, University Of South Hampton Technical Report * Convex and concave hulls for classification with support vector machine, Neuro computing Asdrúbal LópezChau XiaoouLi WenYu 122 25 * 10.1016/j.neucom.2013.05.040 Pages 198-209,I SSN 0925-2312 December 2013 * Structural twin parametric-margin support 
vector machine for binary classification, Knowledge-Based Systems XinjunPeng YifeiWang DongXu 10.1016/j.knosys.2013.04.013 September2013 49 * Churn prediction in telecom using Random Forest and PSO based data balancing in combination with various feature selection strategies AdnanIdris MuhammadRizwan AsifullahKhan 10.1016/j.compeleceng.2012.09.001 Pages 1808-1819, ISSN0045-7906 Novem ber 2012 38 * Forest classification trees and forest support vector machines algorithms: Demonstration using microarray data EliasZintzaras AxelKowald 10.1016/j.compbiomed.2010.03.006 Computers in Biology and Medicine Pages 519-524, ISSN0010-4825 * A hybrid KMV model, random forests and rough set theory approach for credit rating, Knowledge-Based Systems Ching-HiangYeh Chih-YuFengyilin Hsu 10.1016/j.knosys.2012.04.004 September 2012 33 Pages 166-172, ISSN0950-7051 * imageRF -A user-oriented implementation for remote sensing image analysis with Random Forests SebastianBjörnwaske CarstenVander Linden BenjaminOldenburg Jakimow 10.1016/j.envsoft.2012.01.014 Environmental Modeling & Software Andreas Rabe, Patrick Hostert July 2012, Pages192-193 35 * Comparison of random forest, support vector machine and back propagation neural network for electronic tongue data classification: Application to the recognition of orange beverage and Chinese vinegar MiaoLiu MingjunWang DuoJunwang Li 10.1016/j.snb.2012.11.071 Pages 970-980, ISSN0925-4005 February 2013 177 * Variable selection using r random forests RobinGenuer Jean-MichelPoggi ChristineTuleau-Malot 10.1016/j.patrec.2010.03.014 Pattern Recognition Letters 31 14 15 October 2010 Pages 2225-2236, ISSN0167-8655 * Alexander Hammers, Daniel Rueckert, for the Alzheimer's Disease Neuroimaging Initiative, Random forest-based similarity measures for multimodal classification of Alzheimer's disease, NeuroImage KatherineRGray PaulAljabar RolfAHeckemann 10.1016/j.neuroimage.2012.09.065 Pages 167-175, ISSN10538119 15 January 2013 65 * Sleep stages 
classification based on heart rate variability and random forest MengXiao HongYan JinzhongSong YuzhouYang XianglinYang 10.1016/j.bspc.2013.06.001 Biomedical Signal Processing and Control 8 November 2013 Pages 624-633, ISSN1746-8094 * Random forests ensemble classifier trained with data resampling strategy to improve cardiac arrhythmia diagnosis AkinÖzçift 10.1016/j.compbiomed.2011.03.001 Computers in Biology and Medicine 41 5 May 2011 Pages 265-271 ,ISSN0010-4825 * Improving nonparametric regression methods by bagging and boosting SimoneBorra AgostinoDi Ciaccio 10.1016/S0167-9473(01 Computational Statistics &Data Analysis Volume38, Issue4, 28February2002. Pages 407-420,ISSN01679473 * Boosting and combination of classifiers for natural language call routing systems ImedZitouni Hong-Kwang JeffKuo Chin-HuiLee 10.1016/0167-6393(03 Speech Comm unication 41 4 November 2003 Pages 647-661, ISSN0167-6393 * Boosting for Multiclass Semi-Supervised Learning JafarTanha MaartenvanSomeren HamidehAfsarmanesh 10.1016/j.patrec.2013.10.008 Pattern Recognition Letters 21 October 2013 * Bagging and Boosting statistical machine translation systems TongXiao JingboZhu TongranLiu 10.1016/j.artint.2012.11.005 Artificial Intelligence 195 February 2013 Pages 496-527, ISSN0004-3702 * Intrusion detection by integrating boosting genetic fuzzy classifier and data mining criteria for rule prescreening TanselÖzyer RedaAlhajj KenBarker 10.1016/j.jnca.n2005.06.002 Journal of Network and Computer Applications 30 1 January 2007 Pages 99-113, ISSN1084-8045 * Boosting-based ensemble learning with penalty profiles for automatic Thai unknown word recognition JakkritTecho CholwichNattee ThanarukTheeramunkong 10.1016/j.camwa.2011.11.062 Computers &Mathematics with Applications 63 6 March 2012, Pages 1117-1134, ISSN08981221 * A local boosting algorithm for solving classification problems Chun-XiaZhang Jiang-SheZhang 10.1016/j.csda.2007.06.015 Computational Statistics & Data Analysis 0167-9473 52 4 10 January 
2008. 1928-1941 * Plot-level Forest Volume Estimation Using Airborne Laser Scanner and TM Data, Comparison of Boosting and Random Forest Tree Regression Algorithms ShabanShataeea HolgerWeinaker ManoucherBabanejad 10.1016/j.proenv.2011.07.013 Procedia Environmental Sciences 1878- 0296 7 2011 * QBoost: Predicting quantiles with boosting for regression and binary classification ZhengSong Feng 10.1016/j.eswa.2011.06.060 Expert Systems with Applications 0957- 4174 39 2 1 February 2012 * An experimental study on diversity for bagging and boosting with linear classifiers, Information Fusion LIKuncheva MSkurichina RP WDuin 10.1016/S1566-2535(02 December 2002 3 * A survey of image classification methods and techniques for improving classification performance DLu & QWeng 10.1080/01431160600746456 International Journal of Remote Sensing 28 5 2007 * Team}{r Core R:A Language and Environment for Statistical Computing, R Foundation for Statistical Computing Vienna,Austria 2013