# Introduction

Many organizations maintain huge data repositories that store data collected from various sources in different formats. Such repositories are also known as data warehouses. One prominent source of data is remotely sensed data collected via satellites or geographical information systems (GIS) software [1].

The data thus collected can be used in various applications, including but not restricted to land use [2] [3], species distribution modeling [4] [5] [6] [7], mineral resource identification [8], traffic analysis [10], network analysis [9] and environmental monitoring systems [11] [12]. Data mining is used to extract information from such data repositories, and the mined information can help stakeholders in an organization take strategic decisions. Data can be mined from the repositories using various methodologies such as anomaly detection, supervised classification, clustering, association rule learning, regression, characterization and summarization, and sequential pattern mining. In this paper we apply a hybrid classification technique to classify remotely sensed seed plant data.

Much research has been undertaken to classify plant functional groups, fish species, bird species and so on [7][13] [14]. Classifying species can help conserve ecosystems by facilitating the prediction of endangered species distributions [15]. It can also help identify resources such as minerals, water resources and economically useful trees. Various technologies have been developed for this purpose: machine learning methods, image processing algorithms and geographical information systems tools have contributed to numerous systems that support the study of spatial data and mine relevant information for various applications. Such systems help construct classification models that in turn support weather forecasting, crop yield classification, mineral resource identification, soil composition analysis and the location of water bodies near agricultural land.

Classification is the process of assigning a class label to unlabeled data vectors. It can be supervised or unsupervised; unsupervised classification is also known as clustering. In supervised classification, learning is done with the help of a supervisor, i.e. learning through examples, and the set of possible class labels is known a priori to the end user. Supervised classification can be subdivided into parametric and non-parametric classification. Parametric classifiers depend on an assumed probability distribution for each class, whose parameters are estimated from the training data; non-parametric classifiers make no such assumption. Unsupervised classification is learning without a supervisor, i.e. learning from observations. In this method the set of possible classes is not known to the end user in advance; after classification, one can try to assign a name to each class. Examples of unsupervised classification methods are Adaptive Resonance Theory (ART) 1, ART 2, ART 3, the Iterative Self-Organizing Data Analysis Method, K-Means, Bootstrapping Local, Fuzzy C-Means and Genetic Algorithm [17]. In this paper we discuss a hybrid classification method that makes use of support vector machine (SVM) classification, random forest and boosting methods. Its performance is then evaluated against traditional individual random forest and support vector machine classifiers.

Support vector machines (SVMs) are a powerful statistical tool for supervised classification. The data vectors are represented in a feature space, and a geometric hyperplane is constructed that divides the space into two regions, so that the data items are classified under two different class labels corresponding to the two regions. SVMs can solve both two-class and multi-class classification problems. The hyperplane is chosen to maximize its distance from the nearest data points in the two regions. Moreover, SVMs do not require a separate feature extraction step, since the mapping into the feature space is part of their own architecture (through the kernel function). Recent studies have reported that SVM classifiers give better classification results on spatial data sets than other classification algorithms such as Bayesian methods, neural networks and k-nearest neighbors [18] [19].

In the random forest (RF) classification method, many classifiers are generated from smaller subsets of the input data, and their individual results are aggregated by a voting mechanism to produce the output for the input data set. This ensemble learning strategy has recently become very popular. Before RF, boosting and bagging were the two main ensemble learning methods in use. RF can be applied to supervised classification, unsupervised learning and regression. It has been applied extensively in areas including drug discovery, network intrusion detection, land cover analysis, credit rating analysis, remote sensing and gene microarray data analysis [20][21].

Other popular ensemble classification methods are bagging and boosting. In these methods the complex data set is divided into smaller feature subsets, and an ensemble of classifiers is formed, each classifier classifying the data items in one feature subset. In boosting, the feature subsets are regrouped iteratively according to a penalty factor, also known as a weight factor, that reflects the degree of misclassification in each subset. The class labels for the complete data set are computed by aggregating the individual classification outcomes of the feature subsets [22] [23].

This paper proposes a hybrid method that combines ensemble learning from RF classification and the boosting algorithm with the SVM classification method. The processed seed plant data is divided randomly into feature subsets, and the SVM classification method is used to derive the output at each feature subset. Boosting is applied to improve the classification performance at every feature subset, and a majority voting mechanism is then applied to arrive at the final classification result for the original complete data set.

Section 2 describes background knowledge about the random forest classifier, SVM and boosting. Section 3 discusses the proposed methodology, and Section 4 presents the performance analysis. Section 5 concludes this work, followed by an acknowledgment of the data source and the references.


# II.

# Background Knowledge

# a) Overview of SVM Classifier

Support vector machine (SVM) is a statistical tool used in data mining tasks such as classification and regression analysis. The data may pose either a multi-class or a two-class problem. In this paper we deal with a two-class problem, in which the seed plant records are categorized under two class labels: one for records belonging to North America and the other for records belonging to South America. SVMs have been applied in areas such as species distribution modeling and locating mineral prospective areas. They have become popular for solving regression and classification problems and are built on statistical learning theory. An advantage of SVM is that the classification model can be built using a minimal number of attributes, which is not the case with most other classification methods [24]. In this paper we propose a hybrid classification methodology to classify seed plant data that aims to improve the efficiency and accuracy of the traditional classification approach.

The seed plant data sets used in this paper have known class labels. A classification model is constructed from these data, validated against a test data set, and can then be used to predict the class labels of unlabeled data. Since the class labels are known a priori, this approach is categorized as supervised classification. In unsupervised classification, also known as clustering, the class labels are not known in advance. Each data vector used for classification comprises a set of attributes from which the classification model is built [25] [19]. The SVM model is represented by a separating hyperplane $f(x)$ that geometrically bisects the data space into two regions, thus classifying the input data into two categories.

Figure 1: The Hyperplane

The function $f(x)$ denotes the hyperplane that separates the two regions and thereby classifies the data set. The two regions geometrically created by the hyperplane correspond to the two class labels. A data point $x_n$ belongs to one region or the other depending on the value of $f(x_n)$: if $f(x_n) > 0$ it belongs to one region, and if $f(x_n) < 0$ it belongs to the other. Many hyperplanes can split the data into two regions, but SVM selects the one that is at maximum distance from the nearest data points in the two regions. For linearly separable data this maximum-margin hyperplane is unique. Maximizing the margin in this way tends to give good classification accuracy on unseen data [27].

SVMs can also be represented mathematically. Assume that the input data consist of $n$ data vectors, where each data vector is $x_i \in \mathbb{R}^p$ for $i = 1, 2, \dots, n$ and $p$ is the number of attributes. Let the class label assigned to each data vector for supervised classification be denoted by $y_i$, which is $+1$ for one category of data vectors and $-1$ for the other. If the data set can be geometrically separated by a hyperplane (a line in two dimensions), the data vectors satisfy [8][3] [28]:
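$$m \cdot x_i + b \ge +1 \;\text{ if } y_i = +1, \qquad m \cdot x_i + b \le -1 \;\text{ if } y_i = -1 \qquad (1)$$

where $m$ is the weight vector normal to the hyperplane and $b$ is its offset.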
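The decision function defined by the hyperplane can also be represented mathematically by [31][32] [33]:

$$f(x) = \operatorname{sgn}(m \cdot x + b) = \operatorname{sgn}\left(\left(\sum_{i=1}^{n} \alpha_i y_i x_i\right) \cdot x + b\right) \qquad (2)$$

where the $\alpha_i \ge 0$ are Lagrange multipliers, which are nonzero only for the support vectors.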
where $\operatorname{sgn}()$ is the sign function, defined by:

$$\operatorname{sgn}(x) = \begin{cases} 1 & \text{if } x > 0 \\ 0 & \text{if } x = 0 \\ -1 & \text{if } x < 0 \end{cases} \qquad (3)$$
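To make equations (2) and (3) concrete, the following Python sketch evaluates the decision function for a hypothetical two-point example. The multipliers `alpha_i` and the offset `b` are hand-computed for that toy example rather than produced by a trainer, and the helper names (`sgn`, `weight_vector`, `decide`) are illustrative, not taken from the paper.

```python
# Sketch of equations (2) and (3). The multipliers alpha_i and offset b would
# normally come from training; here they are hand-computed values for the
# hypothetical two-point example below.

def sgn(v):
    """Sign function of equation (3): 1, 0 or -1."""
    if v > 0:
        return 1
    if v < 0:
        return -1
    return 0


def weight_vector(alphas, labels, vectors):
    """m = sum_i alpha_i * y_i * x_i; only support vectors have alpha_i > 0."""
    m = [0.0] * len(vectors[0])
    for alpha, y, x in zip(alphas, labels, vectors):
        for k, xk in enumerate(x):
            m[k] += alpha * y * xk
    return m


def decide(x, m, b):
    """Equation (2): the side of the hyperplane m . x + b = 0 that x lies on."""
    return sgn(sum(mk * xk for mk, xk in zip(m, x)) + b)


# Hypothetical support vectors (2, 2) with y = +1 and (0, 0) with y = -1.
# For these two points the maximum-margin solution is m = (0.5, 0.5), b = -1.
support = [[2.0, 2.0], [0.0, 0.0]]
labels = [1, -1]
alphas = [0.25, 0.25]
m = weight_vector(alphas, labels, support)
b = -1.0
print(m, decide([3.0, 3.0], m, b), decide([0.0, 0.5], m, b))  # [0.5, 0.5] 1 -1
```

For the two toy points, m = (0.5, 0.5) and b = -1 place the hyperplane x1 + x2 = 2 exactly halfway between them, and both points satisfy equation (1) with equality, which is what makes them support vectors.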
The data vectors are said to be optimally divided by the hyperplane if the distance between the hyperplane and the nearest data vectors of the two regions is maximal.

This concept is illustrated geometrically in Figure 2, which shows the distance between the hyperplane and the nearest data points on either side [29][30] [28]. The distance $d$ of a data point $x$ from the hyperplane is:

$$d = \frac{|m \cdot x + b|}{\|m\|} \qquad (4)$$
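The distance formula from the Figure 2 caption, d = |m · x + b| / ||m||, and the resulting margin can be checked numerically. This is a minimal sketch assuming the same kind of hypothetical hyperplane, m = (0.5, 0.5) and b = -1; the function names are illustrative.

```python
import math


def distance_to_hyperplane(x, m, b):
    """d = |m . x + b| / ||m||."""
    norm_m = math.sqrt(sum(mk * mk for mk in m))
    return abs(sum(mk * xk for mk, xk in zip(m, x)) + b) / norm_m


def margin_width(m):
    """Width of the gap between m . x + b = +1 and m . x + b = -1, i.e. 2 / ||m||."""
    return 2.0 / math.sqrt(sum(mk * mk for mk in m))


# Hypothetical hyperplane m = (0.5, 0.5), b = -1 and support vector (2, 2).
m, b = [0.5, 0.5], -1.0
print(distance_to_hyperplane([2.0, 2.0], m, b), margin_width(m))
```

Because support vectors satisfy |m · x + b| = 1, their distance is 1 / ||m||, and the gap between the two classes is twice that, 2 / ||m||. Minimizing ||m|| in the primal formulation is therefore the same as maximizing this gap.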

The hyperplane with maximum distance $d$ from the nearest data points is computed to implement the classification. This SVM can be written in the primal formulation [8][5] [31]:
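$$\min_{m,\,b,\,\xi}\; h(m) = \frac{1}{2}\|m\|^2 + C\sum_{i=1}^{n}\xi_i \quad \text{subject to} \quad y_i(m \cdot x_i + b) \ge 1 - \xi_i,\;\; \xi_i \ge 0,\;\; \forall i \qquad (5)$$

where the slack variables $\xi_i$ measure the training error and $C > 0$ weights it against the margin term.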
The idea is to maximize the margin while reducing the training error. The records in the training data set form the input set, and each data vector has specific attributes from which the classification model is built. When the classes cannot be separated linearly in the input space, the data are implicitly mapped into a higher-dimensional feature space. A kernel function computes inner products in that feature space directly from the input vectors, so the classifier never has to construct the complicated feature-space representation explicitly [29].
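One simple way to minimize the primal objective is stochastic subgradient descent, taking the training-error term to be the hinge loss C Σ max(0, 1 - y_i(m · x_i + b)), which equals the optimal slack ξ_i. The sketch below is an illustrative assumption, not the solver used in this paper; the kernel is linear, and the learning rate, epoch count and toy data are hypothetical.

```python
import random


def train_linear_svm(X, y, C=1.0, epochs=300, lr=0.01, seed=0):
    """Minimise 0.5 * ||m||^2 + C * sum_i max(0, 1 - y_i (m . x_i + b))
    by stochastic subgradient descent; labels must be +1 / -1.
    The hinge term max(0, ...) plays the role of the slack xi_i."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    m, b = [0.0] * d, 0.0
    order = list(range(n))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            score = sum(mk * xk for mk, xk in zip(m, X[i])) + b
            # Regulariser subgradient, split evenly across the n samples.
            grad_m = [mk / n for mk in m]
            grad_b = 0.0
            if y[i] * score < 1:  # inside the margin or misclassified: hinge active
                grad_m = [g - C * y[i] * xk for g, xk in zip(grad_m, X[i])]
                grad_b = -C * y[i]
            m = [mk - lr * g for mk, g in zip(m, grad_m)]
            b -= lr * grad_b
    return m, b


def predict(m, b, x):
    return 1 if sum(mk * xk for mk, xk in zip(m, x)) + b >= 0 else -1


# Hypothetical, linearly separable toy data.
X = [[2.0, 2.0], [3.0, 3.0], [2.0, 3.0], [0.0, 0.0], [-1.0, 0.0], [0.0, -1.0]]
y = [1, 1, 1, -1, -1, -1]
m, b = train_linear_svm(X, y)
print([predict(m, b, x) for x in X])
```

Production SVM libraries usually solve the dual problem instead, which is what makes kernels such as the RBF kernel usable; this primal sketch covers only the linear case.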

In this paper we have used the Gaussian radial basis function (RBF) kernel, which lets the SVM work at the simpler input-space level. The RBF kernel used is represented mathematically by [3][29]:

$$K(x_1, x_2) = \exp\left(-\frac{\|x_1 - x_2\|^2}{2\sigma^2}\right) \qquad (6)$$

With this kernel, the SVM places a local Gaussian function on each support vector, and the global decision function is computed by aggregating all the local Gaussian functions. SVMs can be used to solve either two-class or multi-class problems. Multi-class classification problems can be solved using various methods. One method is to map the data vectors to a different space, thereby making the problem linear. The other method is to split the multi-class problem into numerous two-class problems and then combine the solutions of the individual two-class problems with a voting mechanism to obtain the solution of the original multi-class problem [8].
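Equation (6) and the local-Gaussian view of the kernel decision function can be sketched as follows. This is a minimal illustration assuming hand-picked (untrained) support vectors, multipliers and σ = 1; in practice these values come from the SVM solver, and the function names are hypothetical.

```python
import math


def rbf_kernel(x1, x2, sigma=1.0):
    """Equation (6): K(x1, x2) = exp(-||x1 - x2||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - c) ** 2 for a, c in zip(x1, x2))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def kernel_decision(x, support_vectors, labels, alphas, b, sigma=1.0):
    """Kernelised equation (2): sgn(sum_i alpha_i y_i K(x_i, x) + b).
    Each term is a local Gaussian bump centred on one support vector; their
    weighted sum is the global decision function."""
    s = b + sum(a * yi * rbf_kernel(sv, x, sigma)
                for sv, yi, a in zip(support_vectors, labels, alphas))
    return 1 if s > 0 else (-1 if s < 0 else 0)


# Hypothetical support vectors and multipliers (not trained values).
svs, ys, alphas = [[0.0, 0.0], [3.0, 3.0]], [1, -1], [1.0, 1.0]
print(kernel_decision([0.1, 0.0], svs, ys, alphas, b=0.0),
      kernel_decision([2.9, 3.1], svs, ys, alphas, b=0.0))  # 1 -1
```

A smaller σ makes each Gaussian bump narrower, so the decision boundary follows the training points more closely; σ is normally tuned by cross-validation.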
The steps followed when using SVM to classify data are given in Algorithm 2 [16].

# b) Overview of Random Forest Classifier

Ensemble learning algorithms use an ensemble, or group, of classifiers to classify data, and hence tend to give more accurate results than individual classifiers. The random forest classifier is an example of an ensemble classifier: it uses an ensemble of classification trees [34][35][36][37][38] [41].

In the RF classification method the input data set is first subdivided into two subsets, one containing two thirds of the data points and the other containing the remaining one third. Classification trees are constructed from the subset of two thirds of the data points. The remaining one third, which is not used to construct a given tree and is used for its validation, forms the out-of-bag (OOB) samples of that tree. No pruning is applied, so every classification tree in the RF method is grown to maximal size. RF then follows a majority voting process in which the output of every classification tree casts a vote to decide the final outcome of the ensemble classifier, i.e. the class label assigned to a data item $x$ [21]. A set of features is used to create the classification tree at every randomly chosen subset [37], and the size of this feature set remains constant throughout the growing of the random forest.
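The two-thirds / one-third division can be sketched as below. This follows the description above (sampling without replacement); Breiman's original random forest instead draws a bootstrap sample with replacement, which leaves about 36.8% (roughly e^-1) of the records out of bag. The function name is hypothetical.

```python
import random


def split_in_bag_oob(records, seed=None):
    """Randomly split records into a 2/3 in-bag subset (used to grow one tree)
    and the remaining 1/3 out-of-bag subset (used to validate that tree).
    Sampling is without replacement, following the description above."""
    rng = random.Random(seed)
    shuffled = list(records)        # copy so the caller's list is not reordered
    rng.shuffle(shuffled)
    cut = (2 * len(shuffled)) // 3
    return shuffled[:cut], shuffled[cut:]


# 599 North American + 401 South American records, as in Section IV.
in_bag, oob = split_in_bag_oob(range(1000), seed=42)
print(len(in_bag), len(oob))  # 666 334
```

Each tree calls this with a different seed, so every record ends up out of bag for some trees and can be used to validate them.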

In RF, the test set is used to verify the classification results and to predict the class labels of unlabeled data once the classification model is built. The out-of-bag (OOB) samples are used to cross-validate the results provided by the various classification trees in the ensemble. The individual tree outcomes are aggregated by majority vote, and the combined result of the whole ensemble is more accurate and less prone to classification error than the results of individual trees [26].

Every classification tree in the random forest ensemble is built from a randomly selected two thirds of the input data, so there is little correlation between different trees in the forest. Restricting the number of variables considered when splitting a parent node further reduces the correlation between trees. The random forest method also works well on larger data sets, which is not the case with other ensemble methods [1] [2]. In this paper we use both the boosting and random forest ensemble methods together with support vector machines to give a more accurate classification output. This hybrid method is expected to be more robust to noise than the individual classification methods.

The RF classification method works with both discrete and continuous variables, which is not the case with some other statistical classification methods. Furthermore, there is no fixed limit on the total number of classification trees generated in the ensemble or on the number of variables or data samples (generally two thirds) in each random subset used to build the trees [36].

RF ranks variables by their contribution to classification accuracy relative to the other variables in the data set. This rank is known as the importance index and reflects the relative importance of every variable in the classification process. The importance index of a variable is calculated by averaging its importance across the classification trees generated in the ensemble; the higher the index, the more important the variable is for classification. Dividing a variable's importance index by its standard error gives another parameter, the z-score. Both the importance index and the z-score play a significant role in ensuring the efficiency of the classification process [25][36][39] [38].
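A minimal sketch of the importance index and z-score, assuming the per-tree importance values of one variable are already available (for example, per-tree decreases in accuracy when that variable is permuted); the function name and the numbers are hypothetical.

```python
import statistics


def importance_and_z(per_tree_importance):
    """Importance index (mean importance across trees) and z-score
    (importance index divided by its standard error) for one variable."""
    n = len(per_tree_importance)
    index = statistics.mean(per_tree_importance)
    std_err = statistics.stdev(per_tree_importance) / n ** 0.5
    return index, index / std_err


# Hypothetical per-tree importances of one variable in a 3-tree forest.
index, z = importance_and_z([0.2, 0.4, 0.6])
print(round(index, 3), round(z, 3))  # 0.4 3.464
```

A large z-score means the variable's importance is consistently high across trees rather than driven by a few trees.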

The importance of a variable can also be assessed using two parameters: the decrease in the Gini index and the OOB error estimate. These give the relative importance of the variables, which is especially useful in studies with a very large number of attributes [40]. The Gini index of a set of cases $X$ is given by equation (7).


# Global Journal of Computer Science and Technology

Volume XIV Issue I Version I
$$\mathrm{Gini}(X) = \sum_{i}\sum_{j \neq i} \left(\frac{k(C_i, X)}{|X|}\right)\left(\frac{k(C_j, X)}{|X|}\right) \qquad (7)$$

where $k(C_i, X)/|X|$ is the probability that a case selected from $X$ belongs to class $C_i$.
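Equation (7) and the Gini-based split criterion can be computed directly from class counts. In this minimal sketch with illustrative function names, `gini_decrease` is the per-split quantity that, weighted and summed over all splits on a variable and averaged over trees, gives the mean decrease in Gini used as a variable-importance measure.

```python
from collections import Counter


def gini_index(labels):
    """Equation (7): sum over ordered pairs of distinct classes i != j of
    p_i * p_j, where p_i = k(C_i, X) / |X|. Equals 1 - sum_i p_i ** 2."""
    total = len(labels)
    probs = [count / total for count in Counter(labels).values()]
    return sum(p_i * p_j
               for i, p_i in enumerate(probs)
               for j, p_j in enumerate(probs) if j != i)


def gini_decrease(parent, left, right):
    """Decrease in Gini index achieved by splitting `parent` into two children."""
    n = len(parent)
    return (gini_index(parent)
            - len(left) / n * gini_index(left)
            - len(right) / n * gini_index(right))


# Class mix of the whole data set: 599 North American and 401 South American.
labels = ["NA"] * 599 + ["SA"] * 401
print(round(gini_index(labels), 6))  # 0.480398
```

A pure node has Gini index 0, and for two equally frequent classes the index reaches its maximum of 0.5.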

RF produces estimates with low bias and low variance [39]. Its performance has been reported to be better than that of other classifiers such as support vector machines, neural networks and discriminant analysis. In this paper a hybrid classification method combining the advantages of random forest and support vector machines, together with boosting, is used. The RF algorithm is becoming increasingly popular in applications such as forest classification, credit rating analysis, remote sensing image analysis and intrusion detection.

Another parameter that can contribute to assessing the classification is the proximity of two samples, defined as the number of classification trees in which the two samples end up in the same terminal node. Dividing this count by the number of classification trees gives a normalized proximity that can help detect outliers in the data set. Computing proximities requires a large amount of memory, depending on the total number of sample records and classification trees in the ensemble [1]. The pseudocode for the Random Forest algorithm is given below [42]:

Random Forest Algorithm:

Input:
- D: training sample
- a: number of input instances used to generate each classification tree
- T: total number of classification trees in the random forest
- OT: classification output from each tree

Steps:
1. OT is empty.
2. For i = 1 to T:
3. Db = random sample subset formed by selecting 2/3 of the instances of D at random (selected anew for every tree).
4. Cb = classification tree built from the random subset Db.
5. Validate the classifier Cb using the remaining 1/3 of the instances (see step 3).
6. OT = store the classification outputs of the classification trees.
7. Next i.
8. Apply the voting mechanism to derive the output ORT of the random forest (the ensemble of classification trees).
9. Return ORT.

# c) Overview of Boosting

Ensemble learning is a process in which a data set is divided into subsets, individual learners are used to build a classification model for each subset, and the individual models are then combined to determine the final classification model of the complete data set. Because the large, complex data set is divided into smaller random subsets and a classifier is trained on each, ensemble learning can improve classification efficiency and give more accurate results. Several methodologies, such as bagging and boosting, construct such ensembles [43][44] [45].
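Steps 1-9 of the Random Forest pseudocode follow this divide, train and combine recipe. The sketch below implements them in plain Python under a simplifying assumption: each "tree" is a one-split decision stump chosen by the Gini index, whereas the paper grows maximal, unpruned trees. It also computes the normalized proximity measure described before the pseudocode. All names and the toy data are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def fit_stump(X, y, features):
    """One-split tree: best (feature, threshold) among `features` by weighted Gini."""
    best = None
    for f in features:
        for thr in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= thr]
            right = [lab for row, lab in zip(X, y) if row[f] > thr]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, thr, Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    if best is None:                      # no usable split: predict the majority
        major = Counter(y).most_common(1)[0][0]
        return (features[0], float("inf"), major, major)
    return best[1:]


def predict_stump(stump, row):
    f, thr, left_label, right_label = stump
    return left_label if row[f] <= thr else right_label


def random_forest(D, labels, T=25, seed=0):
    """Steps 1-9: T trees, each grown on a random 2/3 of D (step 3) and
    validated on the remaining 1/3 (step 5). Returns trees and OOB accuracies."""
    rng = random.Random(seed)
    n_features = len(D[0])
    k = max(1, int(n_features ** 0.5))   # features tried per tree (common default)
    forest, oob_accuracy = [], []
    for _ in range(T):
        idx = list(range(len(D)))
        rng.shuffle(idx)
        cut = 2 * len(idx) // 3
        bag, oob = idx[:cut], idx[cut:]
        features = rng.sample(range(n_features), k)
        tree = fit_stump([D[i] for i in bag], [labels[i] for i in bag], features)
        forest.append(tree)
        if oob:
            hits = sum(predict_stump(tree, D[i]) == labels[i] for i in oob)
            oob_accuracy.append(hits / len(oob))
    return forest, oob_accuracy


def forest_predict(forest, row):
    """Step 8: majority vote over the outputs of all trees."""
    return Counter(predict_stump(t, row) for t in forest).most_common(1)[0][0]


def proximity(forest, rows):
    """P[a][b] = fraction of trees in which rows a and b reach the same leaf."""
    n = len(rows)
    P = [[0.0] * n for _ in range(n)]
    for f, thr, _, _ in forest:
        leaf = [row[f] <= thr for row in rows]   # a stump has two leaves
        for a in range(n):
            for b in range(n):
                if leaf[a] == leaf[b]:
                    P[a][b] += 1 / len(forest)
    return P


# Hypothetical toy data: two numeric attributes, labels "SA" / "NA".
D = [[i, i] for i in range(10)]
labels = ["SA" if i < 5 else "NA" for i in range(10)]
forest, oob_acc = random_forest(D, labels, T=15, seed=1)
print(forest_predict(forest, [0, 0]), forest_predict(forest, [9, 9]))  # SA NA
```

The n-by-n proximity matrix is what makes the proximity measure memory-hungry: its size grows with the square of the number of records.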

In this paper the boosting method is used to create the ensemble. Boosting rewards successful classifiers and penalizes unsuccessful ones. It has been used in applications such as machine translation [46], intrusion detection [47], forest tree regression, natural language processing and unknown word recognition [48].

Boosting can be applied to many types of classification problems. It is an iterative process in which the training data set is regrouped into subsets and classifiers are used to classify the data samples in the subsets. Data samples that were difficult to classify for a classifier at one stage (such a classifier is called a weak learner) are classified by new classifiers added to the ensemble at a later stage [49][50] [51]. In this way a new classifier is added to the ensemble at each stage. The difficulty of classifying a data item $X_i$ at stage $k$ is represented by a weight factor $W_k(i)$, and the training sets are regrouped at each step according to these weights [22]. The weight of a data item increases with its misclassification. Forming regrouped data samples at every stage according to the weights is called the re-sampling version of boosting. Another way of implementing boosting is re-weighting, in which a weight is assigned to every data item and the complete data set is used at every iteration, with the weights modified at each stage [48] [52].
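The re-weighting version of boosting can be sketched with the AdaBoost weight update (Freund and Schapire), using decision stumps as weak learners. The list `w` plays the role of $W_k(i)$: after each stage it is multiplied by exp(alpha) for misclassified items and exp(-alpha) for correctly classified ones, then renormalized. This is an illustrative sketch, not the paper's algorithm listing; the names and toy data are hypothetical.

```python
import math


def weighted_stump(X, y, w):
    """Weak learner: single-feature threshold with the lowest weighted error."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            for polarity in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if (polarity if row[f] <= thr else -polarity) != yi)
                if best is None or err < best[0]:
                    best = (err, f, thr, polarity)
    return best


def stump_predict(f, thr, polarity, row):
    return polarity if row[f] <= thr else -polarity


def adaboost(X, y, rounds=10):
    """Re-weighting version of boosting (AdaBoost); labels must be +1 / -1."""
    n = len(X)
    w = [1.0 / n] * n                          # W_1(i): uniform weights
    ensemble = []
    for _ in range(rounds):
        err, f, thr, pol = weighted_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the log finite
        if err >= 0.5:                         # no better than chance: stop
            break
        alpha = 0.5 * math.log((1 - err) / err)  # vote weight of this learner
        ensemble.append((alpha, f, thr, pol))
        # W_{k+1}(i): misclassified items grow, correct ones shrink; renormalise.
        w = [wi * math.exp(-alpha * yi * stump_predict(f, thr, pol, row))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def boosted_predict(ensemble, row):
    score = sum(a * stump_predict(f, thr, pol, row) for a, f, thr, pol in ensemble)
    return 1 if score >= 0 else -1


# Hypothetical 1-D data that no single threshold can separate.
X = [[1], [2], [3], [4], [5], [6]]
y = [1, 1, -1, -1, 1, 1]
for rounds in (1, 3):
    ens = adaboost(X, y, rounds)
    print(rounds, [boosted_predict(ens, row) for row in X])
```

With one stump the pattern +, +, -, -, +, + cannot be fitted; after three boosting rounds the weighted vote of the stumps classifies all six points correctly.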

The most popular boosting algorithm is AdaBoost [23], short for Adaptive Boosting. It adapts the weights of the data items according to the misclassifications made by the weak learners and regroups the data subsets according to the new weights. The steps of the AdaBoost algorithm are given below:

In the next section the proposed hybrid methodology is discussed in detail.

# III.


# Proposed Methodology

In this paper we construct a hybrid classification model to predict the class labels of seed plant data in test data sets. The recommended methodology is shown schematically in Figure 3 (Proposed Model), and the steps followed are explained in the following subsections.

# a) Selection of Remotely Sensed Data

The data sets collected in this paper belong to various taxonomic classes, namely Pinopsida, Dicotyledoneae, Monocotyledoneae, Cycadopsida, Gnetopsida, Lycopodiopsida, Agaricomycetes and Marchantiopsida.

# b) Data Inspection

The seed plant data sets are pre-processed to eliminate missing and duplicate values: tuples containing duplicate values are ignored, and missing values are either entered manually or replaced with a global constant or a mean value [53].

# c) Feature Subset Selection

The data sets are randomly divided into n different random subsets, each comprising two thirds of the whole data set. Classification methods are applied to each of these random subsets, and the remaining one third of the data at each subset is used as a test set. At each random subset the following attributes were used to implement the classification method discussed in the next subsection: id, continent, specificEpithet and churn. The churn variable is set to yes if the seed plant record belongs to North America and to no if it belongs to South America.
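The pre-processing of the data inspection step (ignoring duplicate tuples, filling missing values with a mean or a global constant) and the derivation of the churn attribute can be sketched as below. Records are assumed to be Python dictionaries keyed by the attribute names of Table I; the helper names and the sample values are hypothetical.

```python
def preprocess(records, numeric_fields):
    """Data inspection step: drop exact duplicate records, then replace missing
    (None) numeric values with the mean of that field (a global constant of 0.0
    is used if a field has no values at all)."""
    seen, unique = set(), []
    for rec in records:
        key = tuple(sorted(rec.items()))   # hashable fingerprint of the record
        if key not in seen:
            seen.add(key)
            unique.append(dict(rec))       # copy so the input is not modified
    for field in numeric_fields:
        present = [r[field] for r in unique if r.get(field) is not None]
        fill = sum(present) / len(present) if present else 0.0
        for r in unique:
            if r.get(field) is None:
                r[field] = fill
    return unique


def add_churn(records):
    """churn = 'yes' for North American records and 'no' otherwise; assumes every
    record is from either North America or South America."""
    for r in records:
        r["churn"] = "yes" if r["continent"] == "North America" else "no"
    return records


# Hypothetical records using attribute names from Table I.
raw = [
    {"id": 1, "continent": "North America", "decimalLatitude": 72.0},
    {"id": 1, "continent": "North America", "decimalLatitude": 72.0},  # duplicate
    {"id": 2, "continent": "South America", "decimalLatitude": None},  # missing
    {"id": 3, "continent": "North America", "decimalLatitude": 52.0},
]
clean = add_churn(preprocess(raw, ["decimalLatitude"]))
print([(r["id"], r["decimalLatitude"], r["churn"]) for r in clean])
# [(1, 72.0, 'yes'), (2, 62.0, 'no'), (3, 52.0, 'yes')]
```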


# d) Selection of an appropriate classification method

In this paper seed plant data sets are classified using a hybrid classification method that combines random forest, the SVM classifier and the boosting ensemble learning method. The input data set is randomly subdivided into subsets, and each data item in each subset has a weight factor associated with it. The data items in the subsets are classified by the SVM classifier. If a data item is misclassified its weight factor is increased; otherwise it is reduced. The data subsets are then rearranged, SVM classification is performed again at each subset, and the weights are again updated according to whether each item was classified correctly. These steps are repeated until all the weights have been updated to a very low value. The output for the input data set is computed by applying a voting mechanism to the classification outputs of all the random subsets [34]. The algorithm for the proposed hybrid methodology is given in Algorithm 1.

Algorithm 1: Hybrid classification using RF and SVM supplemented by boosting

The classification output obtained at each random subset is validated by testing the hybrid classifier model against the complete data set.

In this paper 10 random feature subsets were used, and at every subset the SVM classifier was used to perform the classification. The voting mechanism was then applied to derive the final classification output. A total of 180 support vectors were used.
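One possible reading of the hybrid procedure is sketched below under explicit assumptions. Each of `n_subsets` random two-thirds subsets gets its own AdaBoost-style loop in which a weighted linear SVM is retrained with increased weights on misclassified items, and the boosted outputs of the subsets are then combined by majority vote. The assumptions are: the SVM is linear and trained by subgradient descent (Section II describes an RBF kernel), boosting runs for a fixed number of rounds rather than until the weights become small, and all names and toy data are hypothetical. This is not the authors' Algorithm 1.

```python
import math
import random
from collections import Counter


def train_weighted_svm(X, y, w, C=1.0, epochs=100, lr=0.01, seed=0):
    """Linear soft-margin SVM by subgradient descent; the hinge loss of sample i
    is scaled by its boosting weight w[i] (weights sum to 1, hence the * n)."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    m, b = [0.0] * d, 0.0
    order = list(range(n))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            score = sum(mk * xk for mk, xk in zip(m, X[i])) + b
            c = C * w[i] * n if y[i] * score < 1 else 0.0  # 0 when hinge inactive
            m = [mk - lr * (mk / n - c * y[i] * xk) for mk, xk in zip(m, X[i])]
            b += lr * c * y[i]
    return m, b


def svm_predict(model, x):
    m, b = model
    return 1 if sum(mk * xk for mk, xk in zip(m, x)) + b >= 0 else -1


def boosted_svm(X, y, rounds=3):
    """Boosting on one subset: re-train the SVM with updated weights each round."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for r in range(rounds):
        model = train_weighted_svm(X, y, w, seed=r)
        preds = [svm_predict(model, x) for x in X]
        err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
        err = min(max(err, 1e-10), 1 - 1e-10)
        if err >= 0.5:
            break
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, model))
        # Increase weights of misclassified items, reduce the others.
        w = [wi * math.exp(-alpha * yi * p) for wi, yi, p in zip(w, y, preds)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def hybrid_fit(X, y, n_subsets=10, seed=0):
    """RF-style step: one boosted SVM per random two-thirds subset of the data."""
    rng = random.Random(seed)
    members = []
    for _ in range(n_subsets):
        idx = rng.sample(range(len(X)), 2 * len(X) // 3)
        members.append(boosted_svm([X[i] for i in idx], [y[i] for i in idx]))
    return members


def hybrid_predict(members, x):
    """Majority vote over the boosted SVMs of all subsets."""
    votes = []
    for ensemble in members:
        score = sum(a * svm_predict(model, x) for a, model in ensemble)
        votes.append(1 if score >= 0 else -1)
    return Counter(votes).most_common(1)[0][0]


# Hypothetical toy data: +1 = North America (churn = yes), -1 = South America.
X = [[3, 3], [3, 4], [4, 3], [4, 4], [3.5, 3.5], [4.5, 4],
     [0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5], [-0.5, 0]]
y = [1] * 6 + [-1] * 6
members = hybrid_fit(X, y, n_subsets=10, seed=1)
print(hybrid_predict(members, [5, 5]), hybrid_predict(members, [-2, -2]))
```

Setting `n_subsets=10` mirrors the ten random feature subsets used in this paper; the number of boosting rounds per subset is an arbitrary choice for the sketch.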

# IV.


# Performance Analysis

# a) Environment Setting

The study area covers North and South America and includes localities where seed plant species are present.

A total of 599 records from the North American region and 401 records from the South American region are analyzed to evaluate the proposed method. Sample records used in this paper are shown in Table I. The evaluation metrics most commonly used in classification are accuracy, specificity, positive predictive value and negative predictive value. The formulae for accuracy, specificity, prevalence and negative predictive value are provided by equations (8), (9), (10) and (11). The confusion matrix (error matrix) for the SVM classifier is given in Table V and for the RF classifier in Table VI. The performance measures, calculated using equations (8), (9), (10) and (11), are shown in Fig 5.
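The metrics named above can be computed from the four cells of a confusion matrix laid out as in Table III. The sketch uses the standard textbook definitions, which we assume match equations (8) to (11); the function name is illustrative, and the counts in the example are hypothetical, not results from this paper.

```python
def evaluation_metrics(tp, tn, fp, fn):
    """Standard metrics from a 2x2 confusion matrix laid out as in Table III
    (South America = positive class, North America = negative class)."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        "specificity": tn / (tn + fp),
        "positive_predictive_value": tp / (tp + fp),
        "negative_predictive_value": tn / (tn + fn),
        "prevalence": (tp + fn) / total,
    }


# Hypothetical counts, not results from this paper.
print(evaluation_metrics(tp=40, tn=30, fp=10, fn=20))
```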


# Conclusion

In this paper a hybrid classifier based on random forest, SVM and boosting methods is used to classify seed plant data. The hybrid classification results are compared with the results of traditional SVM and RF classifiers. The results show that the hybrid approach performs better than the traditional SVM and RF classifiers, giving higher values of accuracy, specificity, positive predictive value and negative predictive value.

The hybrid methodology gives better results because it combines the advantages of the individual SVM and RF classification methods, and its classification results are further supplemented by the boosting ensemble method. In the future the proposed method can be used to classify vector and raster remotely sensed data collected via satellites and various geographical information systems.


# VI.
![Figure 2: Distance of the nearest data vectors from the hyperplane. The distance d of a data point x from the hyperplane is d = |(m, x) + b| / ||m||](image-2.png "Figure 2")
![SVM selects a local Gaussian function and later the global Gaussian function is computed by aggregating all the local Gaussian function.SVM can be used to solve either two class or multi-class problems. Multiclass classification problems (](image-3.png "")
![Algorithm 2: Classification using linear kernel based SVM classifier](image-4.png "Algorithm 2")
![b) Overview of Random Forest Classifier Ensemble learning algorithms use an ensemble or a group of classifiers to classify data. Hence they give more accurate results as compared to individual classifiers. Random forest classifier is an example for ensemble classifiers. Random forests make use of an ensemble of classification trees [34][35][36][37][38] [41].](image-5.png "")
![learners also called weak learners if misclassification occurs Set example weights based on predictions by ensemble of classifiers.](image-6.png "")
![Figure 3: Proposed Model a) Selection of Remote sensed data The data sets collected in this paper belong to various types of seed plant family viz Pinopsida, Dicotyledoneae, Monocotyledoneae, Cycadopsida, Pinopsida, Gnetopsida, Lycopodiopsida, Agaricomycetes and Marchantiopsida. b) Data Inspection The seed plant data sets are pre-processed and any missing values or duplicate values are eliminated by either ignoring tuples comprising of duplicate values or by manually entering values or replacing with a global constant or a mean value into tuples with missing values[53]. c) Feature subset Selection](image-7.png "Figure 3")


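Table I: Sample records

| id | higherGeography | continent | family | scientificName | decimalLatitude | decimalLongitude | genus | specificEpithet | churn |
|---|---|---|---|---|---|---|---|---|---|
| 275986 | North America, GREENLAND | North America | Lycoperdaceae | Calvatia arctica | 72 | -40 | Calvatia | arctica | yes |
| 333301 | North America, | North America | Ericaceae | Empetrum eamesii Fernald & Wiegand | 52 | -56 | Empetrum | eamesii | yes |
| 271758 | North America, | North America | Ranunculaceae | Thalictrum terrae-novae Greene | 52 | -56 | Thalictrum | terrae-novae | yes |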
Table II: Environment setting

| Item | Capacity |
|---|---|
| CPU | Intel CPU G645 @ 2.9 GHz processor |
| Memory | 8 GB RAM |
| OS | Windows 7 64-bit |
| Tools | R, R Studio |

# b) Result Analysis

Classification of the spatial data sets can be represented as a confusion (error) matrix, as shown in Table III.
Table III: Layout of the confusion matrix

| Real group \ Classification result | North America | South America |
|---|---|---|
| North America | True Negative (TN) | False Positive (FP) |
| South America | False Negative (FN) | True Positive (TP) |
Table V: Confusion matrix for the SVM classifier

| Prediction \ Reference | South America | North America |
|---|---|---|
| South America | 8 | 7 |
| North America | 49 | 16 |
Table VI: Confusion matrix for the RF classifier

| Prediction \ Reference | South America | North America |
|---|---|---|
| South America | 36 | 11 |
| North America | 21 | 12 |
			© 2014 Global Journals Inc. (US)
		
		
## Acknowledgment

We express our sincere gratitude to the Field Museum of Natural History (Botany) Seed Plant Collection (accessed through the GBIF data portal, http://data.gbif.org/datasets/resource/14346, 2013-06-03) for providing us with the seed plant data sets. We also thank ANU for supporting the work conducted.

			
## References

* [1] Pall Oskar Gislason, Jon Atli Benediktsson, Johannes R. Sveinsson, "Random Forests for land cover classification," Pattern Recognition Letters, vol. 27, no. 4, March 2006, ISSN 0167-8655. doi:10.1016/j.patrec.2005.08.011
* [2] V. F. Rodriguez-Galiano, B. Ghimire, J. Rogan, M. Chica-Olmo, J. P. Rigol-Sanchez, "An assessment of the effectiveness of a random forest classifier for land-cover classification," ISPRS Journal of Photogrammetry and Remote Sensing, vol. 67, pp. 93-104, January 2012, ISSN 0924-2716. doi:10.1016/j.isprsjprs.2011.11.002


* [3] Hassiba Nemmour, Youcef Chibani, "Multiple support vector machines for land cover change detection: An application for mapping urban extensions," ISPRS Journal of Photogrammetry and Remote Sensing, vol. 61, no. 2, pp. 125-133, November 2006, ISSN 0924-2716. doi:10.1016/j.isprsjprs.2006.09.004


* [4] P. A. Aguilera, A. Fernández, F. Reche, R. Rumí, "Hybrid Bayesian network classifiers: Application to species distribution models," Environmental Modelling & Software, vol. 25, no. 12, December 2010. doi:10.1016/j.envsoft.2010.04.016
* [5] Till Rumpf, Christoph Römer, Martin Weis, Markus Sökefeld, Roland Gerhards, Lutz Plümer, "Sequential …," pp. 89-96, ISSN 0168-1699.


* [6] Jing Hu, Daoliang Li, Qingling Duan, Yueqi Han, Guifen Chen, Xiuli Si, "Fish species classification by color, texture and multi-class support vector machine using computer vision," Computers and Electronics in Agriculture, vol. 88, pp. 133-140, October 2012, ISSN 0168-1699. doi:10.1016/j.compag.2012.07.008


* [7] L. Naidoo, M. A. Cho, R. Mathieu, G. Asner, "Classification of savanna tree species, in the Greater Kruger National Park region, by integrating hyperspectral and LiDAR data in a Random Forest data mining environment," ISPRS Journal of Photogrammetry and Remote Sensing, vol. 69, April 2012. doi:10.1016/j.isprsjprs.2012.03.005
* [8] Maysam Abedi, Gholam-Hossain Norouzi, Abbas Bahroudi, "Support vector machine for multiclassification of mineral prospectivity areas," Computers & Geosciences, vol. 46, September 2012, ISSN 0098-3004. doi:10.1016/j.cageo.2011.12.014
* [9] D. Ruano-Ordás, J. Fdez-Glez, F. Fdez-Riverola, J. R. Méndez, "Effective scheduling strategies for boosting performance on rule-based spam filtering frameworks," Journal of Systems and Software, vol. 86, no. 12, pp. 3151-3161, December 2013, ISSN 0164-1212. doi:10.1016/j.jss.2013.07.036
* [10] N. Abdul Rahim, M. P. Paulraj, A. H. Adom, "Adaptive Boosting with SVM Classifier for Moving Vehicle Classification," Procedia Engineering, vol. 53, 2013. doi:10.1016/j.proeng.2013.02.054
* Konstantinos Topouzelis, Apostolos Psyllos, Oil spill feature selection and classification using decision tree forest on SAR image data, ISPRS Journal of Photogrammetry and Remote Sensing, Vol. 68, March 2012, ISSN 0924-2716, doi:10.1016/j.isprsjprs.2012.01.005.
* Yang Shao, Ross S. Lunetta, Comparison of support vector machine, neural network, and CART algorithms for the land-cover classification using limited training data points, ISPRS Journal of Photogrammetry and Remote Sensing, Vol. 70, June 2012, pp. 78-87, ISSN 0924-2716, doi:10.1016/j.isprsjprs.2012.04.001.
* Paul Bosch, Julio López, Héctor Ramírez, Hugo Robotham, Support vector machine under uncertainty: An application for hydroacoustic classification of fish-schools in Chile, Expert Systems with Applications, Vol. 40, Issue 10, August 2013, pp. 4029-4034, ISSN 0957-4174, doi:10.1016/j.eswa.2013.01.006.
* Hongji Lin, Han Lin, Weibin Chen, Study on Recognition of Bird Species in Minjiang River Estuary Wetland, Procedia Environmental Sciences, 2011, ISSN 1878-0296, doi:10.1016/j.proenv.2011.09.386.
* Rafael Pino-Mejías, María Dolores Cubiles-De-La-Vega, María Anaya-Romero, Antonio Pascual-Acosta, Antonio Jordán-López, Nicolás Bellinfante-Crocci, Predicting the potential habitat of oaks with data mining models and the R system, Environmental Modelling & Software, Vol. 25, July 2010, doi:10.1016/j.envsoft.2010.01.004.
* S.N. Jeyanthi, Efficient Classification Algorithms using SVMs for Large Datasets, project report submitted in partial fulfillment of the requirements for the degree of Master of Technology in Computational Science, Supercomputer Education and Research Center, IISc, Bangalore, India, June 2007.
* Yugal Kumar, G. Sahoo, Analysis of Parametric & Non Parametric Classifiers for Classification Technique using WEKA, I.J. Information Technology and Computer Science, No. 7, July 2012 (published online), doi:10.5815/ijitcs.2012.07.06.
* M. Kumar, M., A hybrid SVM based decision tree, Pattern Recognition, Vol. 43, December 2010, doi:10.1016/j.patcog.2010.06.010.
* N. Rajasekhar, S.J. Babu, T.V. Rajinikanth, Magnetic resonance brain images classification using linear kernel based Support Vector Machine, 2012 Nirma University International Conference on Engineering (NUiCONE), 6-8 Dec. 2012, doi:10.1109/NUICONE.2012.6493213.
* V.F. Rodríguez-Galiano, F. Abarca-Hernández, B. Ghimire, M. Chica-Olmo, P.M. Atkinson, C. Jeganathan, Incorporating Spatial Variability Measures in Land-cover Classification using Random Forest, Procedia Environmental Sciences, Vol. 3, 2011, ISSN 1878-0296, doi:10.1016/j.proenv.2011.02.009.
* Reda M. Elbasiony, Elsayed A. Sallam, Tarek E. Eltobely, Mahmoud M. Fahmy, A hybrid network intrusion detection framework based on random forests and weighted k-means, Ain Shams Engineering Journal, available online 7 March 2013, ISSN 2090-4479, doi:10.1016/j.asej.2013.01.003.
* Gonzalo Martínez-Muñoz, Alberto Suárez, Using boosting to prune bagging ensembles, Pattern Recognition Letters, Vol. 28, Issue 1, 1 January 2007, pp. 156-165, ISSN 0167-8655, doi:10.1016/j.patrec.2006.06.018.
* Chun-Xia Zhang, Jiang-She Zhang, Gai-Ying Zhang, An efficient modified boosting method for solving classification problems, Journal of Computational and Applied Mathematics, Vol. 214, Issue 2, 1 May 2008, ISSN 0377-0427, doi:10.1016/j.cam.2007.03.003.
* F. Löw, U. Michel, S. Dech, C. Conrad, Impact of feature selection on the accuracy and spatial uncertainty of per-field crop classification using Support Vector Machines, ISPRS Journal of Photogrammetry and Remote Sensing, Vol. 85, November 2013, doi:10.1016/j.isprsjprs.2013.08.007.
* Miao Liu, Mingjun Wang, Jun Wang, Duo Li, Comparison of random forest, support vector machine and back propagation neural network for electronic tongue data classification: Application to the recognition of orange beverage and Chinese vinegar, Sensors and Actuators B: Chemical, Vol. 177, February 2013, ISSN 0925-4005, doi:10.1016/j.snb.2012.11.071.
* Ching-Chiang Yeh, Der-Jang Chi, Yi-Rong Lin, Going-concern prediction using hybrid random forests and rough set approach, Information Sciences, August 2013, ISSN 0020-0255, doi:10.1016/j.ins.2013.07.011.
* Hsun-Jung Cho, Ming-Te Tseng, A support vector machine approach to CMOS-based radar signal processing for vehicle classification and speed estimation, Mathematical and Computer Modelling, Vol. 58, Issue 2, July 2013, pp. 438-448, ISSN 0895-7177, doi:10.1016/j.mcm.2012.11.003.
* Lam Hong Lee, Rajprasad Rajkumar, Lai Hung Lo, Chin Heng Wan, Dino Isa, Oil and gas pipeline failure prediction system using long range ultrasonic transducers and Euclidean-Support Vector Machines classification approach, Expert Systems with Applications, Vol. 40, Issue 6, May 2013, pp. 1925-1934, doi:10.1016/j.eswa.2012.10.006.
* David Meyer, Support Vector Machines: The Interface to libsvm in package e1071, Technische Universität Wien, Austria, September 2012.
* Xiaowei Yang, Qiaozhen Yu, Lifang He, Tengjiao Guo, The one-against-all partition based binary tree support vector machine algorithms for multi-class classification, Neurocomputing, Vol. 113, 3 August 2013, pp. 1-7, ISSN 0925-2312, doi:10.1016/j.neucom.2012.12.048.
* Steve R. Gunn, Support Vector Machines for Classification and Regression, Technical Report, Faculty of Engineering, Science and Mathematics, School of Electronics and Computer Science, University of Southampton, May 1998.
* Asdrúbal López Chau, Xiaoou Li, Wen Yu, Convex and concave hulls for classification with support vector machine, Neurocomputing, Vol. 122, 25 December 2013, pp. 198-209, ISSN 0925-2312, doi:10.1016/j.neucom.2013.05.040.
* Xinjun Peng, Yifei Wang, Dong Xu, Structural twin parametric-margin support vector machine for binary classification, Knowledge-Based Systems, Vol. 49, September 2013, doi:10.1016/j.knosys.2013.04.013.
* Adnan Idris, Muhammad Rizwan, Asifullah Khan, Churn prediction in telecom using Random Forest and PSO based data balancing in combination with various feature selection strategies, Vol. 38, November 2012, pp. 1808-1819, ISSN 0045-7906, doi:10.1016/j.compeleceng.2012.09.001.
* Elias Zintzaras, Axel Kowald, Forest classification trees and forest support vector machines algorithms: Demonstration using microarray data, Computers in Biology and Medicine, pp. 519-524, ISSN 0010-4825, doi:10.1016/j.compbiomed.2010.03.006.
* Ching-Chiang Yeh, Fengyi Lin, Chih-Yu Hsu, A hybrid KMV model, random forests and rough set theory approach for credit rating, Knowledge-Based Systems, Vol. 33, September 2012, pp. 166-172, ISSN 0950-7051, doi:10.1016/j.knosys.2012.04.004.
* Björn Waske, Sebastian van der Linden, Carsten Oldenburg, Benjamin Jakimow, Andreas Rabe, Patrick Hostert, imageRF - A user-oriented implementation for remote sensing image analysis with Random Forests, Environmental Modelling & Software, Vol. 35, July 2012, pp. 192-193, doi:10.1016/j.envsoft.2012.01.014.
* Miao Liu, Mingjun Wang, Jun Wang, Duo Li, Comparison of random forest, support vector machine and back propagation neural network for electronic tongue data classification: Application to the recognition of orange beverage and Chinese vinegar, Sensors and Actuators B: Chemical, Vol. 177, February 2013, pp. 970-980, ISSN 0925-4005, doi:10.1016/j.snb.2012.11.071.
* Robin Genuer, Jean-Michel Poggi, Christine Tuleau-Malot, Variable selection using random forests, Pattern Recognition Letters, Vol. 31, Issue 14, 15 October 2010, pp. 2225-2236, ISSN 0167-8655, doi:10.1016/j.patrec.2010.03.014.
* Katherine R. Gray, Paul Aljabar, Rolf A. Heckemann, Alexander Hammers, Daniel Rueckert, for the Alzheimer's Disease Neuroimaging Initiative, Random forest-based similarity measures for multimodal classification of Alzheimer's disease, NeuroImage, Vol. 65, 15 January 2013, pp. 167-175, ISSN 1053-8119, doi:10.1016/j.neuroimage.2012.09.065.
* Meng Xiao, Hong Yan, Jinzhong Song, Yuzhou Yang, Xianglin Yang, Sleep stages classification based on heart rate variability and random forest, Biomedical Signal Processing and Control, Vol. 8, November 2013, pp. 624-633, ISSN 1746-8094, doi:10.1016/j.bspc.2013.06.001.
* Akin Özçift, Random forests ensemble classifier trained with data resampling strategy to improve cardiac arrhythmia diagnosis, Computers in Biology and Medicine, Vol. 41, Issue 5, May 2011, pp. 265-271, ISSN 0010-4825, doi:10.1016/j.compbiomed.2011.03.001.
* Simone Borra, Agostino Di Ciaccio, Improving nonparametric regression methods by bagging and boosting, Computational Statistics & Data Analysis, Vol. 38, Issue 4, 28 February 2002, pp. 407-420, ISSN 0167-9473, doi:10.1016/S0167-9473(01
* Imed Zitouni, Hong-Kwang Jeff Kuo, Chin-Hui Lee, Boosting and combination of classifiers for natural language call routing systems, Speech Communication, Vol. 41, Issue 4, November 2003, pp. 647-661, ISSN 0167-6393, doi:10.1016/0167-6393(03
* Jafar Tanha, Maarten van Someren, Hamideh Afsarmanesh, Boosting for Multiclass Semi-Supervised Learning, Pattern Recognition Letters, 21 October 2013, doi:10.1016/j.patrec.2013.10.008.
* Tong Xiao, Jingbo Zhu, Tongran Liu, Bagging and Boosting statistical machine translation systems, Artificial Intelligence, Vol. 195, February 2013, pp. 496-527, ISSN 0004-3702, doi:10.1016/j.artint.2012.11.005.
* Tansel Özyer, Reda Alhajj, Ken Barker, Intrusion detection by integrating boosting genetic fuzzy classifier and data mining criteria for rule prescreening, Journal of Network and Computer Applications, Vol. 30, Issue 1, January 2007, pp. 99-113, ISSN 1084-8045, doi:10.1016/j.jnca.2005.06.002.
* Jakkrit Techo, Cholwich Nattee, Thanaruk Theeramunkong, Boosting-based ensemble learning with penalty profiles for automatic Thai unknown word recognition, Computers & Mathematics with Applications, Vol. 63, Issue 6, March 2012, pp. 1117-1134, ISSN 0898-1221, doi:10.1016/j.camwa.2011.11.062.
* Chun-Xia Zhang, Jiang-She Zhang, A local boosting algorithm for solving classification problems, Computational Statistics & Data Analysis, Vol. 52, Issue 4, 10 January 2008, pp. 1928-1941, ISSN 0167-9473, doi:10.1016/j.csda.2007.06.015.
* Shaban Shataee, Holger Weinaker, Manoucher Babanejad, Plot-level Forest Volume Estimation Using Airborne Laser Scanner and TM Data, Comparison of Boosting and Random Forest Tree Regression Algorithms, Procedia Environmental Sciences, Vol. 7, 2011, ISSN 1878-0296, doi:10.1016/j.proenv.2011.07.013.
* Songfeng Zheng, QBoost: Predicting quantiles with boosting for regression and binary classification, Expert Systems with Applications, Vol. 39, Issue 2, 1 February 2012, ISSN 0957-4174, doi:10.1016/j.eswa.2011.06.060.
* L.I. Kuncheva, M. Skurichina, R.P.W. Duin, An experimental study on diversity for bagging and boosting with linear classifiers, Information Fusion, Vol. 3, December 2002, doi:10.1016/S1566-2535(02
* D. Lu, Q. Weng, A survey of image classification methods and techniques for improving classification performance, International Journal of Remote Sensing, Vol. 28, Issue 5, 2007, doi:10.1080/01431160600746456.
* R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, 2013.