# Introduction

Data mining is the process of extracting useful information from data repositories in which data may be stored in different formats across heterogeneous environments [1] [2]. Methods such as classification, association, clustering, regression, characterization and outlier analysis can be used to mine the necessary information. This paper focuses on classification.

Classification is the process of assigning a class label to unlabeled data vectors. It can be further categorized as supervised or unsupervised. In supervised classification, the class labels or categories into which the data sets are to be classified are known in advance. In unsupervised classification, the class labels are not known in advance [3]; unsupervised classification is also known as clustering. Supervised classification can be subdivided into parametric and non-parametric classification. Parametric classifiers depend on the probability distribution of each class. Non-parametric classifiers are used when the density function is not known [4].

One prominent parametric supervised classification method is the support vector machine (SVM).

In this paper, SVMs are used to perform the classification. The data vectors are represented in a feature space, and a hyperplane, which geometrically resembles a sloped line, is constructed in that space. The hyperplane divides the space containing the data vectors into two regions, so that the data items are classified under two different class labels corresponding to the two regions [5]. SVMs can solve both two-class and multi-class classification problems [6] [7]. The hyperplane is chosen to maximize its distance from the adjoining data points in the two regions. Moreover, SVMs do not incur the additional overhead of feature extraction, since it is part of their own architecture. Recent studies report that, on spatial data sets, SVM classifiers give better classification results than other algorithms such as Bayesian methods, neural networks and k-nearest neighbors [8][9].

SVMs have been used to classify data in various domains, such as land cover classification [10], species distribution [11], medical binary classification [9], fault diagnosis [12], character classification [5], speech recognition [13], radar signal processing [14] and habitat prediction. In this paper, SVM is used to classify remote sensed data sets. Two formats of remote sensed data, raster and comma separated value (CSV) files, are classified using SVM.

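The next section gives background knowledge about SVM classifiers. Section 3 discusses materials and methods, namely the data acquired and the proposed methodology. Performance analysis is discussed in Section 4. Section 5 concludes this work, followed by an acknowledgement of the data sources and the references.


# II.


# Background Knowledge

The line that separates the two classes in a linear SVM is called a hyperplane and can be represented mathematically by equation (1) [21]:

$$m^{t} x_i + b \geq +1, \qquad m^{t} x_i + b \leq -1 \tag{1}$$

Here $m$ is the normal vector of the hyperplane, $b$ is its offset and $x_i$ is the $i$-th data vector. The first inequality holds for the data points of one class and the second for the data points of the other class.

The data points can be classified by the decision function in equation (2) [22]:

$$f(x) = \operatorname{sgn}(m^{t} x + b) \tag{2}$$

where sgn() is the sign function, which is defined by the following equation:

$$\operatorname{sgn}(x) = \begin{cases} 1 & \text{if } x > 0 \\ 0 & \text{if } x = 0 \\ -1 & \text{if } x < 0 \end{cases} \tag{3}$$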
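As a small illustration of equations (2) and (3), the sketch below evaluates the decision function for a few points. The hyperplane parameters `m` and `b` and the sample points are hypothetical values chosen only for this example; they are not taken from the paper's data.

```python
def sgn(value):
    """Sign function of equation (3): returns 1, 0 or -1."""
    if value > 0:
        return 1
    if value < 0:
        return -1
    return 0


def decision(m, b, x):
    """Decision function of equation (2): f(x) = sgn(m . x + b)."""
    return sgn(sum(mi * xi for mi, xi in zip(m, x)) + b)


# Hypothetical hyperplane and points, for illustration only.
m = [0.5, 0.5]
b = 0.0
points = [(2.0, 2.0), (-1.0, -3.0), (1.0, -1.0)]
labels = [decision(m, b, p) for p in points]
print(labels)
```

A point that lies exactly on the hyperplane, such as the last one here, receives the value 0 under equation (3).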
Many hyperplanes can divide the data space into two regions, but the solution to the two-class problem is the one that maximizes the distance to the bordering data points in the input data space. The data points closest to this hyperplane are called support vectors. This concept is illustrated geometrically in Figure 2. Maximizing the distance between the hyperplane and the adjoining support vectors can be expressed as a quadratic optimization problem, as in equation (5) [22][23]:
$$\min_{m,\,b}\; h(m) = \frac{1}{2}\, m^{t} m \tag{5}$$

subject to $y_i (m^{t} x_i + b) \geq 1, \; \forall i$.

The solution to this problem can be obtained with Lagrange multipliers $\alpha_i$, one associated with every constraint in the main problem. The solution can be represented as:

$$m = \sum_i \alpha_i y_i x_i, \qquad b = y_k - m^{t} x_k \ \text{ for any } x_k \text{ such that } \alpha_k \neq 0 \tag{6}$$

The classifier can be denoted as [16]:

$$f(x) = \sum_i \alpha_i y_i\, x_i^{t} x + b \tag{7}$$
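The relations in equations (6) and (7) can be checked on a toy problem. The sketch below is a minimal example under stated assumptions: the two training points and their multipliers are hypothetical (for this symmetric pair the multipliers can be worked out by hand), and the margin width 2/||m|| is the standard result for a linear SVM rather than a formula recovered from the text above.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


# Hypothetical two-point training set with labels +1 and -1.
xs = [(1.0, 1.0), (-1.0, -1.0)]
ys = [1, -1]
# Assumed Lagrange multipliers; both points are support vectors here.
alphas = [0.25, 0.25]

# Equation (6): m = sum_i alpha_i * y_i * x_i
m = [sum(a * y * x[d] for a, y, x in zip(alphas, ys, xs)) for d in range(2)]

# Equation (6): b = y_k - m^t x_k for any k with alpha_k != 0
k = next(i for i, a in enumerate(alphas) if a != 0)
b = ys[k] - dot(m, xs[k])


def f(x):
    """Equation (7): f(x) = sum_i alpha_i * y_i * x_i^t x + b."""
    return sum(a * y * dot(xi, x) for a, y, xi in zip(alphas, ys, xs)) + b


# Width of the margin between the two bounding hyperplanes.
margin = 2 / math.sqrt(dot(m, m))
print(m, b, f(xs[0]), f(xs[1]), margin)
```

Both training points evaluate to exactly +1 and -1, so they sit on the margin boundaries, and the margin equals the distance between the two points.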
In the case of non-linear SVMs, the input data space can be mapped onto a higher-dimensional feature space, as illustrated in Fig 3. If every data point in the input data space is mapped onto the higher-dimensional feature space by a function $f$ (here $f$ denotes the mapping, not the classifier of equation (2)), the inner product of two mapped points can be represented as [18]:

$$K(x_i, x_j) = f(x_i)^{t} f(x_j) \tag{8}$$

This is called a kernel function. It is computed as an inner (dot) product in the feature space. Various kernel functions can be used for this mapping, as listed in the equations below [23]:

$$\begin{aligned}
\text{Linear kernel function:} \quad & K(x_i, x_j) = x_i^{t} x_j \\
\text{Polynomial kernel function:} \quad & K(x_i, x_j) = (1 + x_i^{t} x_j)^{p} \\
\text{Gaussian radial basis kernel function:} \quad & K(x_i, x_j) = \exp\!\left(-\frac{\lVert x_i - x_j \rVert^{2}}{2\sigma^{2}}\right) \\
\text{Sigmoid kernel function:} \quad & K(x_i, x_j) = \tanh(\beta_0\, x_i^{t} x_j + \beta_1)
\end{aligned} \tag{9}$$
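The four kernel functions of equation (9) translate directly into code. The sketch below is illustrative only: the default parameter values (`p`, `sigma`, `beta0`, `beta1`) and the sample vectors are assumptions, not settings reported in the paper.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def linear_kernel(xi, xj):
    return dot(xi, xj)


def polynomial_kernel(xi, xj, p=2):
    # p is the polynomial degree (assumed default).
    return (1 + dot(xi, xj)) ** p


def gaussian_kernel(xi, xj, sigma=1.0):
    # sigma is the kernel width (assumed default).
    squared_distance = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-squared_distance / (2 * sigma ** 2))


def sigmoid_kernel(xi, xj, beta0=1.0, beta1=0.0):
    # beta0 and beta1 are the scale and offset (assumed defaults).
    return math.tanh(beta0 * dot(xi, xj) + beta1)


# Hypothetical sample vectors.
xi, xj = (1.0, 2.0), (3.0, 4.0)
print(linear_kernel(xi, xj), polynomial_kernel(xi, xj),
      gaussian_kernel(xi, xj), sigmoid_kernel(xi, xj))
```

Each function returns the feature-space inner product without ever constructing the mapped vectors, which is what makes the higher-dimensional mapping affordable.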

One of the major advantages of SVM is that it takes care of feature selection automatically, so features need not be derived separately.


# III.


# Materials and Methods


# a) Data Acquisition

In this paper, the SVM classification methodology is applied to data sets in two different formats. The first format is a comma separated value (CSV) file, which holds all attributes relevant to the classification, separated by commas. The data set in this category is taken from the bird species occurrences of North-east India [24]. The second format is raster [25]. A raster image is a collection of pixels represented in matrix form, and it can be stored in varying file formats. The raster format used here is TIFF; the image is a map of the state of Andhra Pradesh in India.


# b) Proposed Method

The data under consideration is first preprocessed [26]. For the CSV data set, which contains information on the birds of North-east India, the attributes considered are id, family, genus, specific_epithet, latitude, longitude, verbatim_scientific_name, verbatim_family, verbatim_genus, verbatim_specific_epithet and locality. A variable called churn acts as the class label and divides the data into two categories: one containing records of birds from the Darjeeling area and the other containing records of birds from other north-eastern parts of India. Before classification is applied, the data sets are cleaned to remove any missing values. For the raster data set, a TIFF image is used. The image is a map of Andhra Pradesh, a state in India. Initially a region of interest (ROI) is captured, and then the supervised SVM classification methodology is applied. The algorithm that explains the implementation of SVM is given below [27]:

Begin

Step 1: Loop over the n data items

Step 2: Start dividing the input data set into two sets of data corresponding to the two different categories

Step 3: If a data item is not assigned to either of the regions mentioned, add it to the set of support vectors V

Step 4: End loop


End
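The steps above are stated abstractly, so the following is only one possible reading of them. It is a minimal sketch in Python (the paper's own implementation used R, per Table 1) that trains a linear SVM by sub-gradient descent on the regularised hinge loss, a solver swapped in here for illustration and not named in the source. It then collects the items lying on or inside the margin as the set of support vectors V. The function names, hyperparameters (`lam`, `lr`, `epochs`, `tol`) and toy data are all hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(xs, ys, lam=0.01, lr=0.01, epochs=200):
    """Sub-gradient descent on the hinge loss (assumed stand-in solver)."""
    m = [0.0] * len(xs[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):              # Step 1: loop over the n data items
            if y * (dot(m, x) + b) < 1:       # item violates the margin
                # Step 2: move the hyperplane so the item falls in its region
                m = [mi + lr * (y * xi - lam * mi) for mi, xi in zip(m, x)]
                b += lr * y
            else:                             # item is well inside its region
                m = [mi - lr * lam * mi for mi in m]
    return m, b


def predict(m, b, x):
    """Assign one of the two class labels, +1 or -1."""
    return 1 if dot(m, x) + b >= 0 else -1


def support_vectors(m, b, xs, ys, tol=0.15):
    """Step 3: items on or inside the margin, y * (m . x + b) <= 1 + tol."""
    return [x for x, y in zip(xs, ys) if y * (dot(m, x) + b) <= 1 + tol]


# Hypothetical, linearly separable toy data.
xs = [(2.0, 2.0), (3.0, 3.0), (2.0, 3.0),
      (-2.0, -2.0), (-3.0, -3.0), (-2.0, -3.0)]
ys = [1, 1, 1, -1, -1, -1]

m, b = train_linear_svm(xs, ys)
predicted = [predict(m, b, x) for x in xs]
V = support_vectors(m, b, xs, ys)
print(predicted, V)
```

The tolerance `tol` is needed because an iterative solver only approaches the exact margin; a quadratic programming solver would identify the support vectors directly through non-zero Lagrange multipliers.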


# IV. Performance Analysis


# a) Environment Setting

A total of 695 records act as the test data set and are used to validate the classification results obtained for the CSV data sets. For the TIFF raster data set, one region of interest is extracted from the given input image. The proposed method has been implemented under the environment setting shown in Table 1.

Classification accuracy can be measured using the entries of a confusion (error) matrix, which depend on whether an event or a non-event is correctly classified, as shown in Table 2 [9]. The classified results for the CSV format data sets are shown in Figure 4.

$$\text{Kappa statistics} = \text{Sensitivity} + \text{Specificity} - 1 \tag{13}$$

The efficiency of the proposed SVM classifier is evaluated using these parameters. The confusion matrix for the SVM classifier on the CSV data sets is given in Table 3, and the confusion matrix for the SVM classifier on the raster TIFF data sets is given in Table 4. The performance measures are given in Table 5; they are calculated using equations (10), (11), (12) and (13).
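The evaluation measures can be recomputed from the confusion matrix counts. The sketch below assumes the standard definitions of accuracy, sensitivity and specificity, since equations (10) to (12) are not legible in this copy. It also computes Cohen's kappa, which is an assumption rather than a formula given in the text. It uses the counts of Table 3 with Darjeeling treated as the event (TN = 571, FP = 7, FN = 1, TP = 116) and the counts of Table 4 with water treated as the event.

```python
def confusion_metrics(tn, fp, fn, tp):
    """Standard measures from the four cells of a two-class confusion matrix."""
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # Right-hand side of equation (13).
    sens_plus_spec_minus_one = sensitivity + specificity - 1
    # Cohen's kappa: agreement corrected for chance (assumed definition).
    expected = ((tn + fp) * (tn + fn) + (fn + tp) * (fp + tp)) / total ** 2
    cohen_kappa = (accuracy - expected) / (1 - expected)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "equation_13": sens_plus_spec_minus_one,
        "cohen_kappa": cohen_kappa,
    }


csv_metrics = confusion_metrics(tn=571, fp=7, fn=1, tp=116)    # Table 3
raster_metrics = confusion_metrics(tn=78, fp=0, fn=0, tp=56)   # Table 4

for name, value in csv_metrics.items():
    print(f"{name}: {100 * value:.2f}")
```

For the Table 3 counts this gives an accuracy of 98.85 and a Cohen's kappa of 95.97, both matching Table 5, whereas the expression in equation (13) evaluates to about 97.93. This suggests, but does not confirm, that the kappa values reported in Table 5 are Cohen's kappa.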


# V. Conclusion

In this paper, the SVM classification method is used to build a classification model for two data sets. The first is in CSV format and the second is a raster TIFF image. The classification model is then validated against a test data set that is a subset of the input data set. The performance of SVM is measured using kappa statistics and accuracy, and for the given data sets SVM classifies the raster image data set with better accuracy than the CSV data set (100 versus 98.85, Table 5). In future, the SVM classification methodology discussed here can help in environment monitoring, land use, mineral resource identification, and the classification of remote sensed data into roads and land.
# a) Overview of SVM Classifier

Support vector machine (SVM) is a powerful tool for solving both two-class and multi-class classification problems [15][16]. In a two-class problem, the input data has to be separated into two different categories, each of which is assigned a unique class label [17]. A multi-class classification problem can be solved by dividing it into multiple two-class classification problems. SVMs can be categorized into linear and non-linear SVMs. Data can be represented in space as shown in Fig 1. A linear SVM can be represented geometrically by a line that divides the data space into two regions, thereby classifying the data under the two class labels corresponding to those regions [18][19][20].

![Figure 1 : The Hyperplane](image-3.png "Figure 1 :")
![Figure 2 : Distance of the nearest data vectors from the Hyperplane](image-4.png "Figure 2 :")
![Figure 3 : (a) Input space (b) Higher dimensional feature space](image-5.png "Figure 3 :")
![Figure 4 : (a) Birds data belonging to the Darjeeling area from the input dataset, marked in black (b) Birds data belonging to parts other than the Darjeeling area from the input dataset, marked in blue](image-6.png "Figure 4 :")

The region of interest for the raster data set and the classified image are shown in Fig 5.

![Figure 5 : (a) Region of interest from the input raster data set (b) Classified image with Andhra Pradesh land represented in green and water represented in light blue](image-7.png "Figure 5 :")

In this paper the parameters used to evaluate the classification are accuracy and kappa statistics. The formulae for accuracy, specificity, sensitivity and kappa statistics are provided by equations (10), (11), (12) and (13).

![/data.gbif.org/ and image data accessed through http://maptell.com for providing us with CSV and raster image data sets. We also thank ANU university for providing all the support in the work conducted.](image-8.png "")
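Table 1 : Environment Setting

| Item | Capacity |
|---|---|
| CPU | Intel CPU G645 @2.9 GHz processor |
| Memory | 8GB RAM |
| OS | Windows 7 64-bit |
| Tools | R, R Studio, Monteverdi tool |

# b) Result Analysis

Table 2 : Confusion (error) matrix

| Real group | Classification result: No Event | Classification result: Event |
|---|---|---|
| No Event | True Negative (TN) | False Positive (FP) |
| Event | False Negative (FN) | True Positive (TP) |

Table 3 : Confusion matrix for the SVM classifier on the CSV data sets

| Prediction | Reference: Other parts | Reference: Darjeeling |
|---|---|---|
| Other parts | 571 | 1 |
| Darjeeling | 7 | 116 |

Table 4 : Confusion matrix for the SVM classifier on the raster TIFF data sets

| Prediction | Reference: Land | Reference: Water |
|---|---|---|
| Land | 78 | 0 |
| Water | 0 | 56 |

Table 5 : Performance measures for the datasets

| Data set type | Accuracy (%) | Kappa Statistics (%) |
|---|---|---|
| CSV data sets | 98.85 | 95.97 |
| Raster TIFF data sets | 100 | 100 |
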
			© 2014 Global Journals Inc. (US)
			Supervised Classification of Remote Sensed data Using Support Vector Machine
		
		
## Acknowledgment

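We express our sincere appreciation to the GBIF data portal (data.gbif.org), through which the CSV data was accessed, and to http://maptell.com, through which the image data was accessed, for providing us with the CSV and raster image data sets. We also thank ANU university for providing all the support in the work conducted.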

			
* [1] E. W. T. Ngai, Yong Hu, Y. H. Wong, Yijun Chen, Xin Sun. The application of data mining techniques in financial fraud detection: A classification framework and an academic review of literature. Decision Support Systems, vol. 50, no. 3, February 2011, pp. 559-569. ISSN 0167-9236. doi:10.1016/j.dss.2010.08.006
* [2] Wilson Castillo Rojas, Claudio J. Meneses Villegas. Graphical Representation and Exploratory Visualization for Decision Trees in the KDD Process. Procedia - Social and Behavioral Sciences, vol. 73, 27 February 2013. ISSN 1877-0428. doi:10.1016/j.sbspro.2013.02.033
* [3] Borja Calvo, Pedro Larrañaga, José A. Lozano. Learning Bayesian classifiers from positive and unlabeled examples. Vol. 28, 1 December 2007, pp. 2375-2384. ISSN 0167-8655. doi:10.1016/j.patrec.2007.08.003
* [4] Yugal Kumar, G. Sahoo. Analysis of Parametric & Non Parametric Classifiers for Classification Technique using WEKA. I.J. Information Technology and Computer Science, 2012, 7. Published Online July 2012. doi:10.5815/ijitcs.2012.07.06a
* [5] Ismail Hmeidi, Bilal Hawashin, Eyas El-Qawasmeh. Performance of KNN and SVM classifiers on full word Arabic articles. Advanced Engineering Informatics, vol. 22, no. 1, January 2008, pp. 106-111. ISSN 1474-0346. doi:10.1016/j.aei.2007.12.001
* [6] Jing Hu, Daoliang Li, Qingling Duan, Yueqi Han, Guifen Chen, Xiuli Si. Fish species classification by color, texture and multi-class support vector machine using computer vision. Computers and Electronics in Agriculture, vol. 88, October 2012, pp. 133-140. ISSN 0168-1699. doi:10.1016/j.compag.2012.07.008
* [7] Xiaowei Yang, Qiaozhen Yu, Lifang He, Tengjiao Guo. The one-against-all partition based binary tree support vector machine algorithms for multi-class classification. Neurocomputing, vol. 113, 3.
* [8] Yang Shao, Ross S. Lunetta. Comparison of support vector machine, neural network, and CART algorithms for the land-cover classification using limited training data points. ISPRS Journal of Photogrammetry and Remote Sensing, vol. 70, June 2012, pp. 78-87. ISSN 0924-2716. doi:10.1016/j.isprsjprs.2012.04.001
* [9] N. Rajasekhar, S. J. Babu, T. V. Rajinikanth. Magnetic resonance brain images classification using linear kernel based Support Vector Machine. Nirma University International Conference on Engineering (NUiCONE), 6-8 Dec 2012, 5. doi:10.1109/NUICONE.2012.6493213
* [10] Yang Shao, Ross S. Lunetta. Comparison of support vector machine, neural network, and CART algorithms for the land-cover classification using limited training data points. ISPRS Journal of Photogrammetry and Remote Sensing, vol. 70, June 2012. ISSN 0924-2716. doi:10.1016/j.isprsjprs.2012.04.001
* [11] Hongji Lin, Han Lin, Weibin Chen. Study on Recognition of Bird Species in Minjiang River Estuary Wetland. Procedia Environmental Sciences, vol. 10, 2011. ISSN 1878-0296. doi:10.1016/j.proenv.2011.09.386
* [12] Baoping Tang, Tao Song, Feng Li, Lei Deng. Fault diagnosis for a wind turbine transmission system based on manifold learning and Shannon wavelet support vector machine. Renewable Energy, vol. 62, February 2014. doi:10.1016/j.renene.2013.06.025
* [13] A. D. Dileep, C. Chandra Sekhar. Class-specific GMM based intermediate matching kernel for classification of varying length patterns of long duration speech using support vector machines. Speech Communication, vol. 57, February 2014, pp. 126-143. ISSN 0167-6393. doi:10.1016/j.specom.2013.09.010
* [14] Hsun-Jung Cho, Ming-Te Tseng. A support vector machine approach to CMOS-based radar signal processing for vehicle classification and speed estimation. Mathematical and Computer Modelling, vol. 58, issues 1-2, July 2013. doi:10.1016/j.mcm.2012.11.003
* [15] Lam Hong Lee, Rajprasad Rajkumar, Lai Hung Lo, Chin Heng Wan, Dino Isa. Oil and gas pipeline failure prediction system using long range ultrasonic transducers and Euclidean-Support Vector Machines classification approach. Expert Systems with Applications, vol. 40, May 2013, pp. 1925-1934. doi:10.1016/j.eswa.2012.10.006
* [16] Elias Zintzaras, Axel Kowald. Forest classification trees and forest support vector machines algorithms: Demonstration using microarray data. Computers in Biology and Medicine, vol. 40, no. 5, May 2010, pp. 519-524. ISSN 0010-4825. doi:10.1016/j.compbiomed.2010.03.006
* [17] Xinjun Peng, Yifei Wang, Dong Xu. Structural twin parametric-margin support vector machine for binary classification. Knowledge-Based Systems, vol. 49, September 2013, pp. 63-72. ISSN 0950-7051. doi:10.1016/j.knosys.2013.04.013
* [18] Maysam Abedi, Gholam-Hossain Norouzi, Abbas Bahroudi. Support vector machine for multiclassification of mineral prospectivity areas. Computers & Geosciences, vol. 46, September 2012, pp. 272-283. ISSN 0098-3004. doi:10.1016/j.cageo.2011.12.014
* [19] Miao Liu, Mingjun Wang, Jun Wang, Duo Li. Comparison of random forest, support vector machine and back propagation neural network for electronic tongue data classification: Application to the recognition of orange beverage and Chinese vinegar. Sensors and Actuators B: Chemical, vol. 177, February 2013, pp. 970-980. ISSN 0925-4005. doi:10.1016/j.snb.2012.11.071
* [20] F. Löw, U. Michel, S. Dech, C. Conrad. Impact of feature selection on the accuracy and spatial uncertainty of per-field crop classification using Support Vector Machines. ISPRS Journal of Photogrammetry and Remote Sensing, vol. 85, November 2013, pp. 102-119. ISSN 0924-2716. doi:10.1016/j.isprsjprs.2013.08.007
* [21] Steve R. Gunn. Support Vector Machines for Classification and Regression. Technical Report, Faculty of Engineering, Science and Mathematics, School of Electronics and Computer Science, University of Southampton, May 1998.
* [22] Till Rumpf, Christoph Römer, Martin Weis, Markus Sökefeld, Roland Gerhards, Lutz Plümer. Sequential support vector machine classification for small-grain weed species discrimination with special regard to Cirsium arvense and Galium aparine. Computers and Electronics in Agriculture, vol. 80, January 2012, pp. 89-96. ISSN 0168-1699. doi:10.1016/j.compag.2011.10.018
* [23] Hassiba Nemmour, Youcef Chibani. Multiple support vector machines for land cover change detection: An application for mapping urban extensions. ISPRS Journal of Photogrammetry and Remote Sensing, vol. 61, November 2006, pp. 125-133. ISSN 0924-2716. doi:10.1016/j.isprsjprs.2006.09.004
* [24] Sujit Narwade, Mohit Kalra, Rajkumar Jagdish, Divya Varier, Sagar Satpute, Gautam Talukdar. Literature based species occurrence data of birds of Northeast India. Collaborative work of Environmental Information System (ENVIS) Centre and Important Bird Areas Programmes-Indian Bird Conservation Network (IBA-IBCN) projects of the BNHS, India. ZooKeys, 2011.
* [25] D. Lu, Q. Weng. A survey of image classification methods and techniques for improving classification performance. International Journal of Remote Sensing, vol. 28, no. 5, 2007. doi:10.1080/01431160600746456
* [26] S. N. Jeyanthi. Efficient Classification Algorithms using SVMs for Large Datasets. A Project Report submitted in partial fulfillment of the requirements for the Degree of Master of Technology in Computational Science, Supercomputer Education and Research Center, IISC, Bangalore, India, June 2007.
* [27] R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2013.
* [28] Jennifer A. Taylor, Alicia V. Lacovara, Gordon S. Smith, Ravi Pandian, Mark Lehto. Near-miss narratives from the fire service: A Bayesian analysis. Accident Analysis & Prevention, vol. 62, January 2014, pp. 119-129. ISSN 0001-4575. doi:10.1016/j.aap.2013.09.012
* [29] David J. Rogers, Jonathan E. Suk, Jan C. Semenza. Using global maps to predict the risk of dengue in Europe. Acta Tropica, vol. 129, January 2014. ISSN 0001-706X. doi:10.1016/j.actatropica.2013.08.008
* [30] Rafael Pino-Mejías, María Dolores Cubiles-De-La-Vega, María Anaya-Romero, Antonio Pascual-Acosta, Antonio Jordán-López, Nicolás Bellinfante-Crocci. Predicting the potential habitat of oaks with data mining models and the R system. Environmental Modelling & Software, vol. 25, July 2010. doi:10.1016/j.envsoft.2010.01.004