# Introduction

Data mining is the process of extracting useful information from various data repositories, in which data may be present in different formats in heterogeneous environments [1] [2]. Methods such as classification, association, clustering, regression, characterization and outlier analysis can be used to mine the necessary information. In this paper we focus on classification.

Classification is the process by which a class label is assigned to unlabeled data vectors. It can be further categorized as supervised or unsupervised. In supervised classification the class labels, or categories, into which the data sets are to be classified are known in advance. In unsupervised classification the class labels are not known in advance [3]; unsupervised classification is also known as clustering. Supervised classification can be subdivided into parametric and non-parametric classification. A parametric classifier depends on the probability distribution of each class. Non-parametric classifiers are used when the density function is not known [4].

One of the most prominent parametric supervised classification methods is the support vector machine (SVM), and in this paper SVMs are used to perform the classification. The data vectors are represented in a feature space. A hyperplane, which geometrically resembles a sloped line, is then constructed in the feature space; it divides the space containing the data vectors into two regions, so that the data items are classified under two different class labels corresponding to the two regions [5]. The approach applies equally to two-class and multi-class classification problems [6] [7]. The aim of the hyperplane is to maximize its distance from the adjoining data points in the two regions. Moreover, SVMs do not carry an additional overhead of feature extraction, since it is part of their own architecture.
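The idea of a hyperplane that splits the feature space into two regions, and of its distance to the adjoining data points, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy two-dimensional data, the hyperplane parameters `m` and `b`, the helper names and the use of NumPy are not taken from the paper.

```python
import numpy as np

def classify(points, m, b):
    """Assign each point to region +1 or -1 by the side of the hyperplane m.x + b = 0."""
    return np.where(points @ m + b >= 0, 1, -1)

def distance_to_nearest(points, m, b):
    """Smallest perpendicular distance from the hyperplane to any of the points."""
    return np.min(np.abs(points @ m + b) / np.linalg.norm(m))

# Toy data (hypothetical): two groups lying on either side of the line x1 + x2 = 4.
points = np.array([[1.0, 1.0], [1.0, 2.0], [3.0, 3.0], [4.0, 3.0]])
m = np.array([1.0, 1.0])  # illustrative normal vector of the hyperplane
b = -4.0                  # illustrative offset

labels = classify(points, m, b)
nearest = distance_to_nearest(points, m, b)
print(labels)
print(nearest)
```

For this toy hyperplane the first two points fall in one region and the last two in the other. An SVM would go one step further and choose, among all separating hyperplanes, the one for which the distance to the nearest points is as large as possible.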
Recent research has shown that, on spatial data sets, SVM classifiers provide better classification results than other classification algorithms such as Bayesian methods, neural networks and k-nearest neighbors [8][9]. SVMs have been used to classify data in various domains, including land cover classification [10], species distribution [11], medical binary classification [9], fault diagnosis [12], character classification [5], speech recognition [13], radar signal processing [14] and habitat prediction. In this paper SVM is used to classify remotely sensed data sets. Two formats of remotely sensed data, viz. raster format and comma separated value (CSV) file format, are used for the classification.

The next section gives background knowledge about SVM classifiers. Section 3 discusses materials and methods, viz. the data acquired and the proposed methodology. Performance analysis is discussed in Section 4. Section 5 concludes this work, after which acknowledgement is given to the data sources, followed by the references.

# II. Background Knowledge

In a linear SVM, the line that separates the two classes is called a hyperplane and can be represented mathematically by equation (1) [21]:

$$m \cdot x_i + b \ge +1, \qquad m \cdot x_i + b \le -1 \tag{1}$$

where the first inequality holds for the data points of one class and the second for the data points of the other class. The classification of a data point can be represented by equation (2) [22]:

$$f(x) = \operatorname{sgn}(m \cdot x + b) \tag{2}$$

where sgn() is the sign function, defined as:

$$\operatorname{sgn}(x) = \begin{cases} 1 & \text{if } x > 0 \\ 0 & \text{if } x = 0 \\ -1 & \text{if } x < 0 \end{cases} \tag{3}$$

Many hyperplanes can divide the data space into two regions, but the one that maximizes the distance to the bordering data points in the input data space is the solution to the two-class problem. The data points closest to this hyperplane are called support vectors. This concept can be illustrated geometrically as in Figure 2. This maximization problem, viz.
maximizing the distance between the hyperplane and the adjoining support vectors, can be represented as a quadratic optimization problem, as in equation (5) [22][23]:

$$h(m) = \frac{1}{2} m^{t} m \quad \text{subject to} \quad y_i (m \cdot x_i + b) \ge 1, \ \forall i \tag{5}$$

The solution to this problem can be obtained using a Lagrange multiplier $\alpha_i$ associated with every constraint in the main problem. The solution can be represented as:

$$m = \sum_i \alpha_i y_i x_i, \qquad b = y_k - m^{t} x_k \ \text{ for any } x_k \text{ such that } \alpha_k \ne 0 \tag{6}$$

The classifier can be denoted as [16]:

$$f(x) = \sum_i \alpha_i y_i x_i^{t} x + b \tag{7}$$

In the case of non-linear SVMs, the input data space can be mapped onto a higher-dimensional feature space, as illustrated in Fig. 3. If every data point in the input data space is mapped by a function $\phi$ onto a higher-dimensional feature space, the inner product in that space can be represented as [18]:

$$K(x_i, x_j) = \phi(x_i)^{t} \cdot \phi(x_j) \tag{8}$$

This is called a kernel function. It is computed as an inner (dot) product in the feature space. Various kernel functions can be used for this mapping, as given in the equations below [23]:

$$\begin{aligned} \text{Linear kernel:} \quad & K(x_i, x_j) = x_i^{t} x_j \\ \text{Polynomial kernel:} \quad & K(x_i, x_j) = (1 + x_i^{t} x_j)^{p} \\ \text{Gaussian radial basis kernel:} \quad & K(x_i, x_j) = \exp\!\left(-\frac{\lVert x_i - x_j \rVert^{2}}{2\sigma^{2}}\right) \\ \text{Sigmoid kernel:} \quad & K(x_i, x_j) = \tanh(\beta_0\, x_i^{t} x_j + \beta_1) \end{aligned} \tag{9}$$

One of the major advantages of SVM is that feature selection is handled automatically, so features need not be derived separately.

# III. Materials and Methods

# a) Data Acquisition

In this paper the SVM classification methodology is applied to two different data set formats. The first format is a comma separated value (CSV) file, which holds all the attributes relevant to the classification, separated by commas. The data sets in this category are taken from the bird species occurrences of North-east India [24]. The second format is raster [25]. A raster image is a collection of pixels represented in matrix form.
Raster images can be stored in various formats; the raster format used here is TIFF. A map of the state of Andhra Pradesh in India is used.

# b) Proposed Method

The data under consideration is first preprocessed [26]. For the CSV data sets, which contain information on the birds of North-east India, the attributes considered are id, family, genus, specific_epithet, latitude, longitude, verbatim_scientific_name, verbatim_family, verbatim_genus, verbatim_specific_epithet and locality. A variable called churn acts as the class label and categorizes the data into two categories, viz. one containing the records of birds from the Darjeeling area and the other containing the records of birds from other north-eastern parts of India. Before classification is applied, the data sets are cleaned to remove any missing values. For the raster data set, a TIFF image is used; it contains a map of Andhra Pradesh, a state in India. Initially a region of interest (ROI) is captured, and then the supervised SVM classification methodology is applied. The algorithm that explains the implementation of SVM is given below [27]:

Begin

Step 1: Loop over the n data items

Step 2: Start dividing the input data set into two sets of data corresponding to two different categories

Step 3: If a data item is not assigned to either of the regions mentioned, add it to the set of support vectors V

Step 4: End loop

End

# IV. Performance Analysis

# a) Environment Setting

A total of 695 records act as the test data set and are used to validate the classification results obtained for the CSV data sets; for the TIFF raster data sets, one region of interest is extracted from the given input image. The proposed method has been implemented under the environment setting shown in Table 1.

# b) Result Analysis

Classification accuracy can be measured using the parameters of a confusion (error) matrix, which depend on whether an event or a no-event is correctly classified, as shown in Table 2 [9].
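The measures derived from such a confusion matrix can be sketched in a few lines of Python. The sketch makes several assumptions that the paper does not state: accuracy, sensitivity and specificity use their standard textbook definitions; "Darjeeling" is treated as the event (positive) class; rows of Table 3 are read as predictions and columns as reference labels. The function name and dictionary keys are hypothetical. Alongside the expression the paper gives for kappa statistics as equation (13), the sketch also computes Cohen's kappa for comparison.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Return accuracy, sensitivity, specificity and two kappa-style scores."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)  # share of reference events predicted as events
    specificity = tn / (tn + fp)  # share of reference no-events predicted as no-events
    # Expression the paper gives for kappa statistics, equation (13).
    kappa_eq13 = sensitivity + specificity - 1
    # Cohen's kappa: observed agreement corrected for agreement expected by chance.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / total ** 2
    cohen_kappa = (accuracy - expected) / (1 - expected)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "kappa_eq13": kappa_eq13,
        "cohen_kappa": cohen_kappa,
    }

# Counts read from Table 3 (CSV data sets), with Darjeeling assumed to be the event class.
scores = confusion_metrics(tp=116, fp=7, fn=1, tn=571)
for name, value in scores.items():
    print(f"{name}: {100 * value:.2f}%")
```

Under these assumptions the accuracy comes to 98.85% and Cohen's kappa to 95.97%, which agree with the CSV values listed in Table 5, whereas the equation (13) expression evaluates to 97.93%. This suggests, although the paper does not say so, that the tabulated kappa may have been computed as Cohen's kappa. The sketch does not guard against empty classes, which would cause a division by zero.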
The classified results for the CSV data sets are shown in Figure 4. The region of interest for the raster data set and the classified image are shown in Fig. 5. In this paper the parameters used to evaluate the classification are accuracy and kappa statistics. The formulae for accuracy, specificity, sensitivity and kappa statistics are provided by equations (10), (11), (12) and (13).

$$\text{Kappa statistics} = \text{Sensitivity} + \text{Specificity} - 1 \tag{13}$$

The efficiency of the proposed SVM classifier is evaluated using these parameters. The confusion (error) matrix for the SVM classifier on the CSV data sets is given in Table 3, and the one for the raster TIFF data sets is given in Table 4. The performance measures, calculated using equations (10), (11), (12) and (13), are given in Table 5.

# V. Conclusion

In this paper the SVM classification method is used to build a classification model for two data sets. The first data set is in CSV format and the second is a raster TIFF image. The classification model is then validated against a test data set that is a subset of the input data set. The performance of SVM is calculated using kappa statistics and accuracy, and it is established that, for the given data sets, SVM classifies the raster image data set with better accuracy than the CSV data set. In the future, the SVM classification methodology discussed here can help in environment monitoring, land use, mineral resource identification, classification of remotely sensed data into roads and land, etc.

# a) Overview of SVM Classifier

Support vector machine (SVM) is a powerful tool for solving both two-class and multi-class classification problems [15][16]. In a two-class problem the input data has to be separated into two different categories, each of which is assigned a unique class label [17]. A multi-class classification problem can be solved by dividing it into multiple two-class classification problems. SVM can be categorized into non-linear and linear SVM. Data can be represented in space as shown in Fig. 1. A linear SVM can be represented geometrically by a line that divides the data space into two regions, thereby classifying the data under the two class labels corresponding to the two regions [18][19][20].

![Figure 1: The Hyperplane](image-3.png "Figure 1")

![Figure 2: Distance of the nearest data vectors from the Hyperplane ($2/\lVert m \rVert$)](image-4.png "Figure 2")

![Figure 3: (a) Input space (b) Higher dimensional feature space](image-5.png "Figure 3")

![Figure 4: (a) Birds data belonging to the Darjeeling area from the input dataset, in black (b) Birds data belonging to parts other than the Darjeeling area from the input dataset, in blue](image-6.png "Figure 4")

![Figure 5: (a) Region of interest from the input raster data set (b) Classified image with Andhra Pradesh land represented in green and water in light blue](image-7.png "Figure 5")

Table 1: Environment Setting

| Item | Capacity |
|---|---|
| CPU | Intel CPU G645 @2.9 GHz processor |
| Memory | 8GB RAM |
| OS | Windows 7 64-bit |
| Tools | R, R Studio, Monteverdi tool |

Table 2: Confusion (error) matrix

| Real group | Classification result: No Event | Classification result: Event |
|---|---|---|
| No Event | True Negative (TN) | False Positive (FP) |
| Event | False Negative (FN) | True Positive (TP) |

Table 3: Confusion matrix for the SVM classifier on the CSV data sets

| Prediction | Reference: Other parts | Reference: Darjeeling |
|---|---|---|
| Other parts | 571 | 1 |
| Darjeeling | 7 | 116 |

Table 4: Confusion matrix for the SVM classifier on the raster TIFF data sets

| Prediction | Reference: Land | Reference: Water |
|---|---|---|
| Land | 78 | 0 |
| Water | 0 | 56 |

Table 5: Performance measures for the two data sets

| Data set type | Accuracy (%) | Kappa statistics (%) |
|---|---|---|
| CSV data sets | 98.85 | 95.97 |
| Raster TIFF data sets | 100 | 100 |

© 2014 Global Journals Inc. (US)

Supervised Classification of Remote Sensed data Using Support Vector Machine

## Acknowledgment

We express our sincere appreciation for the CSV data, which was accessed via the GBIF data portal, /data.gbif.org/, and for the image data accessed through http://maptell.com, which together provided us with the CSV and raster image data sets. We also thank ANU university for providing all the support in the work conducted.

* E. W. T. Ngai, Yong Hu, Y. H. Wong, Yijun Chen, Xin Sun. The application of data mining techniques in financial fraud detection: A classification framework and an academic review of literature. Decision Support Systems 50(3), February 2011, pp. 559-569. ISSN 0167-9236. doi:10.1016/j.dss.2010.08.006
* A. Wilson Castillo Rojas, Claudio J. Meneses Villegas. Graphical Representation and Exploratory Visualization for Decision Trees in the KDD Process. Procedia - Social and Behavioral Sciences 73, 27 February 2013. ISSN 1877-0428. doi:10.1016/j.sbspro.2013.02.033
* Borja Calvo, Pedro Larrañaga, José A. Lozano. Learning Bayesian classifiers from positive and unlabeled examples. Vol. 28, 1 December 2007, pp. 2375-2384. ISSN 0167-8655. doi:10.1016/j.patrec.2007.08.003
* Yugal Kumar, G. Sahoo. Analysis of Parametric & Non Parametric Classifiers for Classification Technique using WEKA. I.J. Information Technology and Computer Science 7, 2012. doi:10.5815/ijitcs.2012.07.06a
  Published online July 2012.
* Ismail Hmeidi, Bilal Hawashin, Eyas El-Qawasmeh. Performance of KNN and SVM classifiers on full word Arabic articles. Advanced Engineering Informatics 22(1), January 2008, pp. 106-111. ISSN 1474-0346. doi:10.1016/j.aei.2007.12.001
* Jing Hu, Daoliang Li, Qingling Duan, Yueqi Han, Guifen Chen, Xiuli Si. Fish species classification by color, texture and multi-class support vector machine using computer vision. Computers and Electronics in Agriculture 88, October 2012, pp. 133-140. ISSN 0168-1699. doi:10.1016/j.compag.2012.07.008
* Xiaowei Yang, Qiaozhen Yu, Lifang He, Tengjiao Guo. The one-against-all partition based binary tree support vector machine algorithms for multi-class classification. Neurocomputing 113, 3.
* Yang Shao, Ross S. Lunetta. Comparison of support vector machine, neural network, and CART algorithms for the land-cover classification using limited training data points. ISPRS Journal of Photogrammetry and Remote Sensing 70, June 2012, pp. 78-87. ISSN 0924716. doi:10.1016/j.isprsjprs.2012.04.001
* N. Rajasekhar, S. J. Babu, T. V. Rajinikanth. Magnetic resonance brain images classification using linear kernel based Support Vector Machine. Nirma University International Conference on Engineering (NUiCONE), 6-8 Dec 2012. doi:10.1109/NUICONE.2012.6493213
* Yang Shao, Ross S. Lunetta. Comparison of support vector machine, neural network, and CART algorithms for the land-cover classification using limited training data points. ISPRS Journal of Photogrammetry and Remote Sensing 70, June 2012. ISSN 0924-2716. doi:10.1016/j.isprsjprs.2012.04.001
* Hongji Lin, Han Lin, Weibin Chen. Study on Recognition of Bird Species in Minjiang River Estuary Wetland. Procedia Environmental Sciences 10, 2011. ISSN 1878-0296. doi:10.1016/j.proenv.2011.09.386
* Baoping Tang, Tao Song, Feng Li, Lei Deng. Fault diagnosis for a wind turbine transmission system based on manifold learning and Shannon wavelet support vector machine. Renewable Energy 62, February 2014. doi:10.1016/j.renene.2013.06.025
* A. D. Dileep, C. Chandra Sekhar. Class-specific GMM based intermediate matching kernel for classification of varying length patterns of long duration speech using support vector machines. Speech Communication 57, February 2014, pp. 126-143. ISSN 0167-6393. doi:10.1016/j.specom.2013.09.010
* Hsun-Jung Cho, Ming-Te Tseng. A support vector machine approach to CMOS-based radar signal processing for vehicle classification and speed estimation. Mathematical and Computer Modeling 58, Issues 1-2, July 2013. doi:10.1016/j.mcm.2012.11.003
* Lam Hong Lee, Rajprasad Rajkumar, Lai Hung Lo, Chin Heng Wan, Dino Isa. Oil and gas pipeline failure prediction system using long range ultrasonic transducers and Euclidean-Support Vector Machines classification approach. Expert Systems with Applications 40, May 2013, pp. 1925-1934. doi:10.1016/j.eswa.2012.10.006
* Elias Zintzaras, Axel Kowald. Forest classification trees and forest support vector machines algorithms: Demonstration using microarray data. Computers in Biology and Medicine 40(5), May 2010, pp. 519-524. ISSN 0010-4825. doi:10.1016/j.compbiomed.2010.03.006
* Xinjun Peng, Yifei Wang, Dong Xu. Structural twin parametric-margin support vector machine for binary classification. Knowledge-Based Systems 49, September 2013, pp. 63-72. ISSN 0950-7051. doi:10.1016/j.knosys.2013.04.013
* Maysam Abedi, Gholam-Hossain Norouzi, Abbas Bahroudi. Support vector machine for multiclassification of mineral prospectivity areas. Computers & Geosciences 46, September 2012, pp. 272-283. ISSN 0098004. doi:10.1016/j.cageo.2011.12.014
* Miao Liu, Mingjun Wang, Jun Wang, Duo Li. Comparison of random forest, support vector machine and back propagation neural network for electronic tongue data classification: Application to the recognition of orange beverage and Chinese vinegar. Sensors and Actuators B: Chemical 177, February 2013, pp. 970-980. ISSN 0925-4005. doi:10.1016/j.snb.2012.11.071
* F. Löw, U. Michel, S. Dech, C. Conrad. Impact of feature selection on the accuracy and spatial uncertainty of per-field crop classification using Support Vector Machines. ISPRS Journal of Photogrammetry and Remote Sensing 85, November 2013, pp. 102-119. ISSN 0924-2716. doi:10.1016/j.isprsjprs.2013.08.007
* Steve R. Gunn. Support Vector Machines for Classification and Regression. Technical Report, Faculty of Engineering, Science and Mathematics, School of Electronics and Computer Science, University of Southampton, May 1998.
* Till Rumpf, Christoph Römer, Martin Weis, Markus Sökefeld, Roland Gerhards, Lutz Plümer. Sequential support vector machine classification for small-grain weed species discrimination with special regard to Cirsium arvense and Galium aparine. Computers and Electronics in Agriculture 80, January 2012, pp. 89-96. ISSN 0168-1699. doi:10.1016/j.compag.2011.10.018
* Hassiba Nemmour, Youcef Chibani. Multiple support vector machines for land cover change detection: An application for mapping urban extensions. Vol. 61, November 2006, pp. 125-133. ISSN 0924-2716. doi:10.1016/j.isprsjprs.2006.09.004
* Sujit Narwade, Mohit Kalra, Rajkumar Jagdish, Divya Varier, Sagar Satpute, Gautam Talukdar. Literature based species occurrence data of birds of Northeast India. Collaborative work of Environmental Information System (ENVIS) Centre and Important Bird Areas Programmes-Indian Bird Conservation Network (IBA-IBCN) projects of the BNHS, India. Zookeys, 2011.
* D. Lu, Q. Weng. A survey of image classification methods and techniques for improving classification performance. International Journal of Remote Sensing 28(5), 2007. doi:10.1080/01431160600746456
* S. N. Jeyanthi. Efficient Classification Algorithms using SVMs for Large Datasets. Project report submitted in partial fulfillment of the requirements for the Degree of Master of Technology in Computational Science, Supercomputer Education and Research Center, IISC, Bangalore, India, June 2007.
* R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2013.
* Jennifer A. Taylor, Alicia V. Lacovara, Gordon S. Smith, Ravi Pandian, Mark Lehto. Near-miss narratives from the fire service: A Bayesian analysis. Accident Analysis & Prevention 62, January 2014, pp. 119-129. ISSN 0001-4575. doi:10.1016/j.aap.2013.09.012
* David J. Rogers, Jonathan E. Suk, Jan C. Semenza. Using global maps to predict the risk of dengue in Europe. Acta Tropica 129, January 2014. ISSN 0001-706X. doi:10.1016/j.actatropica.2013.08.008
* Rafael Pino-Mejías, María Dolores Cubiles-de-la-Vega, María Anaya-Romero, Antonio Pascual-Acosta, Antonio Jordán-López, Nicolás Bellinfante-Crocci. Predicting the potential habitat of oaks with data mining models and the R system. Environmental Modelling & Software 25, July 2010. doi:10.1016/j.envsoft.2010.01.004