Semantic Clustering of Genomic Documents using GO Terms as Feature Set

Authors

  • V.Bhuvaneswari

  • Dr. B.L.Shivakumar

Keywords:

Semantic Clustering, GO Terms, Attributes, Feature Set, XML

Abstract

Biological databases generate huge volumes of genomic and proteomic data. Researchers use the sequence information to find similarities between genes and proteins and to retrieve other related information. Genomic sequence databases describe each sequence with a large number of attributes, stored as annotations in XML format, so a proper mechanism is needed to group these documents for information retrieval. Data mining techniques such as clustering and classification can be used for this grouping. The objective of this paper is to analyze which keywords can serve as features for grouping the documents semantically. The paper focuses on clustering genomic documents based on both structural and content similarity. Structural similarity is computed from the structural paths of the documents, and semantic similarity is then computed for the structurally similar documents. We propose a methodology that clusters genomic documents using sequence attributes without using the sequence data itself. The sequence attributes are analyzed with filter-based feature selection methods to find a relevant feature set for grouping similar documents. Based on the resulting attribute ranking, we cluster similar documents using two approaches: the All Keywords Based Approach (KBA) and the GO Terms Based Approach (GOTA). The clusters produced by both approaches are validated by inferring their biological meaning using the Gene Ontology. The results show that the all keywords based approach grouped documents according to the semantic meaning of their Gene Ontology terms. The GO terms based approach grouped a larger number of documents into semantically relevant clusters without considering any other keywords, which reduces the complexity of the attribute set. We therefore claim that GO terms alone can be used as the feature set to group genomic documents with high similarity.
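The abstract does not state which similarity measures or clustering algorithm the authors use. As a rough illustration of the two-stage idea (structural similarity first, then GO term similarity among structurally similar documents), the sketch below assumes Jaccard similarity over root-to-element tag paths, Jaccard similarity over sets of GO term IDs, fixed thresholds, and single-link grouping with union-find. The `<go>` element name, the sample documents, and the threshold values are hypothetical, and the filter-based feature ranking step is not shown.

```python
import xml.etree.ElementTree as ET
from itertools import combinations


def tag_paths(xml_text):
    """Return the set of root-to-element tag paths, e.g. 'entry/annotations/go'."""
    root = ET.fromstring(xml_text)
    paths = set()

    def walk(elem, prefix):
        path = f"{prefix}/{elem.tag}" if prefix else elem.tag
        paths.add(path)
        for child in elem:
            walk(child, path)

    walk(root, "")
    return paths


def go_terms(xml_text, tag="go"):
    """Collect GO term IDs from elements named `tag` (a hypothetical schema)."""
    root = ET.fromstring(xml_text)
    return {e.text.strip() for e in root.iter(tag) if e.text and e.text.strip()}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    return len(a & b) / len(a | b) if (a or b) else 0.0


def cluster(docs, struct_min=0.8, sem_min=0.5):
    """Link document pairs that pass BOTH thresholds, then merge links
    into groups with union-find (single-link grouping)."""
    ids = list(docs)
    parent = {d: d for d in ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    paths = {d: tag_paths(docs[d]) for d in ids}
    terms = {d: go_terms(docs[d]) for d in ids}
    for a, b in combinations(ids, 2):
        # GO term (semantic) similarity is only checked for structurally similar pairs.
        if jaccard(paths[a], paths[b]) >= struct_min and jaccard(terms[a], terms[b]) >= sem_min:
            parent[find(a)] = find(b)

    groups = {}
    for d in ids:
        groups.setdefault(find(d), []).append(d)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical sample documents sharing one XML structure.
docs = {
    "d1": "<entry><annotations><go>GO:0005524</go><go>GO:0004672</go></annotations></entry>",
    "d2": "<entry><annotations><go>GO:0005524</go><go>GO:0004672</go>"
          "<go>GO:0006468</go></annotations></entry>",
    "d3": "<entry><annotations><go>GO:0003677</go></annotations></entry>",
}
print(cluster(docs))  # [['d1', 'd2'], ['d3']]
```

Here `d1` and `d2` share two of three GO terms (similarity 2/3), so they fall into one group, while `d3` shares none and stays alone. Raising `sem_min` above 2/3 would split all three apart, which shows how sensitive such a grouping is to the chosen threshold.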

How to Cite

V.Bhuvaneswari, & Dr. B.L.Shivakumar. (2012). Semantic Clustering of Genomic Documents using GO Terms as Feature Set. Global Journal of Computer Science and Technology, 12(C10), 13–19. Retrieved from https://computerresearch.org/index.php/computer/article/view/512


Published

2012-01-15