# Introduction

The document model in information retrieval has three main components: the Text Preprocessor, the Topic Extractor and Corpus Categorization. These components are integrated to support knowledge extraction in information systems. However, the growth of data and the difficulty of recognizing the knowledge in it have considerably encouraged extensions of machine learning algorithms.

# a) Document Model

Text document modeling is viewed as latent topic modeling, and various prominent machine learning approaches are used to study the model. A document is modeled as a mixture of topics [4], and topics are inferred from collections of correlated words. The unsupervised learning perspective is central to surfacing the topics. Through modeling, a range of mining tasks can be established across various subjects. The models try to observe the likely documents and tend to focus on topics. However, document models differ because of random words that arise from linguistic factors such as synonymy, hyponymy and polysemy.

# b) Text Pre-processor

The functions essential for machine learning on documents are document pre-processing and corpus representation. Stop-word removal, word stemming and filtering to exclude certain words are carried out within each document; this process is called document preprocessing. The resulting vocabulary is arranged in a word-document matrix, which is generally called the bag-of-words model. The document representation may be binary (0 for non-occurrence and 1 for occurrence of each term in a document), term frequency ($t_{ij}$, the number of occurrences of the $i$th word in the $j$th document) or term frequency-inverse document frequency ($t'_{ij}$, the probable occurrence, or distribution, of the $i$th word in the $j$th document). The data obtained at this stage is very high-dimensional, and many techniques [15] have been proposed for dimension reduction.
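The three document representations described above can be illustrated with a minimal Python sketch. The stop list, the toy corpus and the function names are hypothetical, stemming is omitted, and the weighting `tf * log(N / df)` is only one common tf-idf variant, assumed here because the text does not state the exact formula.

```python
import math
from collections import Counter

# Hypothetical stop list and toy corpus, chosen only for illustration.
STOP_WORDS = {"the", "a", "of", "is", "and", "in", "to"}


def preprocess(text):
    """Lowercase, keep alphabetic tokens, drop stop words (no stemming here)."""
    tokens = [t for t in text.lower().split() if t.isalpha()]
    return [t for t in tokens if t not in STOP_WORDS]


def build_matrices(docs):
    """Return the vocabulary and the binary, tf and tf-idf word-document matrices.

    Each matrix is a list of rows: one row per word i, one column per document j.
    """
    tokenized = [preprocess(d) for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    counts = [Counter(doc) for doc in tokenized]
    n_docs = len(docs)

    tf = [[c[w] for c in counts] for w in vocab]
    binary = [[1 if v > 0 else 0 for v in row] for row in tf]
    # Document frequency: number of documents that contain word i.
    df = [sum(row) for row in binary]
    # Assumed weighting: tf * log(N / df); other tf-idf variants exist.
    tfidf = [[v * math.log(n_docs / d) for v in row] for row, d in zip(tf, df)]
    return vocab, binary, tf, tfidf


docs = [
    "the market is rising and the market is strong",
    "the election is in the news",
    "market news",
]
vocab, binary, tf, tfidf = build_matrices(docs)
for word, row in zip(vocab, tfidf):
    print(word, [round(v, 3) for v in row])
```

Even on this toy corpus most entries of the matrix are zero, which is the sparsity and dimensionality problem that motivates the dimension reduction step.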
# c) Topic Extractor

A topic model is a probabilistic model in which a document is treated as a mixture of topics, each represented by a probability distribution over words. The latent variables, or topics, are the components to be inferred. The main objective is to learn from the documents the distribution of the underlying topics in a given corpus. A topic model represents a text corpus through a co-occurrence matrix of words and documents. The probabilistic latent semantic analysis (PLSA) model [10] uses the probability of words given topics and the probability of topics in a document to build a topic model. The Latent Dirichlet Allocation (LDA) model [1] is another probabilistic approach, which ties the parameters of all documents together through a hierarchical generative model.

# d) Corpus Categorization

Text categorization is a classical application of text mining [19] and is used in email filters, social tagging and automatic labeling of documents in business libraries. Text mining applications in research and business intelligence include latent semantic analysis techniques in bioinformatics, automatic investigation of jurisdictions, plagiarism detection in universities and publishing houses, cross-language information retrieval, spam filter learning, help desk inquiries, measuring customer preferences by analyzing qualitative interviews, automatic grading, fraud detection and parsing social networks for ideas for new products [9].

# II.

# Literature Support

In fuzzy set theory, a degree of membership is assigned to each element, and the degree of non-membership is automatically equal to its complement. However, human interpretation often does not express the corresponding degree of non-membership as the complement to 1.
So, Atanassov [1][2][3] introduced the concept of the intuitionistic fuzzy set, which reflects the fact that the degree of non-membership is not always equal to 1 minus the degree of membership; there may be some degree of hesitation. An intuitionistic fuzzy set is a generalization that applies constructive logic to fuzzy sets. It is defined on a universe $X$ of objects, where each object $x$ is described by its degrees of membership and non-membership to a certain property,

$$A = \{\langle x, \mu_A(x), \nu_A(x) \rangle \mid x \in X\} \tag{1}$$

with the restriction

$$0 \le \mu_A(x) + \nu_A(x) \le 1, \quad \forall x \in X \tag{2}$$

Therefore the degree of non-determinacy of the object $x$ with respect to the intuitionistic fuzzy set $A$ is

$$\pi_A(x) = 1 - \left(\mu_A(x) + \nu_A(x)\right), \quad \forall x \in X \tag{3}$$

The model is well suited to representing a high-dimensional classification problem. A high-dimensional confusion matrix can probably be reduced to a low-dimensional concept matrix. The similarity measures [14] and distance measures [21][20] between two intuitionistic fuzzy sets can be applied in pattern recognition.

In this paper, a partition based approach [16], inspired by hierarchical segmentation [8] and topic based segmentation [6], is extended using the intuitionistic fuzzy set approach [23] for local centralization of conceptual words. Intuitionistic fuzzy set theory is applied to conceptual term/topic detection. Cosine similarity and correlation are used to define the membership degree and the non-membership degree, respectively. The results obtained with this measure were found to be better for the chosen datasets. In the literature, an intuitionistic fuzzy representation of images for clustering, using a novel similarity metric, has been defined [18][12], but only minimal support has been extended to text classification. So, a local centralization of conceptual terms using intuitionistic logical clustering has been applied in this work.

# III.
# Proposed Model - Intuitionistic Partition based Concept Granulation (IPCG)

Intuitionistic logic is a natural deduction system [13] that has introduction rules $\mu$ and elimination rules $\nu$ for the logical connectives and quantifiers. The intuitionistic fuzzy set $A$ of weighted terms is

$$A = \{\langle w_{ij}, \mu_i(w_{ij}), \nu_i(w_{ij}) \rangle\}, \quad \text{where } 0 < w_{ij} < 1 \tag{4}$$

The similarity between words on a topic is calculated by the cosine measure. Each document vector is normalized with the weight and length of the terms in the $k$ partitions. Then the optimal term $w_{ij}$ [16] should be picked from the non-sparse terms of the $k$ partitions.

The intuitionistic angular, or cosine, similarity measure [22] between the $m$ terms in a partitioned set is given as follows:

$$C(A, B) = \frac{\sum_{i=1}^{m} \mu_A(x_i)\,\mu_B(x_i)}{\sqrt{\sum_{i=1}^{m} \mu_A^2(x_i)}\,\sqrt{\sum_{i=1}^{m} \mu_B^2(x_i)}} \tag{6}$$

For the intuitionistic correlation [7] of rows, all fuzzy numbers are taken from the tf-idf samples (Partition Model). The crisp set is modified intuitionistically with the sample mean and variance of the membership function as:

$$CR_I(A, B) = \frac{\sum_{i=1}^{n} \left(\mu_A(x_i) - \bar{\mu}_A\right)\left(\mu_B(x_i) - \bar{\mu}_B\right)}{\sqrt{\sum_{i=1}^{n} \left(\mu_A(x_i) - \bar{\mu}_A\right)^2}\,\sqrt{\sum_{i=1}^{n} \left(\mu_B(x_i) - \bar{\mu}_B\right)^2}} \tag{7}$$

The effectiveness of the intuitionistic classification of the corpus is studied and analyzed approximately using the following entropy [22], defined specifically for an intuitionistic fuzzy set $A$:

$$E(A) = \frac{1}{n} \sum_{i=1}^{n} \frac{\min\left(\mu_A(x_i), \nu_A(x_i)\right) + \pi_A(x_i)}{\max\left(\mu_A(x_i), \nu_A(x_i)\right) + \pi_A(x_i)} \tag{8}$$

# IV.

# Datasets

# a) Newspaper Article Collection

Newspaper articles under different topics are collected, and the categories are marked. The training and testing documents are chosen randomly. The growth of social media made it essential to include a newspaper article collection in this work. News is generally categorized by topic area ("politics," "business," etc.) and written in clear, correct, "objective," and somewhat schematized language [5]. This would pave the way to extend the research towards social networking and marketing.
The collection includes about 780 documents in 25 categories. New socially relevant topics ("mobile", "opinion", etc.) are included for categorization.

# b) Reuters-21578 Data Set

The Reuters-21578 collection provides a classification task with challenging properties. There are multiple categories, the categories are overlapping and non-exhaustive, and there are relationships among the categories. There are interesting possibilities for the use of domain knowledge. Many possible feature sets can be extracted from the text, and most plausible feature/example matrices are large and sparse [11].

# c) Movie Review Dataset

The Movie Review Dataset, polarity dataset v0.9, with 900 positive and 900 negative reviews, is used. With movie reviews as data, standard machine learning techniques for classifying documents definitively outperform human-produced baselines [17]. About 100 training documents are chosen randomly from each class, which means about 500 cases are considered for training.

# V.

# Results and Analysis

Machine learning classification methods such as Bayesian, Naïve Bayes, J48, Support Vector Machines and LMT are strong enough to support classification. For concept granulation in document classification, feature selection is fine-tuned to obtain categories closely connected to human perception. Before the features are passed to the classifier, some form of selection must be chosen. The proposed method selects the features according to intuitionistic logic. The features tf-idf matrix has been …

The proposed Concept Granulation Using Intuitionistic Partition Based Classification Model is implemented in a Java-based system and analyzed for its significance. The intuitionistic correlation is applied to the specified datasets; the chosen dataset and the partitions play a very important role in the result of the model.
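To make the role of the partition count $k$ concrete, the following Python sketch splits a tokenized document into $k$ segments and evaluates measures of the kind given in Equations (6) to (8). It is an illustration under stated assumptions, not the authors' Java implementation: the function names, the toy token list and the use of raw term frequencies as membership values are all hypothetical, and the correlation is written in the usual mean-centred form.

```python
import math
from collections import Counter


def partition(tokens, k):
    """Split a token list into k contiguous segments of near-equal length."""
    n = len(tokens)
    bounds = [i * n // k for i in range(k + 1)]
    return [tokens[bounds[i]:bounds[i + 1]] for i in range(k)]


def segment_matrix(segments):
    """Term-frequency matrix: one row per segment, one column per word."""
    vocab = sorted({t for seg in segments for t in seg})
    rows = []
    for seg in segments:
        counts = Counter(seg)
        rows.append([counts[w] for w in vocab])
    return vocab, rows


def cosine(a, b):
    """Cosine similarity in the form of Equation (6); 0.0 for a zero vector."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den else 0.0


def correlation(a, b):
    """Mean-centred correlation, assumed here as the reading of Equation (7)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a)) * math.sqrt(
        sum((y - mean_b) ** 2 for y in b)
    )
    return num / den if den else 0.0


def ifs_entropy(mu, nu):
    """Entropy in the form of Equation (8), with hesitation pi = 1 - mu - nu."""
    total = 0.0
    for m, v in zip(mu, nu):
        pi = 1.0 - m - v
        total += (min(m, v) + pi) / (max(m, v) + pi)
    return total / len(mu)


# Hypothetical 12-token document, already lowercased and stop-word filtered.
tokens = (
    "market price market trade price market "
    "vote party vote election party vote"
).split()
segments = partition(tokens, k=4)
vocab, rows = segment_matrix(segments)

sim_01 = cosine(rows[0], rows[1])        # two segments about the same theme
sim_02 = cosine(rows[0], rows[2])        # segments with no shared words
corr_02 = correlation(rows[0], rows[2])
entropy = ifs_entropy([0.5, 1.0], [0.5, 0.0])
print(sim_01, sim_02, corr_02, entropy)
```

Changing `k` changes how many rows the segment matrix has and how many words each row covers, which is one way to see why the choice of partition affects the similarity and correlation values.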
The tfidf-IP is more favorable for the Reuters dataset than for the Newspaper and Movie datasets. This is represented in Figure 2. The perplexity is depicted in Figure 3 and Table 1. The analysis can be interpreted in the following ways:

* The intuitionistic approach favors the classified documents or corpus chosen.
* Partition plays an important role in the proposed model.
* Out of the four types of partition, k=8 gives smooth, strong support for the proposed model.
* k=16, the highest partition, yields only a very moderate result and more confusion.
* k=4, the smallest partition, yields smooth but less significant support for all the datasets.
* k=8 yields partially smooth but significant support for the movie dataset, compared with the other partitions.

The results focus on average training datasets and the micro F-measure (Table 2) to show that IPCG performs better with dimension reduction for corpus categorization. Each dataset chosen for analysis responds to the push and pull of the various stages of the proposed model.

# Conclusions

In this paper, we have proposed an intuitionistic partition based concept granulation topic-term model for a nominal tf-idf vector space model, which is often used in information retrieval, topic analysis and automatic classification. The cosine distance and correlation treatment of the tf-idf reduces the dimension and improves the efficiency of the bag of words/terms in topics. However, the tf-idf is first treated with the intuitionistic partition so that the model fits decision-making problems. To account for this, an intuitionistic partition based cosine similarity measure between topics/terms and a correlation between documents/topics are included. The proposed fuzzy model is tailored with a normal combinational approach to obtain the intuitionistic fuzzy crisp set. The model is observed to behave well and to be promising for categorized documents, and it gives reasonable support for low-inference corpus collections such as movie reviews.
So, this makes it clear that social media documents should be specially treated before this model is applied. Aggregation of social media topic-terms appears to be needed; this is left as future work or as an extension of the proposed work.

![x does not belong to the set A. The model is defined by the restriction](image-2.png "")

The document classification system needs conceptual terms ($\mu$) and non-deterministic terms or noise ($\nu$), with logic and reasoning, to quantify concept granules. Let $A$ be a tf-idf matrix of size $m \times n$ that represents the corpus. Each value is associated with:

* $\mu_A(x)$: the set of terms representing membership of the domain
* $\nu_A(x)$: the terms representing non-membership of the domain

Algorithm: IPCG

    For each document {
        Lowercase the document; remove numbers and special characters
        Remove stop-list words from the document
        Split the document into k partitions
        For each segment {
            Find the frequency of words
            Prepare a matrix with each segment as a row and words as columns
            Include non-zero frequencies as members
            Calculate the cosine similarity distance between each pair of segments
            Discard the segment with the least distance
        }
        A single row, or vector, of the document has been found
        Apply intuitionistic correlation to include conceptual terms in the topic
        Classify the document and find the entropy
    }

The intuitionistic fuzzy set $A$ is generated by Equation (4).

![be picked from the non-sparse terms of the k partitions, i.e.](image-4.png "")

![Figure 1: Partition Model {r_i <= n (i.e. r is random or varies from document to document), where i = 1, 2, …, m; k = number of partitions or segments}](image-5.png "Figure 1:")

![based feature model. The proposed approach is modeled as a probability distribution over the set of Topic/Words represented by the vocabulary.
These distributions are sampled from multinomial distributions.](image-6.png "")

![Figure 2: Intuitionistic correlation vs. the number of training documents](image-7.png "Figure 2:")

Table 1

| Training with 300 Doc | Dimension Reduction | Perplexity | Correlation |
|---|---|---|---|
| Newspaper | 26% | 0.231 | 0.582 |
| Reuters | 22% | 0.311 | 0.520 |
| Movie | 16% | 0.483 | 0.480 |

Table 2

| Classifiers | tf-idf: Reuters | tf-idf: News Paper | tf-idf: Movie | IPCG: Reuters | IPCG: News Paper | IPCG: Movie |
|---|---|---|---|---|---|---|
| SVM | 0.482 | 0.422 | 0.321 | 0.844 | 0.841 | 0.799 |
| NB | 0.401 | 0.369 | 0.297 | 0.872 | 0.834 | 0.810 |
| J48 | 0.400 | 0.399 | 0.381 | 0.798 | 0.797 | 0.784 |
| Bayes' | 0.541 | 0.411 | 0.399 | 0.831 | 0.854 | 0.829 |
| LMT | 0.442 | 0.541 | 0.587 | 0.878 | 0.798 | 0.722 |

© 2014 Global Journals Inc. (US) Intuitionistic Partition based Conceptual Granulation Topic-Term Modeling

* D. M. Blei, A. Y. Ng, M. I. Jordan. Latent Dirichlet Allocation. Journal of Machine Learning Research 3, 2003.
* T. Hofmann. Probabilistic Latent Semantic Analysis. Proceedings of the 15th Annual Conference on Uncertainty in Artificial Intelligence (UAI-99), San Francisco, CA, Morgan Kaufmann, 1999.
* F. Sebastiani. Machine Learning in Automated Text Categorization. 34, 2002. doi:10.1145/505282.505283
* I. Feinerer, K. Hornik, D. Meyer. Journal of Statistical Software 25, 2008.
* K. T. Atanassov. Intuitionistic Fuzzy Sets, Theory, and Applications. Series in Fuzziness and Soft Computing, Physica-Verlag, 1999.
* K. T. Atanassov. Intuitionistic Fuzzy Set. Fuzzy Sets and Systems, 1986.
* K. T. Atanassov, S. Stoeva. Intuitionistic Fuzzy Set. Polish Symposium on Interval and Fuzzy Mathematics, 1993.
* D. Li, C. Cheng. New Similarity Measures of Intuitionistic Fuzzy Sets and Application to Pattern Recognition. Pattern Recognition Letters 23, 2002.
* E. Szmidt, J. Kacprzyk. Entropy for Intuitionistic Fuzzy Set. Fuzzy Sets and Systems 118, 2001.
* E. Szmidt, J. Kacprzyk. Distance Between Intuitionistic Fuzzy Set. Fuzzy Sets and Systems 114, 2000.
* D. Malathi, S. Valarmathy. Domain Classifier using Conceptual Granulation and Equal Partition Approach. Indian Journal of Engineering, Science and Technology 7, 2013.
* J. T. Chien, C. H. Chueh. Topic-Based Hierarchical Segmentation.
IEEE Transactions on Audio, Speech and Language Processing 20, 2012.
* T. Brants, F. Chen, I. Tsochantaridis. Topic-based Document Segmentation with Probabilistic Latent Semantic Analysis. Proceedings of the International Conference on Information and Knowledge Management, 2002.
* Z. Xu, J. Chen, J. Wu. Clustering Algorithm for Intuitionistic Fuzzy Sets. Information Sciences 178, 2008.
* N. Pelekis, D. K. Iakovidis, E. K. Evangelos, I. Kopanakis. Fuzzy Clustering of Intuitionistic Fuzzy Data. International Journal of Business Intelligence and Data Mining 3(1).
* D. K. Iakovidis, N. Pelekis, E. K. Evangelos, I. Kopanakis. Intuitionistic Fuzzy Clustering with Applications in Computer Vision. Advanced Concepts for Intelligent Vision Systems, Lecture Notes in Computer Science 5259, 2008.
* V. H. Le, C. H. Nguyen, F. Liu. Semantics and Aggregation of Linguistic Information, Based on Hedge Algebras. The 3rd International Conference on Knowledge, Information, and Creativity Support Systems, 2013.
* J. Ye. Multicriteria Decision-making Method Based on a Cosine Similarity Measure between Trapezoidal Fuzzy Numbers. International Journal of Engineering, Science and Technology 3, 2011.
* D. A. Chiang, N. P. Lin. Correlation of Fuzzy Sets. Fuzzy Sets and Systems 102, 1999.
* B. Berendt. Text Mining for News and Blogs Analysis. In C. Sammut and G. I. Webb (eds.), Encyclopedia of Machine Learning, Springer, London, 2010.
* B. Pang, L. Lee, S. Vaithyanathan. Thumbs up? Sentiment Classification using Machine Learning Techniques. Proceedings of EMNLP, 2002.
* D. Malathi, S. Valarmathy. A Comprehensive Survey on Dimension Reduction Techniques for Concept Extraction from a Large Corpus. International Journal of Computing Information Systems 3, 2011.