# I. Introduction

Document clustering techniques [1], [2], [3], [4] are relevant to a wide range of tasks, from a simple search with a few terms to large-scale information retrieval processes. Early document clustering techniques were developed mainly to enhance information retrieval systems [5]. They were designed to find documents according to the query type, but they could not create a query, generate a synopsis of the documents, or provide an interface to the search results. The growth of the internet, digital libraries, news sources and company-wide intranets has made huge volumes of text documents available. The tremendous increase in the already vast volume of web data, together with the need to classify web documents into a moderate number of relevant clusters, has led to the development of a large number of web clustering engines and high-performing clustering algorithms.

The process of document clustering involves four stages: i) data collection, which covers crawling to accumulate the documents, indexing the set of documents in a structured fashion, and filtering the data with techniques such as tokenization, stop-word removal, stemming and lemmatization; ii) preprocessing, where the data is represented in a suitable form (for example as vectors) and measurable factors are applied to determine similarity; iii) document clustering, where a clustering technique and an efficient clustering algorithm are selected for clustering based on preset criteria; and iv) post-processing, where the document clustering technique is adapted to business and scientific requirements.

The applications of document clustering are diverse: i) creation of document taxonomies; ii) IR processes of search, access and collection [6], identification of similar documents, review and classification of results [7], automatic topic extraction [8] and content summarization; iii) recommendation systems; and iv) search optimization.
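The filtering and representation steps of stages i) and ii) can be illustrated with a minimal sketch. It is not taken from any of the surveyed systems: the stop-word list, the suffix-stripping `crude_stem` function (a toy stand-in for a real stemmer) and the plain TF-IDF weighting are all assumptions chosen to keep the example self-contained.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def crude_stem(token):
    """Strip a few common English suffixes (toy stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, drop stop words, and stem the remaining tokens."""
    return [crude_stem(t) for t in tokenize(text) if t not in STOP_WORDS]


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    token_lists = [preprocess(d) for d in docs]
    n_docs = len(token_lists)
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vectors.append(
            {t: c * math.log(n_docs / doc_freq[t]) for t, c in counts.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is all-zero."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


docs = [
    "Clustering of web documents",
    "Clustering documents from the web",
    "Stemming and stop words removal",
]
vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

The first two documents share all of their stemmed terms except one and therefore score well above the unrelated third document, which shares none. A clustering algorithm in stage iii) would consume exactly this kind of pairwise similarity.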
Document clustering processes are used extensively in data classification, for instance in the Google Web Directory and in social media data classification. Although clustering techniques have been studied for many years, they still face many of the same challenges. These challenges [9,10] mostly concern: i) the huge volume of data; ii) the high dimensionality of the feature space; iii) finding a clustering method that is feasible under constraints such as cluster quality and performance; and iv) presenting the results in an effective browsing interface. A current challenge in text clustering is the need for dynamic clustering techniques that incrementally update clusters as new data is added [11,12]. For instance, social media platforms have to generate user-specific content [13] instantly, which requires real-time data clustering methodologies.

The remainder of this paper is organized as follows. Section II discusses the taxonomy of document clustering, Section III evaluates the contemporary literature on clustering techniques, and Section IV concludes the paper.

# II. Taxonomy

Clustering can be expressed as a function that maps a document set D to a set of clusters. Under the specified constraints, minimizing or maximizing this function defines the difficulty of the clustering problem, and the algorithms applied over the similarity criteria determine the clustering quality. In the preprocessing step, document similarity is determined with methods based on the following representation strategies: (i) phrase or pair-wise representation, (ii) tree-form data representation, (iii) component-dependent data representation, (iv) semantic-relation-dependent document representation, and (v) concept and feature-vector-dependent representation. Clustering methods are generally of two types: 1) word-pattern and phrase based, and 2) feature based.
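The description of clustering as a function over the document set, given at the start of this section, can be made concrete under some assumptions. The formulation below is one common choice and is not claimed to be the one the cited works use: documents are assumed to be vectors, the number of clusters $k$ is assumed fixed, and quality $Q$ is assumed to be the total cosine similarity between each document and the centroid $c_j$ of its cluster.

```latex
\begin{aligned}
f &: D \to \{1, \dots, k\}, \qquad C_j = \{\, d_i \in D : f(d_i) = j \,\}, \\
Q(f) &= \sum_{j=1}^{k} \sum_{d_i \in C_j} \operatorname{sim}(d_i, c_j), \qquad
c_j = \frac{1}{|C_j|} \sum_{d_i \in C_j} d_i, \\
\operatorname{sim}(x, y) &= \frac{x \cdot y}{\lVert x \rVert \, \lVert y \rVert}.
\end{aligned}
```

Under these assumptions, a clustering algorithm searches for an assignment $f$ that maximizes $Q(f)$. Choosing a different similarity measure or adding constraints on the cluster sizes changes the problem being solved.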
Clustering algorithms are mostly of two types: 1) hierarchical methods and 2) partitioning (non-hierarchical) methods [14,15,16]. Hierarchical clustering algorithms represent data sets as a cluster tree and are of two types: 1-1) agglomerative [17] and 1-2) divisive hierarchical clustering methods. Partitional clustering algorithms [17] are also of two types: 2-1) iterative and 2-2) single-pass methods. K-means and its variants are the popular partitioning methods. Hierarchical clustering algorithms are considered more effective than the other algorithms [18]; however, because of their inherent complexity they are not applicable to huge document sets. Techniques for determining inter-cluster similarity [19,20], e.g. single link, and for improving cluster quality where the cluster sizes differ or fluctuate by a large factor [17], especially for high-performing clustering algorithms, have been studied widely in recent years.

The widely used document clustering methods are spectral clustering, LSI-dependent cluster development and NMF-based clustering. Spectral clustering methods [21] include LPI, LSI etc. Latent semantic indexing (LSI) [22], a feature extraction approach [23], tries to optimize the document space with respect to the given documents and is a widely used linear document indexing method [24]. LSI is not applicable to very large document collections [24]; similarly, when spectral clustering is used in a high-dimensional space, the dimensionality reduction is very costly, which limits its usability.

The word-pattern and phrase-based approaches are the traditional strategies, in which clustering depends on document features such as words, phrases and sequences [25,26].
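The agglomerative, single-link variant of hierarchical clustering mentioned above can be sketched in a few lines. This is a naive illustration and not an implementation from the cited works: the function names, the Euclidean distance and the stopping rule (merge until `k` clusters remain) are assumptions.

```python
def single_link_distance(cluster_a, cluster_b, dist):
    """Single link: distance between the two closest members."""
    return min(dist(a, b) for a in cluster_a for b in cluster_b)


def agglomerative(points, k, dist):
    """Merge the two closest clusters until only k clusters remain."""
    clusters = [[p] for p in points]  # start with one cluster per point
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = single_link_distance(clusters[i], clusters[j], dist)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def euclidean(a, b):
    """Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (9.0, 0.0)]
result = agglomerative(points, 3, euclidean)
print(result)
```

Every merge step in this naive version compares all pairs of clusters, and each comparison looks at all pairs of members. That repeated pairwise work is the kind of cost that makes hierarchical methods hard to apply to huge document sets.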
The word-pattern and phrase-based methods are of four types: 1-1) clustering with frequent word patterns; 1-2) application of word clusters in document clusters; 1-3) co-clustering of words and documents, including co-clustering with graph partitioning and information-theoretic co-clustering; and 1-4) clustering based on frequent phrases. The vector space model (VSM) is used in almost all document clustering methods in use today [27]. It is a data model that represents the terms related to the words in a document as a feature vector.

The feature-based clustering approaches are of two types: 2-1) feature extraction and 2-2) feature selection. Feature extraction approaches are based on algorithms of two types: i) linear and ii) nonlinear techniques. Examples of linear algorithms are unsupervised PCA, OCA, MMC etc. Examples of nonlinear algorithms are LLE, Laplacian Eigenmaps and ISOMAP. The linear methods show better operational performance than the nonlinear approaches; however, they underperform in clustering the huge and complicated data of the internet. Feature extraction finds applications in IR based on human language learning ability, in comparing reviewed and submitted papers, across various languages or networks, and in data filtering.

Feature selection algorithms are of two types: 2-2-1) feature ranking, which is metric based, and 2-2-2) subset selection from the possible features. They also fall into two categories: i) supervised and ii) unsupervised. The supervised feature selection algorithms are the most researched and the most used; they include information gain (IG), chi-square (CHI) and mutual information (MI). The most popular unsupervised methods are: i) document-frequency (DF) based selection, dependent on term strength, and ranking dependent on entropy or term contribution; ii) LSI-based methods; and iii) NMF-based methods.
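As an illustration of metric-based feature ranking, the sketch below scores terms with the chi-square (CHI) statistic computed from a two-by-two contingency table of term presence against class membership. The function names and the toy data are assumptions made for this example, and documents are assumed to be already reduced to sets of terms.

```python
def chi_square(n_tc, n_t, n_c, n):
    """Chi-square statistic for one term and one class.

    n_tc: documents in the class that contain the term
    n_t:  documents that contain the term
    n_c:  documents in the class
    n:    total number of documents
    """
    a = n_tc                   # term present, in class
    b = n_t - n_tc             # term present, not in class
    c = n_c - n_tc             # term absent, in class
    d = n - n_t - n_c + n_tc   # term absent, not in class
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def rank_terms(docs, labels, target):
    """Rank terms by chi-square with respect to the class `target`."""
    n = len(docs)
    n_c = sum(1 for y in labels if y == target)
    vocab = sorted({t for doc in docs for t in doc})
    scores = {}
    for term in vocab:
        n_t = sum(1 for doc in docs if term in doc)
        n_tc = sum(
            1 for doc, y in zip(docs, labels) if term in doc and y == target
        )
        scores[term] = chi_square(n_tc, n_t, n_c, n)
    # Highest score first; ties are broken alphabetically.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


docs = [
    {"goal", "match"},
    {"goal", "team"},
    {"vote", "party"},
    {"vote", "team"},
]
labels = ["sport", "sport", "politics", "politics"]
ranked = rank_terms(docs, labels, "sport")
print(ranked)
```

In this toy data, "goal" and "vote" receive the same top score, because the statistic measures dependence in either direction: a term that never occurs in the class is as informative as one that always does. The term "team" appears equally in both classes and scores zero, so a ranking-based selector would discard it first.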
Unsupervised techniques such as decision trees, statistics, NLP and ML are used in BI and analytics; in neural networks for developing AI or bio-neural networks; in rule-based AI systems for intelligent content development; and in database development, information retrieval and the automatic grouping of web documents with enterprise search engines or open-source software in web mining and text mining. The feature selection strategies used most often are i) wrapper, ii) filter and iii) embedded methods [28]; however, a study [29] has shown that supervised feature selection methods based on the filter metric IG are more efficient than the other techniques.

# III. Contemporary Affirmation of the Recent Literature

The bisecting k-means algorithm proposed by Steinbach, M., Karypis, G. and Kumar, V. [14] repeatedly splits a large cluster into smaller clusters until k highly similar clusters are generated, which are then used to filter the clusters and collect similar texts. A technique called CCA [30], widely used in emerging technologies such as ML, applies correlation to measure the similar features in a document. However, CCA has its own limitations in clustering.

A spectral clustering approach based on a graph partitioning strategy, called LPI [31], has been proposed; however, it fails in feature selection and retains the existing problems of distance-based document clustering. An approach for document clustering called frequent-term-based clustering, or HFTC [32], is a topic of extensive research; however, it is not scalable to huge data or document sets. A technique known as hierarchical document clustering using frequent itemsets (FIHC), proposed by Fung, B., Wang, K. and Ester, M., is discussed in [33]. Although FIHC performs better than HFTC, it underperforms in clustering efficiency when compared with existing approaches such as UPGMA and bisecting k-means.
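The bisecting k-means idea described at the start of this section (repeatedly splitting a cluster in two until k clusters exist) can be sketched as follows. This is a simplified illustration rather than the procedure of [14]: choosing the cluster with the largest sum of squared errors as the one to split, the deterministic seeding inside `two_means`, and performing a single bisection per step are all assumptions.

```python
def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sse(points):
    """Sum of squared distances from each point to the cluster mean."""
    center = mean(points)
    return sum(sq_dist(p, center) for p in points)


def two_means(points, iterations=10):
    """Split points into two groups with plain 2-means.

    Seeds are the first point and the point farthest from it, which keeps
    the sketch deterministic (an assumption; random restarts are more usual).
    """
    seed_a = points[0]
    seed_b = max(points, key=lambda p: sq_dist(p, seed_a))
    centers = [seed_a, seed_b]
    groups = [points, []]
    for _ in range(iterations):
        new_groups = [[], []]
        for p in points:
            closer_to_first = sq_dist(p, centers[0]) <= sq_dist(p, centers[1])
            new_groups[0 if closer_to_first else 1].append(p)
        if not new_groups[0] or not new_groups[1]:
            break
        groups = new_groups
        centers = [mean(g) for g in groups]
    return groups


def bisecting_kmeans(points, k):
    """Repeatedly bisect the cluster with the largest SSE until k clusters exist."""
    clusters = [list(points)]
    while len(clusters) < k:
        to_split = max(clusters, key=sse)
        if len(to_split) < 2:
            break
        left, right = two_means(to_split)
        if not right:  # all points identical; nothing left to split
            break
        clusters.remove(to_split)
        clusters.extend([left, right])
    return clusters


points = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (20.0,)]
result = bisecting_kmeans(points, 3)
print(result)
```

On this one-dimensional toy data the first bisection separates the low values from the high ones, and the second splits the looser high-valued cluster. Other rules for picking the cluster to split, such as taking the largest cluster, are equally easy to substitute for `sse` in `max(clusters, key=...)`.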
The TDC algorithm, based on closed frequent itemsets for clustering, was proposed by Yu, H., Searsmith, D., Li, X. and Han, J. [34]. The algorithm performs better than HFTC and FIHC; however, its reliance on closed itemsets is seen as a limitation. A strategy of hierarchical clustering using closed interesting itemsets, referred to as HCCI and proposed by Malik, H.H. and Kender, J.R. [35], is considered the best clustering method available; however, the technique may cause information loss.

An approach based on the PSSM histogram by Gad and Kamel [36] combines text semantics with incremental clustering and measures the similarity of the documents in order to adjust the insertion order of the documents in the clusters for better quality. An improved incremental clustering technique proposed by Gavin and Yue [37] improves the categorization of web data incrementally. In this method, a new document is assigned to a cluster based on multiple items of cluster-specific information. A mining approach for improving text clustering by Shehata, S., Fakhri, K. and Mohamed, S. S. [38] outperforms existing techniques such as HAC and k-NN. A progressive clustering algorithm by Liu, Y., Ouyang, Y., Sheng, H. and Xiong, Z. (2008) [39], based on Cluster Average Similarity Area, determines the cluster coherence and progressively assigns new data items to the clusters.

A technique for enhancing clustering through the partial disambiguation of words by their part-of-speech (PoS) tags [40] is recommended by its developers; the study finds that considering synonyms and hypernymy is ineffective for selecting the right word sense when words are disambiguated solely by PoS tags. The CFWS technique proposed by Y. Li and S.M. Chung enhances the capability to process documents by considering word sequences in addition to individual words [41]. The technique of nonlinear data representation by J.B. Tenenbaum, V. de Silva and J.C. Langford [42] preserves specific local data simultaneously, based on the optimization factors; however, it is associated with high complexity. Approaches that reduce the complexity of feature extraction through approximation algorithms [43], [44], [45] are found to perform well. Software for automatically retrieving information from websites, by Zamir and Etzioni [46], is designed for websites containing vast amounts of data.

An approach integrating clustering and feature selection for text clustering, based on the semantic relations of the text documents with an ontology, was proposed by M. Thangamani and P. Thangaraj in [47]. The approach minimizes dimensionality and improves feature selection. A clustering technique that assesses clustering quality based on WordNet [48] noun phrases and semantic relationships [49] shows better performance with a hypernymy-based strategy than with other noun phrases. A system for determining the ontology-related semantic relations of a term or word and the associated weight measure is given by Prof. K. Raja and C. Prakash Narayanan [9]. However, the technique has dimensionality and other problems. A description of ontology-based automatic categorization of web documents [50], and of the scope of ontologies in improving current machine learning and IR approaches, is given by Andreas Hotho. The integration of ontologies for combining various information types from multiple resources is presented by Young-Woo et al. in [51]. The use of domain-specific ontologies to enhance text classification performance, where text learning and IR are used to generate ontologies with minimum user interaction, is described in [52,53]. Methods utilizing the Wikipedia ontology, primarily to improve document representation and cluster quality, were proposed by Gabrilovich and Markovitch [54], and further extensions provided a structure based on the Wikipedia guidelines and groups [55,56].
The Wikipedia ontology is particularly relevant because it applies to a large cross-section of domains and is restructured on a regular basis.

A technique for feature selection in text clustering, based on supervised feature selection over the intermediate clustering outcomes, by Xu, J. and Xu, B. [57], generates an efficient feature subset for classification. The performance of the suggested technique is efficient compared with the manual process. A feature selection technique based on the ACO algorithm, by M. Janaki Meena, K.R. Chandran and J. Mary Brinda [58], is a unique method. Comparative tests of the approach against the existing chi-square and CHIR techniques show that it achieves better performance in FS.

Global Journal of Computer Science and Technology, Volume XV, Issue II, Version I, Year 2015

An entropy-based FS approach, i.e. a filter solution [59], has been tested with various data types; it reduces dimensionality and is efficient in finding the subset of major features. A feature co-selection method called MFCC (multi-type feature co-selection), proposed by Shen Huang, Zheng Chen, Yong Yu and Wei-Ying Ma [60], shows enhanced clustering performance on web documents based on the outcomes of intermediate clustering. A method that remodels the data similarity matrix as a bi-stochastic matrix before running clustering algorithms, by F. Wang, P. Li and A.C. König, showed better clustering performance [61].

Term-based document clustering techniques for dynamic environments are given in [11] by Wang, X., Tang, J. and Liu, H.; techniques based on synonyms and hypernymy are given by Bharathi and Vengatesan [62]; and techniques based on synonyms and hyponyms are given by Nadig, R., Ramanand, J. and Bhattacharyya, P. in [12]. These approaches are, however, not applicable to technically similar documents. A document clustering approach [63] based on phrases and the STC technique, by O. Zamir, O. Etzioni, O. Madani and R.M. Karp, builds the clusters on the suffixes common to documents.
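The phrase-sharing idea behind STC can be illustrated with a deliberately simplified sketch. It is a stand-in, not the STC algorithm itself: it indexes short word sequences in a dictionary instead of building a suffix tree, and the greedy merge rule, the `threshold` value and the `max_len` limit are assumptions made for this example.

```python
from collections import defaultdict


def phrases(tokens, max_len=3):
    """Yield every contiguous word sequence of length 1..max_len."""
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i : i + n])


def base_clusters(docs, min_docs=2, max_len=3):
    """Map each shared phrase to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for phrase in phrases(text.lower().split(), max_len):
            index[phrase].add(doc_id)
    return {p: ids for p, ids in index.items() if len(ids) >= min_docs}


def merge_overlapping(clusters, threshold=0.5):
    """Greedily merge base clusters whose document sets overlap heavily."""
    groups = []
    for ids in clusters.values():
        for group in groups:
            shared = len(ids & group)
            if shared / len(ids) > threshold and shared / len(group) > threshold:
                group |= ids  # extend the existing group in place
                break
        else:
            groups.append(set(ids))
    return groups


docs = [
    "web document clustering",
    "fast web document search",
    "cat videos online",
    "funny cat videos",
]
base = base_clusters(docs)
groups = merge_overlapping(base)
print(sorted(base))
print(groups)
```

In this toy collection, several different phrases ("web", "document", "web document") all point to the same pair of documents, so many base clusters collapse into one group after merging.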
The STC method, though efficient in cluster quality, is associated with a high amount of term redundancy. A study of the TF-IDF method of clustering [64], term-frequency-dependent algorithms [65] and a review of clustering algorithms [66] showed that the majority of clustering approaches are TF-IDF based but are associated with several problems.

The NMF (nonnegative matrix factorization) technique has been applied in text classification [67], has improved clustering performance compared with existing approaches [68], and has been studied in relation to earlier clustering techniques [69], [70], [71]. Established NMF techniques such as multiplicative updates [72] and projected gradients [73], though efficient, are associated with memory problems for huge datasets that are streamed rather than disk based [74]. To overcome these problems, approaches such as random projections [61,75] and sketch/sampling algorithms [76] have been proposed. An NMF-based technique by Li and Zhu in 2011 [77] for research-specific documents minimizes high dimensionality, finds relevant topics for clustering and shows comparatively efficient classification performance. A study of an online algorithm based on nonnegative matrix factorization is given in [78], and an NMF-based method that uses weighted features and the cluster similarity property, by Sun Park, Dong Un An and Choi Im Cheon [79], performs more efficiently than the remaining NMF-based strategies.

# IV. Conclusion

In this paper we analyzed several techniques developed for clustering documents, along with their applications and relevance in terms of today's requirements.
Developing perfect strategies for classifying varied forms and types of documents to a near-optimal solution, or finding accurate ways of assessing the quality of the resulting clustering, is impossible and is becoming increasingly complex. Nevertheless, the field today deals with extraordinary tasks such as granular taxonomy generation, sentiment analysis and document summarization, generating reliable and relevant insights applicable to several fields. In conclusion, document clustering will continue to be widely studied and will find relevance in a number of newer areas.

© 2015 Global Journals Inc. (US)

# References

[1] S. Kotsiantis and P. Pintelas, "Recent Advances in Clustering: A Brief Survey," WSEAS Trans. Information Science and Applications, vol. 1, no. 1, 2004.
[2] W. Xu and Y. Gong, "Document Clustering by Concept Factorization," Proc. Int'l Conf. Research and Development in Information Retrieval, July 2004.
[3] S. Siersdorfer and S. Sizov, "Restrictive Clustering and Metaclustering for Self-Organizing Document Collections," Proc. Int'l Conf. Research and Development in Information Retrieval, July 2004.
[4] B. S. Everitt, S. Landau and M. Leese, Cluster Analysis, fourth edition, Oxford University Press, 2001.
[5] Van Rijsbergen, London: Butterworths, second ed., 1989.
[6] C. Carpineto, S. Osiński, G. Romano and D. Weiss, "A survey of web clustering engines," ACM Comput. Surv., vol. 41, no. 3, 2009.
[7] X. Liu and W. B. Croft, "Cluster-based retrieval using language models," Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2004.
[8] J. Silva, J. Mexia, A. Coelho and G. Lopes, "Document clustering and cluster topic extraction in multilingual corpora," Proceedings of the 1st IEEE International Conference on Data Mining (ICDM), 2001.
[9] Prof. K. Raja and C. Prakash Narayanan, "Clustering Technique with Feature Selection for Text Documents," Proceedings of the Int. Conf. on Information Science and Applications (ICISA 2010), 6 February 2010.
[10] F. Sebastiani, "Machine Learning in Automated Text Categorization," ACM Computing Surveys, vol. 34, no. 1, March 2002.
[11] X. Wang, J. Tang and H. Liu, "Document clustering via matrix representation," 11th IEEE International Conference on Data Mining (ICDM 2011), 2011.
[12] R. Nadig, J. Ramanand and P. Bhattacharyya, "Automatic evaluation of WordNet synonyms and hypernymy," Proceedings of ICON-2008, 6th International Conference on Natural Language Processing, India, 2008.
[13] A. Das, M. Datar, A. Garg and S. Rajaram, "Google news personalization: Scalable online collaborative filtering," Proceedings of the 16th International Conference on World Wide Web (WWW), 2007.
[14] M. Steinbach, G. Karypis and V. Kumar, "A comparison of document clustering techniques," KDD Workshop on Text Mining, 2000.
[15] P. Berkhin, "Survey of clustering data mining techniques," 2004.
[16] Xu Rui, "Survey of Clustering Algorithms," IEEE Transactions on Neural Networks, vol. 16, no. 3, 2005.
[17] B. C. M. Fung, K. Wang and M. Ester, "Hierarchical Document Clustering Using Frequent Itemsets," 2003.
[18] I. S. Dhillon and D. S. Modha, "Concept decompositions for large sparse text data using clustering," Machine Learning, vol. 42, 2001.
[19] M. Khalilian and N. Mustapha, "Data Stream Clustering: Challenges and Issues," Proceedings of the International Multiconference of Engineers and Computer Scientists (IMECS 2010), Hong Kong, 2010.
[20] R. Martinez-Morais, F. J. Alfaro-Cortes and J. L. Sanchez, "Providing QoS with the Deficit Table Scheduler," IEEE Transactions on Parallel and Distributed Systems, vol. 21, no. 3, 2010.
[21] A. Y. Ng, M. Jordan and Y. Weiss, "On Spectral Clustering: Analysis and an Algorithm," Advances in Neural Information Processing Systems 14, MIT Press, 2001.
[22] S. T. Dumais, "Latent Semantic Indexing (LSI) and TREC-2," Proc. Second Text Retrieval Conf. (TREC), 1993.
[23] LSA @ CU Boulder, 2010.
[24] S. C. Deerwester, S. T. Dumais, T. K. Landauer, G. W. Furnas and R. A. Harshman, "Indexing by Latent Semantic Analysis," J. Am. Soc. Information Science, vol. 41, no. 6, 1990.
[25] C. Hung and D. Xiaotie, "Efficient Phrase-Based Document Similarity for Clustering," IEEE Transactions on Knowledge and Data Engineering, vol. 20, September 2008.
[26] M. C. Soon, D. H. John and L. Yanjun, "Text document clustering based on frequent word meaning sequences," Data & Knowledge Engineering, vol. 64, 2008.
[27] K. Aas and L. Eikvil, "Text Categorisation: A Survey," Technical Report 941, Norwegian Computing Center, Oslo, Norway, 1999. iteseer.ist.psu.edu/aas99text.html
[28] Barak Chizi (Tel-Aviv University, Israel), Lior Rokach (Ben-Gurion University, Israel) and Oded Maimon (Tel-Aviv University, Israel), "A survey of feature selection techniques," John Wang (Montclair State University, USA), 2009, ISBN 9781605660103, DOI: 10.4018/978-1-60566-010-3.ch289.
[29] Y. Yang and J. O. Pedersen, "A Comparative Study on Feature Selection in Text Categorization," Proc. 14th Int'l Conf. Machine Learning, 1997.
[30] D. R. Hardoon, S. R. Szedmak and J. R. Shawe-Taylor, "Canonical Correlation Analysis: An Overview with Application to Learning Methods," J. Neural Computation, vol. 16, no. 12, 2004.
[31] D. Cai, X. He and J. Han, "Document Clustering Using Locality Preserving Indexing," IEEE Trans. Knowledge and Data Eng., vol. 17, no. 12, Dec. 2005.
[32] F. Beil, M. Ester and X. Xu, "Frequent Term-based Text Clustering," Proc. of Intl. Conf. on Knowledge Discovery and Data Mining, 2002.
[33] B. C. M. Fung, K. Wang and M. Ester, "Hierarchical document clustering using frequent itemsets," Proceedings of SIAM International Conference on Data Mining, 2003.
[34] H. Yu, D. Searsmith, X. Li and J. Han, "Scalable Construction of Topic Directory with Nonparametric Closed Termset Mining," Proc. of Fourth IEEE Intl. Conf. on Data Mining, 2004.
[35] H. H. Malik and J. R. Kender, "High Quality, Efficient Hierarchical Document Clustering Using Closed Interesting Itemsets," Proc. of IEEE Intl. Conf. on Data Mining, 2006.
[36] W. K. Gad and M. S. Kamel, "Incremental clustering algorithm based on phrase-semantic similarity histogram," Proceedings of the Ninth International Conference on Machine Learning and Cybernetics, 2010.
[37] S. Gavin and X. Yue, "Enhancing an incremental clustering algorithm for Web page collections," ACM International Joint Conferences on Web Intelligence and Intelligent Agent Technologies, 2009.
[38] S. Shehata, K. Fakhri and S. Mohamed S., "An efficient concept-based mining model for enhancing text clustering," IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 10, 2010.
[39] Y. Liu, Y. Ouyang, H. Sheng and Z. Xiong, "An Incremental Algorithm for Clustering Search Results," IEEE International Conference on Signal Image Technology and Internet Based Systems, 2008.
[40] J. Sedding and D. Kazakov, "WordNet-based text document clustering," 3rd Workshop on Robust Methods in Analysis of Natural Language Data, 2004.
[41] Y. Li and S. M. Chung, "Text Document Clustering Based on Frequent Word Sequences," Proceedings of the CIKM, Bremen, Germany, October 31-November 5, 2005.
[42] J. B. Tenenbaum, V. de Silva and J. C. Langford, "A Global Geometric Framework for Nonlinear Dimensionality Reduction," Science, vol. 290, 2000.
[43] J. Weng, Y. Zhang and W.-S. Hwang, "Candid Covariance-Free Incremental Principal Component Analysis," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 25, no. 8, Aug. 2003.
[44] K. Hiraoka and M. Hamahira, "On Successive Learning Type Algorithm for Linear Discriminant Analysis," IEIC Technical Report, vol. 99, 1999 (in Japanese).
[45] J. Yan, B. S. Zhang, Z. Yan, W. Chen, Q. Fan, W. Y. Yang, Q. Ma and Cheng, "IMMC: Incremental Maximum Marginal Criterion," Proc. 10th ACM SIGKDD, 2004.
[46] O. Zamir and O. Etzioni, "Web Document Clustering: A Feasibility Demonstration," Proceedings of the 21st International ACM SIGIR Conference on Research and Development in Information Retrieval.
[47] M. Thangamani and P. Thangaraj, "Integrated clustering and feature selection scheme for text documents," J. Comput. Sci., vol. 6, p. 536, DOI: 10.3844/jcssp.2010.536.541.
[48] G. Miller, "WordNet: A lexical database for English," CACM, vol. 38, no. 11, 1995.
[49] Zheng, Kang and Kim, "Exploiting noun phrases and semantic relationships for text document clustering," Information Science, vol. 179, 2009.
[50] A. Hotho, "Using Ontologies to Improve the Text Clustering and Classification Task," Knowledge and Data Engineering Group, University of Kassel, January 14, 2005.
[51] Young-Woo Seo, A. Ankolekar and K. Sycara, "Feature Selections for Extracting Semantically Rich Word for Ontology Learning," CMU-RI-TR-04-18, March 2004.
[52] B. Berendt, A. Hotho and G. Stumme, "Towards semantic web mining," Proceedings of International Semantic Web Conference (ISWC), 2002.
[53] A. Hotho, S. Staab and A. Maedche, "Ontology-based text clustering," Proceedings of the IJCAI-2001 Workshop Text Learning: Beyond Supervision, Seattle, USA, August 2001.
[54] E. Gabrilovich and S. Markovitch, "Computing Semantic Relatedness Using Wikipedia-based Explicit Semantic Analysis," Proc. of the 20th Intl. Joint Conf. on Artificial Intelligence, 2007.
[55] X. Hu, X. Zhang and C. Lu, "Exploiting Wikipedia as External Knowledge for Document Clustering," Proc. of Knowledge Discovery and Data Mining, 2009.
[56] J. Hu, L. Fang and Y. Cao, "Enhancing Text Clustering by Leveraging Wikipedia Semantics," Proc. of 31st Annual Intl. ACM SIGIR Conf. on Research and Development in Information Retrieval, 2008.
[57] J. Xu, B. Xu, W. Zhang, Z. Cui and W. Zhang, "A new feature selection method for text clustering," Wuhan University Journal of Natural Sciences, vol. 12, 2007.
[58] M. Janaki Meena, K. R. Chandran and J. Mary Brinda, "Integrating swarm intelligence and statistical data for feature selection in text categorization," International Journal of Computer Applications, vol. 1, no. 11, 2010.
[59] M. Dash, K. Choi, P. Scheuermann and H. Liu, "Feature Selection for Clustering - A Filter Solution," Proceedings of the 2002 IEEE International Conference on Data Mining (ICDM'02), 0-7695-1754-4/02, © 2002 IEEE.
[60] Shen Huang, Zheng Chen, Yong Yu and Wei-Ying Ma, "Multitype features coselection for web document clustering," IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 4, April 2006.
[61] F. Wang, P. Li and A. C. König, "Learning a bistochastic data similarity matrix," Proceedings of the 10th IEEE International Conference on Data Mining (ICDM), 2010.
[62] G. Bharathi and D. Vengatesan, "Improving information retrieval using document clusters and semantic synonym extraction," Journal of Theoretical and Applied Information Technology, vol. 36, no. 2, 2012.
[63] O. Zamir and O. Etzioni, "Grouper: A Dynamic Clustering Interface to Web Search Results," Computer Networks, vol. 31, 1999.
[64] G. Salton and C. Buckley, "Term-weighting approaches in automatic text retrieval," Information Processing & Management, vol. 24, no. 5, 1988.
[65] N. Kumar and K. Srinathan, "A New Approach for Clustering Variable Length Documents," Proceedings of the Advanced Computing Conference, IEEE, 2009.
[66] Y. Prathima and K. P. Supreethi, "A survey paper on concept based text clustering," International Journal of Research in IT & Management, vol. 1, no. 3, 2011.
[67] D. D. Lee and H. S. Seung, "Learning the parts of objects with nonnegative matrix factorization," Nature, vol. 401, 1999.
[68] F. Shahnaz, M. W. Berry, V. P. Pauca and R. J. Plemmons, "Document clustering using nonnegative matrix factorization," Information Processing and Management, vol. 42, no. 2, 2006.
[69] C. Ding, T. Li and M. I. Jordan, "Convex and semi-nonnegative matrix factorizations," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2010.
[70] C. Ding, X. He and H. D. Simon, "On the equivalence of nonnegative matrix factorization and spectral clustering," Proceedings of the 5th SIAM Int'l Conf. Data Mining (SDM), 2005.
[71] E. Gaussier and C. Goutte, "Relation between PLSA and NMF and implications," Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2005.
[72] D. D. Lee and H. S. Seung, "Algorithms for nonnegative matrix factorization," Advances in Neural Information Processing Systems (NIPS), 2000.
[73] C. J. Lin, "Projected gradient methods for nonnegative matrix factorization," Neural Computation, vol. 19, no. 10.
[74] S. Zhong, "Efficient streaming text clustering," Neural Networks, vol. 18, no. 5-6, 2005.
[75] F. Wang and P. Li, "Efficient non-negative matrix factorization with random projections," Proceedings of the 10th SIAM International Conference on Data Mining (SDM), 2010.
[76] P. Li, K. W. Church and T. Hastie, "One sketch for all: Theory and application of conditional random sampling," Advances in Neural Information Processing Systems (NIPS), 2008.
[77] F. Li and Q. Zhu, "Document clustering in research literature based on NMF and testor theory," Journal of Software, vol. 6, no. 1, 2011.
[78] B. Cao, D. Shen, J. Sun, X. Wang, Q. Yang and Z. Chen, "Detect and track latent factors with online nonnegative matrix factorization," Proc. International Joint Conference on Artificial Intelligence, 2007.
[79] Sun Park, Dong Un An and Choi Im Cheon, "Document Clustering Method Using Weighted Semantic Features and Cluster Similarity," Third IEEE International Conference on Digital Game and Intelligent Toy Enhanced Learning (DIGITEL), 2010.