# I. INTRODUCTION

The problem of relevance search (RS) has been recognized as an important and interesting problem in machine learning and information retrieval; it refers to finding items relevant to a small query set given by the user. Along with the ability to find a cluster of data that shares some common properties with the query, relevance search is also a primary goal of information retrieval (IR) systems. Typical applications of RS include the discovery of relevant words in a text corpus [1][2][3][4], answers to community questions [5,6], features in concept-learning problems [7], and recommendations in collaborative filtering systems [8,9], among others.

Google Sets [10] is a well-known and successful representative of RS systems that has been widely used. It uses a vast amount of web pages to create a list of items related to a few examples, such as people, movies, words, or places. Most other RS systems are similar to Google Sets in that they run a ranking algorithm over a large corpus of documents or web pages. Ghahramani and Heller [11] proposed the Bayesian Sets algorithm, which uses a model-based concept of clusters and performs Bayesian inference to compute ranking scores for items. Sun et al. [12] used bipartite graphs to model the data and introduced random walks with restarts to rank the items.

The RS problem can be viewed from several angles. First, finding items relevant to the query is a standard IR task [11,13], which may be solved using classic IR algorithms such as HITS [14], nearest neighbors, naïve Bayes, and Rocchio's algorithm. These algorithms can compute a list of items ordered by relevance to the query; however, they draw no explicit boundary between "relevant" and "irrelevant" items. Second, RS can be interpreted as a special case of the semi-supervised learning (SSL) problem with a few positive examples given as the query [11]. It is essentially a one-class classification/clustering problem, which can be solved using one-class learning techniques such as mapping-convergence [15] and one-class SVMs [16,17]. Relevant and irrelevant items can be explicitly separated by SSL algorithms, but most of these algorithms do not provide a rank order over items, which is important in IR systems.

In this paper, we propose and evaluate a new RS method called bipartite label diffusion (BLD) that can be seen as a hybrid between IR and SSL. BLD is a diffusion-based algorithm that works on a bipartite graph. User queries are mapped to vertices with positive labels on the graph. BLD performs local computation at every vertex of the graph and develops a global classifier through the label diffusion process. The diffusion process is often modeled as a Markov chain on the graph; in BLD, we modify the diffusion model by adopting the "label spreading" method [18] to allow negative labels to diffuse on the graph. If a word or document is relevant to the query, its score is positive; otherwise its score is negative. The more relevant the item, the higher its score. Thus BLD can be seen as a semi-supervised learning method.

The rest of this paper is organized as follows. We first model the corpus data as document-word bipartite graphs in section 2. The details of the BLD algorithm are given in section 3. In section 4, we present empirical results to demonstrate the effectiveness of BLD on relevance search problems. Finally, some concluding remarks are given in section 5.
# II. BIPARTITE GRAPH MODEL

Bipartite graph models have been successfully applied in many fields such as text clustering [19][20][21], collaborative filtering [22,23], and content-based image retrieval [24]. The first step of our method is to model the document-word dataset as a bipartite graph. We also employ a query expansion scheme to enhance the search capability. A detailed description is given in this section.

Suppose we are given a set of n documents $\mathcal{D} = \{d_1, \dots, d_n\}$ containing a set of m words $\mathcal{W} = \{w_1, \dots, w_m\}$. The document-word collection can be modeled as an undirected bipartite graph $G = (\mathcal{D}, \mathcal{W}, E)$, where $E = \{\langle d_i, w_j \rangle : d_i \in \mathcal{D}, w_j \in \mathcal{W}\}$ is the set of edges between the two sets of vertices $\mathcal{D}$ and $\mathcal{W}$. In the bipartite graph G there are no edges between words or between documents. An edge $e(i,j)$ exists if word $w_j$ appears in document $d_i$, which is denoted by $w_j \sim d_i$; it indicates an association between a word and a document, which may be quantitatively expressed by assigning positive weights to the edges. Assume we are given the user's query set of words $\mathcal{W}_Q$ ($\mathcal{W}_Q \subseteq \mathcal{W}$, $\mathcal{W}_Q \neq \emptyset$), or documents $\mathcal{D}_Q$ ($\mathcal{D}_Q \subseteq \mathcal{D}$, $\mathcal{D}_Q \neq \emptyset$), or both of them. The RS problem can then be cast as a semi-supervised learning task where the query set contains the initial labeled examples. Fig. 1 shows a simple example of the bipartite graph model.

Figure 1: Example of a document-word bipartite graph with 3 documents and a vocabulary of 4 words. The user's query set is $\mathcal{W}_Q = \{w_1\}$ and $\mathcal{D}_Q = \{d_1\}$. The goal of this similar-sets retrieval problem is to discover a set of words that are similar to $\mathcal{W}_Q$, and a set of documents that are similar to $\mathcal{D}_Q$.

There are many edge-weighting schemes in information retrieval research; among them we adopt the popular "tf-idf" weight:

$$\text{tf-idf}(d_i, w_j) = \ln\big(1 + \#(d_i, w_j)\big) \cdot \ln\frac{n}{\left|\{d_k : w_j \sim d_k\}\right|},$$

where $\#(d_i, w_j)$ is the number of occurrences of word $w_j$ in document $d_i$, and $|\{d_k : w_j \sim d_k\}|$ is the number of documents containing $w_j$. Words with high tf-idf values imply a strong relationship with the documents they appear in. We can define an $n \times m$ edge weight matrix M:

$$M_{i,j} = \begin{cases} \text{tf-idf}(d_i, w_j), & \text{if edge } (d_i, w_j) \text{ exists}, \\ 0, & \text{otherwise}. \end{cases}$$

To interpret the edge weight matrix from the perspective of Markov chains on the graph, which will be described in section 3, we define the transition probability from $d_i$ to $w_j$ as

$$P_{i,j} = p(d_i \to w_j) = \frac{M_{i,j}}{\sum_{k=1}^{m} M_{i,k}}, \tag{1}$$

and the transition probability from $w_j$ to $d_i$ as

$$Q_{j,i} = p(w_j \to d_i) = \frac{M_{i,j}}{\sum_{k=1}^{n} M_{k,j}}. \tag{2}$$

The Markov transition matrix P normalizes M such that every row sums to 1, and Q normalizes $M^{T}$ such that every row sums to 1. Thus the $(m+n) \times (m+n)$ adjacency matrix A of the bipartite graph may be written as

$$A = \begin{pmatrix} \mathbf{0} & Q \\ P & \mathbf{0} \end{pmatrix},$$

where the vertices of the graph are ordered such that the first m vertices index the words and the last n vertices index the documents.

Note that the bipartite graph can also be a good model in the scenario of collaborative filtering or recommendation systems. A standard collaborative filtering problem often involves a user set $\mathcal{U}$, an item set $\mathcal{I}$, and a vote set $\mathcal{V}$. In a similar manner, we can build a bipartite graph $G = (\mathcal{U}, \mathcal{I}, E)$, and the normalized vote scores of $\mathcal{V}$ can directly serve as the weight matrix. Moreover, the aim of collaborative filtering is to discover new items that an active user may like based on her voted items, which can easily be translated into an RS problem: finding items similar to the user's favorite ones (items with high voting scores).
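To make this construction concrete, the following is a minimal sketch in Python/NumPy. It assumes dense matrices, documents given as token lists, and that every row of $M$ and $M^{T}$ has a nonzero sum; the function names (`build_weight_matrix`, `transition_matrices`, `adjacency`) are ours, not the paper's.

```python
import numpy as np

def build_weight_matrix(docs, vocab):
    """n x m tf-idf edge weight matrix M: M[i, j] = tf-idf(d_i, w_j)."""
    n, m = len(docs), len(vocab)
    index = {w: j for j, w in enumerate(vocab)}
    counts = np.zeros((n, m))                 # counts[i, j] = #(d_i, w_j)
    for i, doc in enumerate(docs):
        for word in doc:
            if word in index:
                counts[i, index[word]] += 1
    df = np.count_nonzero(counts, axis=0)     # |{d_k : w_j ~ d_k}|
    idf = np.log(n / np.maximum(df, 1))       # guard against words with no edges
    return np.log1p(counts) * idf             # zero wherever no edge exists

def transition_matrices(M):
    """Row-stochastic P (document -> word, Eq. 1) and Q (word -> document, Eq. 2)."""
    P = M / M.sum(axis=1, keepdims=True)
    Q = M.T / M.T.sum(axis=1, keepdims=True)
    return P, Q

def adjacency(P, Q):
    """(m+n) x (m+n) matrix A; the first m vertices are words, the last n documents."""
    n, m = P.shape
    A = np.zeros((m + n, m + n))
    A[:m, m:] = Q    # word rows -> document columns
    A[m:, :m] = P    # document rows -> word columns
    return A
```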
# III. LABEL DIFFUSION ON BIPARTITE GRAPHS

In general, there are two primary approaches to clustering vertices on a bipartite graph: one is to partition the graph into disjoint parts corresponding to different clusters [25,26]; the other is to compute a rank value for each vertex indicating the probability that the vertex belongs to a cluster [24,27]. To capture the uncertainty in data-cluster assignments, we investigate the problem from the latter angle.

# a) Markov Chains on the Graph

In a bipartite graph, there are no direct relationships among the words or among the documents. However, in diffusion-based methods [18,27], the strength of the similarities among elements on the same side of the bipartite graph may be captured by a local probability evolution process, which can be integrated to obtain a global view of the data.

We define a Markov chain to describe the diffusion process over the bipartite graph. Assume there is a discrete-time random walk on the bipartite graph G; the row-normalized adjacency matrix A is the one-step transition matrix, where $A_{i,j}$ is the probability of moving to the j-th vertex given that the walk is currently at the i-th vertex. More specifically, P is the document-word transition matrix containing the transition probabilities of moving from a document vertex to a word vertex, and Q is the word-document transition matrix.

Since the document-word graph is bipartite, all paths from $w_i$ to $w_j$ must pass through vertices in $\mathcal{D}$. As shown in [15,18], the similarity of two words $w_i$ and $w_j$ is a quantity proportional to the probability of a transition between them through the documents, denoted by $p(w_i, w_j)$:

$$p(w_i, w_j) = \sum_{k=1}^{n} p(w_i \to d_k)\, p(d_k \to w_j) = (QP)_{i,j}.$$

Obviously, this is a 2-step stochastic process that first maps $w_i$ to the document set, and then maps the documents back to $w_j$. Similarly, the transition probability from $d_i$ to $d_j$ is given by

$$p(d_i, d_j) = \sum_{k=1}^{m} p(d_i \to w_k)\, p(w_k \to d_j) = (PQ)_{i,j}.$$

Therefore, by formulating the relationship between documents and words as Markov chains on the bipartite graph, we obtain the word-word transition probability matrix QP and the document-document transition probability matrix PQ.
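As a small illustration of the two-step view above, the sketch below (reusing `P` and `Q` from the previous sketch; `similarity_matrices` is our own name) forms both products:

```python
import numpy as np

def similarity_matrices(P, Q):
    """Word-word (m x m) matrix QP and document-document (n x n) matrix PQ.

    (QP)[i, j] = sum_k p(w_i -> d_k) p(d_k -> w_j): the 2-step transition
    probability between two words through the documents, and likewise for PQ.
    """
    return Q @ P, P @ Q

# Both products are again row-stochastic, e.g.:
# QP, PQ = similarity_matrices(P, Q)
# assert np.allclose(QP.sum(axis=1), 1.0) and np.allclose(PQ.sum(axis=1), 1.0)
```

Both products are row-stochastic because each is a product of row-stochastic matrices, so every row can still be read as a probability distribution over the same side of the graph.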
# b) Diffusion Process

Intuitively, our bipolar diffusion framework works as follows. First, we construct a bipartite graph with two poles as described in section 2. The heat pole stands for the words and documents that are most relevant to the query, while the cold pole contains the "strongest negative" words and documents. A certain amount of heat is injected into the graph through the heat pole, the cold pole extracts heat out of the system as a heat sink, and the heat diffuses through the edges of the graph. Since the system has two poles, we call this the "bipolar diffusion process".

The diffusion process may be thought of as a Markov chain on the graph. The fundamental property of a Markov chain is the Markov property, which makes it possible to predict the future state of a system from its present state, ignoring its past states. We denote by $\mathbf{p}^{(t)}$ and $\mathbf{q}^{(t)}$ the labels of documents and words at time t. The Markov process then defines a dynamic system,

$$\mathbf{p}^{(t+1)} = P\,\mathbf{q}^{(t)} = PQ\,\mathbf{p}^{(t-1)}, \tag{3}$$

$$\mathbf{q}^{(t+1)} = Q\,\mathbf{p}^{(t)} = QP\,\mathbf{q}^{(t-1)}. \tag{4}$$

This simple 2-step diffusion process captures the interaction between the two sets of vertices on the graph. It requires $\mathbf{p}^{(t)}$ and $\mathbf{q}^{(t)}$ to be probability distributions over Markov states with non-negative values. However, in our bipolar diffusion framework, the vertices of the cold pole are labeled with negative values. In the following, we cast the diffusion process as a semi-supervised learning problem, which makes it possible to diffuse heat and cold simultaneously on the graph.

First, we use an (n+2)-vector $\mathbf{p}^{(0)}$ to denote the initial labels of documents, $\mathbf{p}^{(0)} = (1, 0, \dots, 0, -1)^{T}$, and an (m+2)-vector $\mathbf{q}^{(0)}$ to denote the initial labels of words, $\mathbf{q}^{(0)} = (1, 0, \dots, 0, -1)^{T}$. The virtual vertices of the heat pole are labeled with positive values, which allow heat to diffuse through the graph, while negative values representing "cold" are assigned to the virtual vertices of the cold pole. The vertices keep exchanging these two kinds of energy as the diffusion process proceeds. We then adopt the "label spreading" method [18] to allow negative labels to diffuse on the graph. The iterations in Eq. (3) and Eq. (4) can be rewritten as

$$\mathbf{p}^{(t+1)} = \alpha\,\mathbf{p}^{(0)} + (1-\alpha)\,P\,\mathbf{q}^{(t)}, \tag{5}$$

$$\mathbf{q}^{(t+1)} = \beta\,\mathbf{q}^{(0)} + (1-\beta)\,Q\,\mathbf{p}^{(t)}, \tag{6}$$

where $\alpha$ and $\beta$ are scaling parameters, both in (0,1), which specify the relative amounts of heat/cold a vertex receives from its neighbors and from its initial label information. To simplify the diffusion process, we set $\alpha = \beta = \frac{1}{2}$. Substituting Eq. (6) into Eq. (5) (and vice versa) gives, for $t \geq 1$,

$$\mathbf{p}^{(t+1)} = \frac{1}{2}\,\mathbf{p}^{(0)} + \frac{1}{4}\,P\,\mathbf{q}^{(0)} + \Theta\,\mathbf{p}^{(t-1)}, \qquad \mathbf{q}^{(t+1)} = \frac{1}{2}\,\mathbf{q}^{(0)} + \frac{1}{4}\,Q\,\mathbf{p}^{(0)} + \Gamma\,\mathbf{q}^{(t-1)},$$

where $\Theta = \frac{1}{4}PQ$ and $\Gamma = \frac{1}{4}QP$. Unrolling the recursion expresses $\mathbf{p}^{(t+1)}$ as a partial geometric series $\sum_{i} \Theta^{i}\big(\frac{1}{2}\mathbf{p}^{(0)} + \frac{1}{4}P\,\mathbf{q}^{(0)}\big)$ plus a vanishing remainder. Since P and Q are row-normalized, the spectral radii of $\Theta$ and $\Gamma$ are at most $\frac{1}{4}$, so the series converges. Hence, as $t \to \infty$, Eq. (5) converges to

$$\mathbf{p}^{(\infty)} = \left(I - \frac{1}{4}PQ\right)^{-1}\left(\frac{1}{2}\,\mathbf{p}^{(0)} + \frac{1}{4}\,P\,\mathbf{q}^{(0)}\right),$$

and Eq. (6) converges to

$$\mathbf{q}^{(\infty)} = \left(I - \frac{1}{4}QP\right)^{-1}\left(\frac{1}{2}\,\mathbf{q}^{(0)} + \frac{1}{4}\,Q\,\mathbf{p}^{(0)}\right).$$
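The closed forms above lend themselves to a direct linear solve. Below is a minimal sketch with $\alpha = \beta = \frac{1}{2}$ that, for simplicity, places the initial $\pm 1$ labels directly on query/negative vertices instead of on the two virtual pole vertices (the poles would add one extra row and column to P and Q); `bld_scores` is our name, not the paper's.

```python
import numpy as np

def bld_scores(P, Q, p0, q0):
    """Closed-form limits of the label-spreading iterations (5)-(6)."""
    n, m = P.shape
    # Spectral radius of (1/4)PQ and (1/4)QP is at most 1/4, so I - Theta is invertible.
    p_inf = np.linalg.solve(np.eye(n) - 0.25 * (P @ Q),
                            0.5 * p0 + 0.25 * (P @ q0))
    q_inf = np.linalg.solve(np.eye(m) - 0.25 * (Q @ P),
                            0.5 * q0 + 0.25 * (Q @ p0))
    return p_inf, q_inf   # positive scores = relevant, negative = irrelevant

# Example: the query word w_1 from Figure 1 gets label +1; everything else 0.
# q0 = np.zeros(m); q0[0] = 1.0; p0 = np.zeros(n)
# doc_scores, word_scores = bld_scores(P, Q, p0, q0)
```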
# IV. EXPERIMENTS

Our proposed BLD method is applicable to discovering items similar to a query set in a corpus of text data or user rating data. In this section, we experiment with our model on a set of RS problems.

# a) Datasets

The experiments are performed on a standard text dataset, Reuters-21578¹, and a collaborative filtering dataset, MovieLens². The text data were preprocessed by removing stopwords and stemming. For Reuters-21578, we use the subset of the ten categories with the highest number of positive examples, namely Reuters-10. The main features of the datasets are summarized in Table 1.

Table 1: Main features of the datasets

| Dataset | documents/users | words/items | nonzero entries |
|---|---|---|---|
| Reuters-10 | 9,989 | 5,180 | 373,440 |
| MovieLens | 943 | 1,682 | 100,000 |

We conducted two experiments, on the text dataset Reuters-10 and the movie rating dataset MovieLens, to evaluate the RS ability of our method. The results and comparisons with Google Sets³, the Internet Movie Database (IMDB)⁴, and Bayesian Sets are given in Tables 2, 3 and 4. Concerning the application of movie recommendation, the query takes the form of a set of movies. We regard movies as documents and users as words, and the rating scores are equivalent to word frequencies; thus our algorithm is easily adapted to collaborative filtering datasets (a code sketch of this adaptation is given at the end of this section). Among the tested algorithms and systems, Google Sets is a baseline RS system built on a vast amount of web data. Although the Google Sets algorithm is not available for us to run on our datasets, it is still informative and worth comparing against. IMDB provides movie recommendations by generating a list of the 5 movies most related to the query, based on collaborative filtering technology. Bayesian Sets views RS as a Bayesian inference problem and gives a corresponding ranking algorithm; we experiment with an online Bayesian Sets recommending system on the MovieLens dataset.

Table 2: Words retrieved by Google Sets and BLD for the same given queries; BLD runs on Reuters-10 (BLD returns stemmed words)

| Google Sets (science + technology) | BLD (science + technology) | Google Sets (market + price) | BLD (market + price) |
|---|---|---|---|
| science | scienc | market | market |
| technology | technologi | price | pric |
| business | univers | overview | money |
| sports | engineer | views | stock |
| health | educat | risk | valu |
| entertainment | research | gains | bond |
| education | comput | forecasts | busi |
| politics | life | war | compani |
| travel | develop | Intel | trade |
| computers | cultur | losses | economic |

Table 3: Movie suggestions for the query "Full Metal Jacket (1987)"

| Google Sets | IMDB | Bayesian Sets | BLD |
|---|---|---|---|
| Saving Private Ryan (1998) | Platoon (1986) | The Terminator (1984) | Platoon (1986) |
| Apocalypse Now (1979) | All Quiet on the Western Front (1930) | Star Wars Episode V (1980) | Rambo: First Blood (1982) |
| Platoon (1986) | Cidade de Deus (2002) | Raiders of the Lost Ark (1981) | Braveheart (1995) |
| Pulp Fiction (1994) | Batoru Rowaiaru (2000) | Aliens (1986) | Apocalypse Now (1979) |
| Hamburger Hill (1987) | If… (1968) | Die Hard (1988) | Star Wars Episode V (1980) |

Table 4: Movie suggestions for the query "The Graduate (1967)"

| Google Sets | IMDB | Bayesian Sets | BLD |
|---|---|---|---|
| Chinatown (1974) | Mysterious Skin (2004) | Casablanca (1942) | Casablanca (1942) |
| Midnight Cowboy (1969) | Giant (1956) | The Wizard of Oz (1939) | Annie Hall (1977) |
| Annie Hall (1977) | The Notebook (2004) | One Flew Over the Cuckoo's Nest (1975) | Gone with the Wind (1939) |
| Taxi Driver (1976) | Big Fish (2003) | The Godfather (1972) | To Kill a Mockingbird (1962) |
| Bonnie and Clyde (1967) | Notes on a Scandal (2006) | Amadeus (1984) | Giant (1956) |

From the query results of the relevant-word discovery task and the movie suggestion task, we can make the following comments:

1. Table 2 shows that both Google Sets and our method can retrieve sets that are, to some extent, similar to the query. There is no objective standard for the exact similarity between words or movies. However, the words that our method retrieved are clearly more sensible than some of the Google Sets results, e.g. "war" and "Intel" for the query "market" and "price". We attribute this to the different learning mechanisms and different corpora underlying Google Sets and our method.

2. Our method serves as a good algorithm for recommender systems on the MovieLens dataset. The movies recommended for the query "Full Metal Jacket" are all related to war, and for "The Graduate", most of the results are romance and drama movies. We notice that IMDB's suggestions are often popular and new, while Bayesian Sets and our method, limited by the MovieLens dataset, tend to generate classic movies.
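The following hedged sketch spells out the collaborative-filtering adaptation used in these experiments: movies play the role of documents, users the role of words, and raw rating scores replace the tf-idf weights. It reuses `bld_scores` from the previous sketch; `movie_recommendations` and the `top_k` parameter are our own names, not part of the paper.

```python
import numpy as np

def movie_recommendations(R, liked_movies, top_k=5):
    """R: (movies x users) rating matrix; liked_movies: row indices of the query."""
    P = R / R.sum(axis=1, keepdims=True)       # movie -> user transitions
    Q = R.T / R.T.sum(axis=1, keepdims=True)   # user -> movie transitions
    n, m = R.shape
    p0, q0 = np.zeros(n), np.zeros(m)
    p0[list(liked_movies)] = 1.0               # inject heat at the query movies
    scores, _ = bld_scores(P, Q, p0, q0)
    scores[list(liked_movies)] = -np.inf       # never recommend the query itself
    return np.argsort(-scores)[:top_k]
```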
# V. CONCLUSION

In this paper we developed a new graph diffusion algorithm for RS. We used bipartite graphs to model the relationships between documents and words, modeled the diffusion process using a Markov chain on the graph, and presented the corresponding label diffusion algorithm. In future work, we will extend the proposed method to other applications (e.g. social networks and question answering systems).

¹ http://www.daviddlewis.com/resources/testcollections/reuters21578/
² http://www.grouplens.org/system/files/ml-data.tar__0.gz
³ http://labs.google.com/sets
⁴ http://www.imdb.com/

# VI. ACKNOWLEDGMENT

# REFERENCES

[1] V. D. Blondel, A. Gajardo, M. Heymans, P. Senellart, and P. Van Dooren, "A measure of similarity between graph vertices: Applications to synonym extraction and web searching," SIAM Review, 46(4), 2004.
[2] V. Lavrenko, "A generative theory of relevance," University of Massachusetts Amherst, 2004.
[3] D. Lin, "Automatic retrieval and clustering of similar words," in Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics (COLING/ACL-98), 1998.
[4] R. C. Wang and W. W. Cohen, "Language-independent set expansion of named entities using the web," in Proceedings of the Seventh IEEE International Conference on Data Mining, 2007.
[5] H. Deng, M. R. Lyu, and I. King, "A generalized Co-HITS algorithm and its application to bipartite graphs," in Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2009.
[6] J. Wang, S. Robertson, A. P. de Vries, and M. J. Reinders, "Probabilistic relevance ranking for collaborative filtering," Information Retrieval, 11(6), 2008.
[7] W. W. Cohen, "Automatically extracting features for concept learning from the web," in Proceedings of the 17th International Conference on Machine Learning, 2000.
[8] C. Basu, H. Hirsh, and W. Cohen, "Recommendation as classification: using social and content-based information in recommendation," in Proceedings of the 15th National/10th Conference on Artificial Intelligence/Innovative Applications of Artificial Intelligence, 1998.
[9] H. Wang and E. R. Hancock, "Probabilistic relaxation labelling using the Fokker-Planck equation," Pattern Recognition, 41(11), 2008.
[10] Google, Google Sets, http://labs.google.com/sets.
[11] Z. Ghahramani and K. A. Heller, "Bayesian sets," in Advances in Neural Information Processing Systems 18, 2005.
[12] J. Sun, H. Qu, D. Chakrabarti, and C. Faloutsos, "Relevance search and anomaly detection in bipartite graphs," SIGKDD Explorations Newsletter, 7(2), 2005.
[13] J. Lafferty and C. Zhai, "Probabilistic relevance models based on document and query generation," in Language Modeling and Information Retrieval, Kluwer International Series on Information Retrieval, 2002.
[14] J. M. Kleinberg, "Authoritative sources in a hyperlinked environment," Journal of the ACM, 46(5), 1999.
[15] H. Yu, J. Han, and K. C.-C. Chang, "PEBL: Web page classification without negative examples," IEEE Transactions on Knowledge and Data Engineering, 16(1), 2004.
[16] L. M. Manevitz and M. Yousef, "One-class SVMs for document classification," Journal of Machine Learning Research, 2, 2002.
[17] B. Schölkopf, J. C. Platt, J. Shawe-Taylor, A. J. Smola, and R. C. Williamson, "Estimating the support of a high-dimensional distribution," Neural Computation, 13(7), 2001.
[18] D. Zhou, O. Bousquet, T. N. Lal, J. Weston, and B. Schölkopf, "Learning with local and global consistency," in Advances in Neural Information Processing Systems 16, 2004.
[19] I. S. Dhillon, "Co-clustering documents and words using bipartite spectral graph partitioning," in Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2001.
[20] T. Hu, C. Qu, C. L. Tan, S. Y. Sung, and W. Zhou, "Preserving patterns in bipartite graph partitioning," in Proceedings of the 18th IEEE International Conference on Tools with Artificial Intelligence, 2006.
[21] V. Sindhwani and P. Melville, "Document-word co-regularization for semi-supervised sentiment analysis," in Proceedings of the Eighth IEEE International Conference on Data Mining, 2008.
[22] J. Kunegis and A. Lommatzsch, "Learning spectral graph transformations for link prediction," in Proceedings of the 26th Annual International Conference on Machine Learning, 2009.
[23] J.-R. Shieh, Y.-T. Yeh, C.-H. Lin, C.-Y. Lin, and J.-L. Wu, "Collaborative knowledge semantic graph image search," in Proceedings of the 17th International Conference on World Wide Web, 2008.
[24] C. Bauckhage, "Distance-free image retrieval based on stochastic diffusion over bipartite graphs," in Proceedings of the 18th British Machine Vision Conference, 2007.
[25] X. Z. Fern and C. E. Brodley, "Solving cluster ensemble problems by bipartite graph partitioning," in Proceedings of the 21st International Conference on Machine Learning (ICML 2004), 2004.
[26] M. Rege, M. Dong, and F. Fotouhi, "Bipartite isoperimetric graph partitioning for data co-clustering," Data Mining and Knowledge Discovery, 16(3), 2008.
[27] K. Yu, S. Yu, and V. Tresp, "Soft clustering on graphs," in Advances in Neural Information Processing Systems 18, 2005.