# I. INTRODUCTION

The problem of relevance search (RS) has been recognized as an important and interesting problem in machine learning and information retrieval; it refers to finding items relevant to a small query set given by the user. Along with the ability to find a cluster of data that shares some common properties with the query, relevance search is also a primary goal of information retrieval (IR) systems. Typical applications of RS include the discovery of relevant words in a text corpus [1][2][3][4], answers to community questions [5,6], features in concept-learning problems [7], and recommendations in collaborative filtering systems [8,9], among others.

Google Sets [10] is a well-known and successful representative of RS systems that has been widely used. It uses a vast amount of web pages to create a list of items related to a few examples, such as people, movies, words, or places. Most other RS systems are similar to Google Sets in that they run a ranking algorithm over a large corpus of documents or web pages. Ghahramani and Heller [11] proposed the Bayesian Sets algorithm, which uses a model-based concept of clusters and performs Bayesian inference to compute ranking scores for items. Sun et al. [12] used bipartite graphs to model the data and introduced random walks with restarts to rank the items.

The RS problem can be viewed from several angles. First, finding items relevant to the query is a standard IR task [11,13], which may be solved using classic IR algorithms such as HITS [14], nearest neighbors, naïve Bayes, and Rocchio's algorithm. These algorithms can compute a list of items ordered by relevance to the query; however, they draw no explicit boundary between "relevant" and "irrelevant" items. Second, RS can be interpreted as a special case of the semi-supervised learning (SSL) problem with a few positive examples given as the query [11]. It is essentially a one-class classification/clustering problem, which can be solved using one-class learning techniques such as mapping-convergence [15] and one-class SVMs [16,17]. Relevant and irrelevant items can be explicitly separated by SSL algorithms, but most of these algorithms do not provide a rank order over items, which is important in IR systems.

In this paper, we propose and evaluate a new RS method called bipartite label diffusion (BLD) that can be seen as a hybrid between IR and SSL. BLD is a diffusion-based algorithm that works on a bipartite graph. User queries are mapped to vertices with positive labels on the graph. BLD performs local computation at every vertex of the graph and develops a global classifier through the label diffusion process. The diffusion process is often modeled as a Markov chain on the graph; in BLD, we modify the diffusion model by adopting the "label spreading" method [18] to allow negative labels to diffuse on the graph. If a word or document is relevant to the query, its score is positive; otherwise its score is negative. The more relevant the item, the higher its score. Thus BLD can be seen as a semi-supervised learning method.

The rest of this paper is organized as follows. We first model the corpus data as document-word bipartite graphs in section 2. The details of the BLD algorithm are given in section 3. In section 4, we present empirical results to demonstrate the effectiveness of BLD on relevance search problems. Finally, some concluding remarks are given in section 5.
# II. BIPARTITE GRAPH MODEL

Bipartite graph models have been successfully applied in many fields such as text clustering [19][20][21], collaborative filtering [22,23], and content-based image retrieval [24]. The first step of our method is to model the document-word dataset as a bipartite graph. We also employ a query expansion scheme to enhance the search capability. A detailed description is given in this section.

Suppose we are given a set of n documents $\mathcal{D} = \{d_1, \dots, d_n\}$ containing a set of m words $\mathcal{W} = \{w_1, \dots, w_m\}$. The document-word collection can be modeled as an undirected bipartite graph $G = (\mathcal{D}, \mathcal{W}, E)$, where $E = \{\langle d_i, w_j \rangle : d_i \in \mathcal{D}, w_j \in \mathcal{W}\}$ is the set of edges between the two sets of vertices $\mathcal{D}$ and $\mathcal{W}$. In the bipartite graph G there are no edges between words or between documents. An edge $e(i,j)$ exists if word $w_j$ appears in document $d_i$, which is denoted by $w_j \sim d_i$; it indicates an association between a word and a document, which may be quantitatively expressed by assigning positive weights to the edges. Assume we are given the user's query set of words $\mathcal{W}_Q$ ($\mathcal{W}_Q \subseteq \mathcal{W}$, $\mathcal{W}_Q \neq \emptyset$), or documents $\mathcal{D}_Q$ ($\mathcal{D}_Q \subseteq \mathcal{D}$, $\mathcal{D}_Q \neq \emptyset$), or both of them. The RS problem can then be cast as a semi-supervised learning task where the query set contains the initial labeled examples. Fig. 1 shows a simple example of the bipartite graph model.

Figure 1: Example of a document-word bipartite graph with 3 documents and a vocabulary of 4 words. The user's query set is $\mathcal{W}_Q = \{w_1\}$ and $\mathcal{D}_Q = \{d_1\}$. The goal of this similar-sets retrieval problem is to discover a set of words that are similar to $\mathcal{W}_Q$, and a set of documents that are similar to $\mathcal{D}_Q$.

There are many edge-weighting schemes in information retrieval research; among them we adopt the popular "tf-idf" weight:

$$\text{tf-idf}(d_i, w_j) = \ln\big(1 + \#(d_i, w_j)\big) \cdot \ln\frac{n}{\left|\{d_k : w_j \sim d_k\}\right|},$$

where $\#(d_i, w_j)$ is the number of occurrences of word $w_j$ in document $d_i$, and $|\{d_k : w_j \sim d_k\}|$ is the number of documents containing $w_j$. Words with high tf-idf values imply a strong relationship with the documents they appear in. We can define an $n \times m$ edge weight matrix M:

$$M_{i,j} = \begin{cases} \text{tf-idf}(d_i, w_j), & \text{if edge } (d_i, w_j) \text{ exists}, \\ 0, & \text{otherwise}. \end{cases}$$

To interpret the edge weight matrix from the perspective of Markov chains on the graph, which will be described in section 3, we define the transition probability from $d_i$ to $w_j$ as

$$P_{i,j} = p(d_i \to w_j) = \frac{M_{i,j}}{\sum_{k=1}^{m} M_{i,k}}, \tag{1}$$

and the transition probability from $w_j$ to $d_i$ as

$$Q_{j,i} = p(w_j \to d_i) = \frac{M_{i,j}}{\sum_{k=1}^{n} M_{k,j}}. \tag{2}$$

The Markov transition matrix P normalizes M such that every row sums to 1, and Q normalizes $M^{T}$ such that every row sums to 1. Thus the $(m+n) \times (m+n)$ adjacency matrix A of the bipartite graph may be written as

$$A = \begin{pmatrix} \mathbf{0} & Q \\ P & \mathbf{0} \end{pmatrix},$$

where the vertices of the graph are ordered such that the first m vertices index the words and the last n vertices index the documents.

Note that the bipartite graph can also be a good model in the scenario of collaborative filtering or recommendation systems. A standard collaborative filtering problem often involves a user set $\mathcal{U}$, an item set $\mathcal{I}$, and a vote set $\mathcal{V}$. In a similar manner, we can build a bipartite graph $G = (\mathcal{U}, \mathcal{I}, E)$, and the normalized vote scores of $\mathcal{V}$ can directly serve as the weight matrix. Moreover, the aim of collaborative filtering is to discover new items that an active user may like based on her voted items, which can easily be translated into an RS problem: finding items similar to the user's favorite ones (items with high voting scores).
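To make this construction concrete, the following is a minimal sketch in Python/NumPy. It assumes dense matrices, documents given as token lists, and that every row of $M$ and $M^{T}$ has a nonzero sum; the function names (`build_weight_matrix`, `transition_matrices`, `adjacency`) are ours, not the paper's.

```python
import numpy as np

def build_weight_matrix(docs, vocab):
    """n x m tf-idf edge weight matrix M: M[i, j] = tf-idf(d_i, w_j)."""
    n, m = len(docs), len(vocab)
    index = {w: j for j, w in enumerate(vocab)}
    counts = np.zeros((n, m))                 # counts[i, j] = #(d_i, w_j)
    for i, doc in enumerate(docs):
        for word in doc:
            if word in index:
                counts[i, index[word]] += 1
    df = np.count_nonzero(counts, axis=0)     # |{d_k : w_j ~ d_k}|
    idf = np.log(n / np.maximum(df, 1))       # guard against words with no edges
    return np.log1p(counts) * idf             # zero wherever no edge exists

def transition_matrices(M):
    """Row-stochastic P (document -> word, Eq. 1) and Q (word -> document, Eq. 2)."""
    P = M / M.sum(axis=1, keepdims=True)
    Q = M.T / M.T.sum(axis=1, keepdims=True)
    return P, Q

def adjacency(P, Q):
    """(m+n) x (m+n) matrix A; the first m vertices are words, the last n documents."""
    n, m = P.shape
    A = np.zeros((m + n, m + n))
    A[:m, m:] = Q    # word rows -> document columns
    A[m:, :m] = P    # document rows -> word columns
    return A
```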
# III. LABEL DIFFUSION ON BIPARTITE GRAPHS

In general, there are two primary approaches to clustering vertices on a bipartite graph: one is to partition the graph into disjoint parts corresponding to different clusters [25,26]; the other is to compute a rank value for each vertex indicating the probability that the vertex belongs to a cluster [24,27]. To capture the uncertainty in data-cluster assignments, we investigate the problem from the latter angle.

# a) Markov Chains on the Graph

In a bipartite graph, there are no direct relationships among the words or among the documents. However, in diffusion-based methods [18,27], the strength of the similarities among elements on the same side of the bipartite graph may be captured by a local probability evolution process, which can be integrated to obtain a global view of the data.

We define a Markov chain to describe the diffusion process over the bipartite graph. Assume there is a discrete-time random walk on the bipartite graph G; the row-normalized adjacency matrix A is the one-step transition matrix, where $A_{i,j}$ is the probability of moving to the j-th vertex given that the walk is currently at the i-th vertex. More specifically, P is the document-word transition matrix containing the transition probabilities of moving from a document vertex to a word vertex, and Q is the word-document transition matrix.

Since the document-word graph is bipartite, all paths from $w_i$ to $w_j$ must pass through vertices in $\mathcal{D}$. As shown in [15,18], the similarity of two words $w_i$ and $w_j$ is a quantity proportional to the probability of a transition between them through the documents, denoted by $p(w_i, w_j)$:

$$p(w_i, w_j) = \sum_{k=1}^{n} p(w_i \to d_k)\, p(d_k \to w_j) = (QP)_{i,j}.$$

Obviously, this is a 2-step stochastic process that first maps $w_i$ to the document set, and then maps the documents back to $w_j$. Similarly, the transition probability from $d_i$ to $d_j$ is given by

$$p(d_i, d_j) = \sum_{k=1}^{m} p(d_i \to w_k)\, p(w_k \to d_j) = (PQ)_{i,j}.$$

Therefore, by formulating the relationship between documents and words as Markov chains on the bipartite graph, we obtain the word-word transition probability matrix QP and the document-document transition probability matrix PQ.
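As a small illustration of the two-step view above, the sketch below (reusing `P` and `Q` from the previous sketch; `similarity_matrices` is our own name) forms both products:

```python
import numpy as np

def similarity_matrices(P, Q):
    """Word-word (m x m) matrix QP and document-document (n x n) matrix PQ.

    (QP)[i, j] = sum_k p(w_i -> d_k) p(d_k -> w_j): the 2-step transition
    probability between two words through the documents, and likewise for PQ.
    """
    return Q @ P, P @ Q

# Both products are again row-stochastic, e.g.:
# QP, PQ = similarity_matrices(P, Q)
# assert np.allclose(QP.sum(axis=1), 1.0) and np.allclose(PQ.sum(axis=1), 1.0)
```

Both products are row-stochastic because each is a product of row-stochastic matrices, so every row can still be read as a probability distribution over the same side of the graph.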
# b) Diffusion Process

Intuitively, our bipolar diffusion framework works as follows. First, we construct a bipartite graph with two poles as described in section 2. The heat pole stands for the words and documents that are most relevant to the query, while the cold pole contains the "strongest negative" words and documents. A certain amount of heat is injected into the graph through the heat pole, the cold pole extracts heat out of the system as a heat sink, and the heat diffuses through the edges of the graph. Since the system has two poles, we call this the "bipolar diffusion process".

The diffusion process may be thought of as a Markov chain on the graph. The fundamental property of a Markov chain is the Markov property, which makes it possible to predict the future state of a system from its present state, ignoring its past states. We denote by $\mathbf{p}^{(t)}$ and $\mathbf{q}^{(t)}$ the labels of documents and words at time t. The Markov process then defines a dynamic system,

$$\mathbf{p}^{(t+1)} = P\,\mathbf{q}^{(t)} = PQ\,\mathbf{p}^{(t-1)}, \tag{3}$$

$$\mathbf{q}^{(t+1)} = Q\,\mathbf{p}^{(t)} = QP\,\mathbf{q}^{(t-1)}. \tag{4}$$

This simple 2-step diffusion process captures the interaction between the two sets of vertices on the graph. It requires $\mathbf{p}^{(t)}$ and $\mathbf{q}^{(t)}$ to be probability distributions over Markov states with non-negative values. However, in our bipolar diffusion framework, the vertices of the cold pole are labeled with negative values. In the following, we cast the diffusion process as a semi-supervised learning problem, which makes it possible to diffuse heat and cold simultaneously on the graph.

First, we use an (n+2)-vector $\mathbf{p}^{(0)}$ to denote the initial labels of documents, $\mathbf{p}^{(0)} = (1, 0, \dots, 0, -1)^{T}$, and an (m+2)-vector $\mathbf{q}^{(0)}$ to denote the initial labels of words, $\mathbf{q}^{(0)} = (1, 0, \dots, 0, -1)^{T}$. The virtual vertices of the heat pole are labeled with positive values, which allow heat to diffuse through the graph, while negative values representing "cold" are assigned to the virtual vertices of the cold pole. The vertices keep exchanging these two kinds of energy as the diffusion process proceeds. We then adopt the "label spreading" method [18] to allow negative labels to diffuse on the graph. The iterations in Eq. (3) and Eq. (4) can be rewritten as

$$\mathbf{p}^{(t+1)} = \alpha\,\mathbf{p}^{(0)} + (1-\alpha)\,P\,\mathbf{q}^{(t)}, \tag{5}$$

$$\mathbf{q}^{(t+1)} = \beta\,\mathbf{q}^{(0)} + (1-\beta)\,Q\,\mathbf{p}^{(t)}, \tag{6}$$

where $\alpha$ and $\beta$ are scaling parameters, both in (0,1), which specify the relative amounts of heat/cold a vertex receives from its neighbors and from its initial label information. To simplify the diffusion process, we set $\alpha = \beta = \frac{1}{2}$. Substituting Eq. (6) into Eq. (5) (and vice versa) gives, for $t \geq 1$,

$$\mathbf{p}^{(t+1)} = \frac{1}{2}\,\mathbf{p}^{(0)} + \frac{1}{4}\,P\,\mathbf{q}^{(0)} + \Theta\,\mathbf{p}^{(t-1)}, \qquad \mathbf{q}^{(t+1)} = \frac{1}{2}\,\mathbf{q}^{(0)} + \frac{1}{4}\,Q\,\mathbf{p}^{(0)} + \Gamma\,\mathbf{q}^{(t-1)},$$

where $\Theta = \frac{1}{4}PQ$ and $\Gamma = \frac{1}{4}QP$. Unrolling the recursion expresses $\mathbf{p}^{(t+1)}$ as a partial geometric series $\sum_{i} \Theta^{i}\big(\frac{1}{2}\mathbf{p}^{(0)} + \frac{1}{4}P\,\mathbf{q}^{(0)}\big)$ plus a vanishing remainder. Since P and Q are row-normalized, the spectral radii of $\Theta$ and $\Gamma$ are at most $\frac{1}{4}$, so the series converges. Hence, as $t \to \infty$, Eq. (5) converges to

$$\mathbf{p}^{(\infty)} = \left(I - \frac{1}{4}PQ\right)^{-1}\left(\frac{1}{2}\,\mathbf{p}^{(0)} + \frac{1}{4}\,P\,\mathbf{q}^{(0)}\right),$$

and Eq. (6) converges to

$$\mathbf{q}^{(\infty)} = \left(I - \frac{1}{4}QP\right)^{-1}\left(\frac{1}{2}\,\mathbf{q}^{(0)} + \frac{1}{4}\,Q\,\mathbf{p}^{(0)}\right).$$
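The closed forms above lend themselves to a direct linear solve. Below is a minimal sketch with $\alpha = \beta = \frac{1}{2}$ that, for simplicity, places the initial $\pm 1$ labels directly on query/negative vertices instead of on the two virtual pole vertices (the poles would add one extra row and column to P and Q); `bld_scores` is our name, not the paper's.

```python
import numpy as np

def bld_scores(P, Q, p0, q0):
    """Closed-form limits of the label-spreading iterations (5)-(6)."""
    n, m = P.shape
    # Spectral radius of (1/4)PQ and (1/4)QP is at most 1/4, so I - Theta is invertible.
    p_inf = np.linalg.solve(np.eye(n) - 0.25 * (P @ Q),
                            0.5 * p0 + 0.25 * (P @ q0))
    q_inf = np.linalg.solve(np.eye(m) - 0.25 * (Q @ P),
                            0.5 * q0 + 0.25 * (Q @ p0))
    return p_inf, q_inf   # positive scores = relevant, negative = irrelevant

# Example: the query word w_1 from Figure 1 gets label +1; everything else 0.
# q0 = np.zeros(m); q0[0] = 1.0; p0 = np.zeros(n)
# doc_scores, word_scores = bld_scores(P, Q, p0, q0)
```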
# IV. EXPERIMENTS

Our proposed BLD method is applicable to discovering items similar to a query set in a corpus of text data or user rating data. In this section, we experiment with our model on a set of RS problems.

# a) Datasets

The experiments are performed on a standard text dataset, Reuters-21578¹, and a collaborative filtering dataset, MovieLens². The text data were preprocessed by removing stopwords and stemming. For Reuters-21578, we use the subset of the ten categories with the highest number of positive examples, namely Reuters-10. The main features of the datasets are summarized in Table 1.

Table 1: Main features of the datasets

| Dataset | documents/users | words/items | nonzero entries |
|---|---|---|---|
| Reuters-10 | 9,989 | 5,180 | 373,440 |
| MovieLens | 943 | 1,682 | 100,000 |

We conducted two experiments, on the text dataset Reuters-10 and the movie rating dataset MovieLens, to evaluate the RS ability of our method. The results and comparisons with Google Sets³, the Internet Movie Database (IMDB)⁴, and Bayesian Sets are given in Tables 2, 3 and 4. Concerning the application of movie recommendation, the query takes the form of a set of movies. We regard movies as documents and users as words, and the rating scores are equivalent to word frequencies; thus our algorithm is easily adapted to collaborative filtering datasets (a code sketch of this adaptation is given at the end of this section). Among the tested algorithms and systems, Google Sets is a baseline RS system built on a vast amount of web data. Although the Google Sets algorithm is not available for us to run on our datasets, it is still informative and worth comparing against. IMDB provides movie recommendations by generating a list of the 5 movies most related to the query, based on collaborative filtering technology. Bayesian Sets views RS as a Bayesian inference problem and gives a corresponding ranking algorithm; we experiment with an online Bayesian Sets recommending system on the MovieLens dataset.

Table 2: Words retrieved by Google Sets and BLD for the same given queries; BLD runs on Reuters-10 (BLD returns stemmed words)

| Google Sets (science + technology) | BLD (science + technology) | Google Sets (market + price) | BLD (market + price) |
|---|---|---|---|
| science | scienc | market | market |
| technology | technologi | price | pric |
| business | univers | overview | money |
| sports | engineer | views | stock |
| health | educat | risk | valu |
| entertainment | research | gains | bond |
| education | comput | forecasts | busi |
| politics | life | war | compani |
| travel | develop | Intel | trade |
| computers | cultur | losses | economic |

Table 3: Movie suggestions for the query "Full Metal Jacket (1987)"

| Google Sets | IMDB | Bayesian Sets | BLD |
|---|---|---|---|
| Saving Private Ryan (1998) | Platoon (1986) | The Terminator (1984) | Platoon (1986) |
| Apocalypse Now (1979) | All Quiet on the Western Front (1930) | Star Wars Episode V (1980) | Rambo: First Blood (1982) |
| Platoon (1986) | Cidade de Deus (2002) | Raiders of the Lost Ark (1981) | Braveheart (1995) |
| Pulp Fiction (1994) | Batoru Rowaiaru (2000) | Aliens (1986) | Apocalypse Now (1979) |
| Hamburger Hill (1987) | If… (1968) | Die Hard (1988) | Star Wars Episode V (1980) |

Table 4: Movie suggestions for the query "The Graduate (1967)"

| Google Sets | IMDB | Bayesian Sets | BLD |
|---|---|---|---|
| Chinatown (1974) | Mysterious Skin (2004) | Casablanca (1942) | Casablanca (1942) |
| Midnight Cowboy (1969) | Giant (1956) | The Wizard of Oz (1939) | Annie Hall (1977) |
| Annie Hall (1977) | The Notebook (2004) | One Flew Over the Cuckoo's Nest (1975) | Gone with the Wind (1939) |
| Taxi Driver (1976) | Big Fish (2003) | The Godfather (1972) | To Kill a Mockingbird (1962) |
| Bonnie and Clyde (1967) | Notes on a Scandal (2006) | Amadeus (1984) | Giant (1956) |

From the query results of the relevant-word discovery task and the movie suggestion task, we can make the following comments:

1. Table 2 shows that both Google Sets and our method can retrieve sets that are, to some extent, similar to the query. There is no objective standard for the exact similarity between words or movies. However, the words that our method retrieved are clearly more sensible than some of the Google Sets results, e.g. "war" and "Intel" for the query "market" and "price". We attribute this to the different learning mechanisms and different corpora underlying Google Sets and our method.

2. Our method serves as a good algorithm for recommender systems on the MovieLens dataset. The movies recommended for the query "Full Metal Jacket" are all related to war, and for "The Graduate", most of the results are romance and drama movies. We notice that IMDB's suggestions are often popular and new, while Bayesian Sets and our method, limited by the MovieLens dataset, tend to generate classic movies.
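The following hedged sketch spells out the collaborative-filtering adaptation used in these experiments: movies play the role of documents, users the role of words, and raw rating scores replace the tf-idf weights. It reuses `bld_scores` from the previous sketch; `movie_recommendations` and the `top_k` parameter are our own names, not part of the paper.

```python
import numpy as np

def movie_recommendations(R, liked_movies, top_k=5):
    """R: (movies x users) rating matrix; liked_movies: row indices of the query."""
    P = R / R.sum(axis=1, keepdims=True)       # movie -> user transitions
    Q = R.T / R.T.sum(axis=1, keepdims=True)   # user -> movie transitions
    n, m = R.shape
    p0, q0 = np.zeros(n), np.zeros(m)
    p0[list(liked_movies)] = 1.0               # inject heat at the query movies
    scores, _ = bld_scores(P, Q, p0, q0)
    scores[list(liked_movies)] = -np.inf       # never recommend the query itself
    return np.argsort(-scores)[:top_k]
```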
# V. CONCLUSION

In this paper we developed a new graph diffusion algorithm for RS. We used bipartite graphs to model the relationships between documents and words, modeled the diffusion process using a Markov chain on the graph, and presented the corresponding label diffusion algorithm. In future work, we will extend the proposed method to other applications (e.g. social networks and question answering systems).

¹ http://www.daviddlewis.com/resources/testcollections/reuters21578/
² http://www.grouplens.org/system/files/ml-data.tar__0.gz
³ http://labs.google.com/sets
⁴ http://www.imdb.com/

# VI. ACKNOWLEDGMENT

# REFERENCES

[1] V. D. Blondel, A. Gajardo, M. Heymans, P. Senellart, and P. Van Dooren, "A measure of similarity between graph vertices: Applications to synonym extraction and web searching," SIAM Review, 46(4), 2004.
[2] V. Lavrenko, "A generative theory of relevance," University of Massachusetts Amherst, 2004.
[3] D. Lin, "Automatic retrieval and clustering of similar words," in Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics (COLING/ACL-98), 1998.
[4] R. C. Wang and W. W. Cohen, "Language-independent set expansion of named entities using the web," in Proceedings of the Seventh IEEE International Conference on Data Mining, 2007.
[5] H. Deng, M. R. Lyu, and I. King, "A generalized Co-HITS algorithm and its application to bipartite graphs," in Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2009.
[6] J. Wang, S. Robertson, A. P. de Vries, and M. J. Reinders, "Probabilistic relevance ranking for collaborative filtering," Information Retrieval, 11(6), 2008.
[7] W. W. Cohen, "Automatically extracting features for concept learning from the web," in Proceedings of the 17th International Conference on Machine Learning, 2000.
[8] C. Basu, H. Hirsh, and W. Cohen, "Recommendation as classification: using social and content-based information in recommendation," in Proceedings of the 15th National/10th Conference on Artificial Intelligence/Innovative Applications of Artificial Intelligence, 1998.
[9] H. Wang and E. R. Hancock, "Probabilistic relaxation labelling using the Fokker-Planck equation," Pattern Recognition, 41(11), 2008.
[10] Google, Google Sets, http://labs.google.com/sets.
[11] Z. Ghahramani and K. A. Heller, "Bayesian sets," in Advances in Neural Information Processing Systems 18, 2005.
[12] J. Sun, H. Qu, D. Chakrabarti, and C. Faloutsos, "Relevance search and anomaly detection in bipartite graphs," SIGKDD Explorations Newsletter, 7(2), 2005.
[13] J. Lafferty and C. Zhai, "Probabilistic relevance models based on document and query generation," in Language Modeling and Information Retrieval, Kluwer International Series on Information Retrieval, 2002.
[14] J. M. Kleinberg, "Authoritative sources in a hyperlinked environment," Journal of the ACM, 46(5), 1999.
[15] H. Yu, J. Han, and K. C.-C. Chang, "PEBL: Web page classification without negative examples," IEEE Transactions on Knowledge and Data Engineering, 16(1), 2004.
[16] L. M. Manevitz and M. Yousef, "One-class SVMs for document classification," Journal of Machine Learning Research, 2, 2002.
[17] B. Schölkopf, J. C. Platt, J. Shawe-Taylor, A. J. Smola, and R. C. Williamson, "Estimating the support of a high-dimensional distribution," Neural Computation, 13(7), 2001.
[18] D. Zhou, O. Bousquet, T. N. Lal, J. Weston, and B. Schölkopf, "Learning with local and global consistency," in Advances in Neural Information Processing Systems 16, 2004.
[19] I. S. Dhillon, "Co-clustering documents and words using bipartite spectral graph partitioning," in Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2001.
[20] T. Hu, C. Qu, C. L. Tan, S. Y. Sung, and W. Zhou, "Preserving patterns in bipartite graph partitioning," in Proceedings of the 18th IEEE International Conference on Tools with Artificial Intelligence, 2006.
[21] V. Sindhwani and P. Melville, "Document-word co-regularization for semi-supervised sentiment analysis," in Proceedings of the Eighth IEEE International Conference on Data Mining, 2008.
[22] J. Kunegis and A. Lommatzsch, "Learning spectral graph transformations for link prediction," in Proceedings of the 26th Annual International Conference on Machine Learning, 2009.
[23] J.-R. Shieh, Y.-T. Yeh, C.-H. Lin, C.-Y. Lin, and J.-L. Wu, "Collaborative knowledge semantic graph image search," in Proceedings of the 17th International Conference on World Wide Web, 2008.
[24] C. Bauckhage, "Distance-free image retrieval based on stochastic diffusion over bipartite graphs," in Proceedings of the 18th British Machine Vision Conference, 2007.
[25] X. Z. Fern and C. E. Brodley, "Solving cluster ensemble problems by bipartite graph partitioning," in Proceedings of the 21st International Conference on Machine Learning (ICML 2004), 2004.
[26] M. Rege, M. Dong, and F. Fotouhi, "Bipartite isoperimetric graph partitioning for data co-clustering," Data Mining and Knowledge Discovery, 16(3), 2008.
[27] K. Yu, S. Yu, and V. Tresp, "Soft clustering on graphs," in Advances in Neural Information Processing Systems 18, 2005.