# Introduction

In a modern information retrieval system, the results returned should accurately match the query submitted by the user, and retrieval efficiency should also be considered.

Current search engine technology often retrieves results that are irrelevant to the search query, so both the relevance of the results and the order in which they are displayed must be addressed. According to Hele-Mai Haav [1], current information retrieval systems should be enhanced with intelligence to manage the effective retrieval, filtering, and presentation of relevant information. Two main information retrieval models are distinguished: the keyword-based model and the concept-based model. The keyword-based model uses indexing terms and Boolean logical queries; indexing may be automatic or manual, and when Boolean queries are evaluated the frequency of occurrence of the terms is taken into account.

A context-aware system [2] provides information and services according to what is relevant to the user. For instance, the keyword "apple" can mean a fruit, or it can refer to the mobile phones and laptops made by the Apple company. When the query is submitted by two different users, the same results are displayed to both, irrespective of their interests. If one user is interested only in Apple accessories, both relevant and irrelevant information is displayed to that user in no particular order. The information the user is looking for may be in the same document or elsewhere in the overall collection of documents. The current system simply performs word-to-word matching against the search query.

Another example is searching for places based on the user's current location. If the user is in Jaynagar and searches for restaurants nearby, the search engine should first show the restaurants close to the user's current location; restaurants outside Jaynagar should be given lower preference. Geographic and non-geographic search are discussed in detail in the proposed system section.

A main consideration in an information retrieval system is to reduce the complexity involved in query execution [3], for example in performing lexical analysis and stemming on the user query and in constructing the index terms. This paper focuses on search engine optimization (SEO) by reducing the complexity of user query execution.
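As a rough illustration of the query-side steps named above (lexical analysis, stemming, and construction of index terms), the following Python sketch may help. The stop-word list, the suffix rules, and the function names are assumptions made for this example and are not taken from the system; a production system would typically use an established stemmer rather than this crude suffix stripping.

```python
import re

# Hypothetical stop-word list and suffix rules, chosen only for illustration.
STOP_WORDS = {"a", "an", "the", "in", "on", "of", "for", "near", "to", "and"}
SUFFIXES = ("ing", "es", "s")


def tokenize(query: str) -> list[str]:
    """Lexical analysis: lower-case the query and keep alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", query.lower())


def stem(token: str) -> str:
    """Crude suffix stripping that keeps at least three characters of the word."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def index_terms(query: str) -> list[str]:
    """Index terms of a query: tokens with stop words removed, then stemmed."""
    return [stem(t) for t in tokenize(query) if t not in STOP_WORDS]
```

For instance, `index_terms("Restaurants near Jaynagar")` drops the stop word "near" and reduces "restaurants" to "restaurant", so fewer and more uniform terms reach the index lookup.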

The rest of the paper is organized as follows. Section II surveys previous work on the technologies currently used to optimize search engines. Section III studies the technique used to reduce the complexity of search query optimization. Section IV gives a detailed view of the implementation. Section V presents the experimental evaluation, and Section VI discusses the conclusion and future enhancements.


# II.


# Literature Survey

M. Rami Ghorab [4] observes that every query submitted by a user returns both relevant and irrelevant information. Personalized information retrieval (PIR) systems are therefore classified into three scopes: individualized systems, community-based systems, and aggregate-level systems.

In an individualized system, adaptive decisions [5][6] are made by taking the user's interests and preferences into account while performing search operations. This approach leads to true personalization, but it has a drawback, the fresh start: when a user is new to the system, his or her interests must first be tracked, and the user may be unwilling to share personal information with the system.

A community-based system [7] shares information among several users or models. Data enrichment techniques such as clustering are used to group similar users. Using some similarity criterion, users on the web can be grouped into one model so that results can be personalized for that community.

An aggregate-level system [8] represents the gathered information in summary form for the purpose of analysis. Common parameters such as age are used to form clusters. For example, a site selling music CDs may advertise certain CDs based on the age of the user and the aggregate data for that age group. Online analytic processing (OLAP) is a simple type of data aggregation.

Browsers also provide a certain level of personalization by storing cookies and recently visited hyperlinks in local buffers. This works while the user stays in one place, but when the user's location changes dynamically the buffer contents are no longer used.

To address this, each user's interests can be maintained in a buffer on the server, so that whenever the user submits a query it can be compared with the user interest buffer and relevant information retrieved while minimizing unrelated results. When a user is new to the system and enters a query for the first time, the location preference is taken along with the search keyword and the search is performed. The keyword is searched on the server, and the relevant results are fetched and displayed. When the user clicks on links, the clickthrough data is recorded. Later, when the user searches for the same keyword, the previously visited pages are ranked higher and displayed first, and any new links are ranked lower.


# III.


# System Design

SpyNB [9] is the algorithm used to collect the user's clickthrough data, which is then transformed into vectors for further processing. Ranking support vector machine (RSVM) training is performed on these vectors to re-rank the search results according to user preferences. SpyNB and RSVM are described in detail in the implementation section.

The system concentrates on building an ontology for all possible keywords, because a word can have different meanings in different contexts [2].

For example, the keyword "Java" most often refers to the programming language, but Java is also an island in Indonesia, and Java coffee refers to a kind of coffee bean.

When two users submit this query, both get the same results: a list about Java island, a list about Java coffee beans, or a list about Java programming is displayed, although one user may expect only results about the island and the other only results about the programming language.

In the support rule, $sf(c_i)$ is the web-snippet frequency of the keyword/phrase $c_i$ for the query Q, $n$ is the total number of web snippets, and $|c_i|$ is the number of terms in the keyword/phrase $c_i$. If the support of the keyword/phrase $c_i$ is higher than the threshold ΔT (where ΔT is set by the user), then $c_i$ is considered a concept for the query Q.

In this system ΔT is set to 5. If ΔT is assigned a smaller value, the ranking must be updated on almost every search, which consumes more time in reordering links; if it is assigned a larger value, full personalization cannot be achieved.
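To make the role of the three quantities concrete, here is a minimal Python sketch of a support computation. The exact support formula is not reproduced in this text, so the form used below, $sf(c_i)/n \cdot |c_i|$, is an assumption: it is the form commonly used in concept-based profiling and it is consistent with the symbols defined above. The function names, the substring matching, and the threshold passed in the usage example are likewise illustrative only.

```python
def support(concept: str, snippets: list[str]) -> float:
    """Assumed form: support(c_i) = sf(c_i) / n * |c_i|.

    sf(c_i) counts the snippets that contain the phrase (simple substring match),
    n is the number of snippets, and |c_i| is the number of terms in the phrase.
    """
    n = len(snippets)
    if n == 0:
        return 0.0
    phrase = concept.lower()
    sf = sum(1 for snippet in snippets if phrase in snippet.lower())
    return sf / n * len(phrase.split())


def select_concepts(candidates: list[str], snippets: list[str], threshold: float) -> list[str]:
    """Keep the candidates whose support is strictly above the threshold."""
    return [c for c in candidates if support(c, snippets) > threshold]
```

With three snippets of which two mention "Java programming", the phrase gets support 2/3 · 2 ≈ 1.33, while a single-word candidate that occurs in one snippet gets 1/3, so longer phrases that recur across snippets are favoured.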

The following two propositions are adopted to find relationships between concepts in the ontology:

* Similarity: Two concepts that frequently coexist in the search results can be considered to represent the same topic of interest. If the number of documents in which $c_i$ and $c_j$ co-occur is greater than ΔT (where ΔT is the threshold), then $c_i$ and $c_j$ are considered similar.
* Parent-child relationship: Specific concepts appear together with more general terms, but the converse is not true. If the preference of $c_i$ and $c_j$ is greater than ΔT, then $c_i$ is a child of $c_j$.

Consider the possible concept space determined for the keyword/phrase "Nokia", where the clickthrough data determines the user's preferences over the concepts. The concept space for the query "Nokia" consists of different models such as E-series and N-Lumina. Within E-series, the concepts are similar to each other in that they belong to the same parent.
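The two propositions above can be sketched in Python as follows. This is only one possible reading: similarity is taken as a raw co-occurrence count, and the "preference" of a child for its parent is interpreted as the fraction of snippets mentioning the specific concept that also mention the general one. That interpretation, the function names, and the thresholds in the usage note are assumptions made for illustration.

```python
def _contains(concept: str, snippet: str) -> bool:
    """Case-insensitive substring test used by all helpers below."""
    return concept.lower() in snippet.lower()


def cooccurrence(ci: str, cj: str, snippets: list[str]) -> int:
    """Number of snippets in which both concepts appear."""
    return sum(1 for s in snippets if _contains(ci, s) and _contains(cj, s))


def is_similar(ci: str, cj: str, snippets: list[str], threshold: int) -> bool:
    """Similarity: the two concepts co-occur in more than `threshold` snippets."""
    return cooccurrence(ci, cj, snippets) > threshold


def is_child(ci: str, cj: str, snippets: list[str], threshold: float) -> bool:
    """Parent-child (assumed reading): most snippets with ci also contain cj."""
    with_ci = sum(1 for s in snippets if _contains(ci, s))
    if with_ci == 0:
        return False
    return cooccurrence(ci, cj, snippets) / with_ci > threshold
```

The asymmetry is the point of the second test: every snippet that mentions "E 5630" also mentions "E-series", so `is_child("E 5630", "E-series", ...)` can hold while the reverse call does not.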

The content space for the query "Nokia" consists of "N1100", "E-series", "6600", and so on. If the user is interested in E-series and clicks on a page containing its price, the clickthrough of those links is captured. This clickthrough data is treated as positive preferences, and a vector is constructed from it.

When the same user issues the same query later, the content vector is transformed into a content weight vector and transferred to the server, where it is used to rank the search results according to the user's preferences.
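A minimal sketch of how clicked results might be turned into a content vector and then into a content weight vector is given below. The counting scheme (one increment per clicked snippet that mentions a concept) and the normalisation to weights that sum to 1 are assumptions for illustration; the text does not specify either.

```python
from collections import Counter


def content_vector(concepts: list[str], clicked_snippets: list[str]) -> list[int]:
    """Count, per concept, how many clicked snippets mention it (positive preferences)."""
    counts = Counter()
    for snippet in clicked_snippets:
        text = snippet.lower()
        for concept in concepts:
            if concept.lower() in text:
                counts[concept] += 1
    return [counts[c] for c in concepts]


def content_weights(vector: list[int]) -> list[float]:
    """Normalise the counts so that the weights sum to 1 (all zeros if no clicks)."""
    total = sum(vector)
    return [v / total if total else 0.0 for v in vector]
```

For the concepts "N1100", "E-series" and "6600", two clicks on E-series pages and one on a 6600 page give the vector `[0, 2, 1]`, so E-series results would carry the largest weight on the next search.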

Clickthrough data: Clickthrough recording is the process of logging the links or advertisements clicked by users, in order to determine how many times each link is viewed. The system uses this clickthrough data [10] to personalize results for each user by maintaining per-user records in the database. Formally, clickthrough data is a set of triplets (Q, R, C), where Q is the query, R is the ranked order in which the results are displayed, and C is the set of URLs clicked by the user. To achieve personalization, the system is organized into two distinct levels, the content ontology and the location ontology [11] [12], which are described below.

Content Ontology: Keywords and phrases are extracted from the web snippets returned for the query Q, after stemming the terms of the query. The content ontology differs from user to user according to their interests. The co-existence of keywords for the query Q is calculated to find similarity among user interests, using the support and confidence rule [3].

Location Ontology: The construction of the location ontology [13] [14] [15] differs from that of the content ontology. It is assumed that the parent-child relationship cannot be accurately derived for the location ontology. To construct the vector [15], a location path is associated with each document; for a location in Bangalore, for example, the path "Jaynagar/Bangalore/Karnataka/India" is associated with the document d. The vector for the location ontology is otherwise constructed in the same way as for the content ontology: the clickthrough data is transferred to the server and transformed into a location vector, and this vector is used to rank results by the user's preferences.
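The location path mentioned above, "Jaynagar/Bangalore/Karnataka/India", can be derived by following parent links upward from a location. The sketch below assumes a hypothetical name-to-parent mapping (`PARENT_OF`) built for this example; it is not the system's storage format.

```python
# Hypothetical parent links; None marks a root location.
PARENT_OF = {
    "Jaynagar": "Bangalore",
    "Koramangala": "Bangalore",
    "Bangalore": "Karnataka",
    "Karnataka": "India",
    "India": None,
}


def location_path(name: str, parent_of: dict = PARENT_OF) -> str:
    """Walk from a location up to its root and join the names with '/'."""
    parts = []
    seen = set()
    current = name
    # The `seen` set guards against accidental cycles in the parent links.
    while current is not None and current not in seen:
        parts.append(current)
        seen.add(current)
        current = parent_of.get(current)
    return "/".join(parts)
```

Storing the full path with a document makes it possible to match a user in Jaynagar against documents tagged at any level of the hierarchy (city, state, or country) by comparing path components.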

# IV.


# Implementation

This section discusses in detail the techniques used to personalize the search engine. When the user enters a query q, the system first looks for previous records of that user. If previous search results are found, the content ontology concept is applied; otherwise, if the user is new, the query q is accepted and the location ontology concept is applied.

The ranking algorithm ranks the results according to user preferences by calculating the weights of both the content and the location concepts for the keyword or key phrase. The content weight of all posts for a particular keyword is considered in calculating the ranking order.
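As a small illustration of combining the two kinds of weight, the sketch below adds a content weight and a location weight per post and sorts by the sum, in the spirit of the final step of the RSVM algorithm (Final_rank = Final_content_weight + location_weight_parameter). How each weight is obtained is not shown here; the dictionaries in the usage note are made-up inputs, and the tie-break by post identifier is an assumption.

```python
def final_rank(content_weight: dict[str, float], location_weight: dict[str, float]) -> list[str]:
    """Order posts by content weight plus location weight, highest score first."""
    posts = content_weight.keys() | location_weight.keys()
    scores = {
        post: content_weight.get(post, 0.0) + location_weight.get(post, 0.0)
        for post in posts
    }
    # Ties are broken by post identifier so that the ordering is deterministic.
    return sorted(scores, key=lambda post: (-scores[post], post))
```

For example, a post with a modest content weight of 0.3 but a location weight of 0.5 would be placed above a post with content weight 0.6 and no location weight.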

A support vector machine is constructed to train on the user preferences. When the ranking operation starts, a loop is entered and a click count is recorded for each link whenever the user clicks on it. When a post reaches the minimum threshold value, it gains a higher rank than the rest of the posts. These steps are given formally in the algorithms of this section.
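The click-count rule described above can be sketched as a small Python class. This is bookkeeping only, not an RSVM: it shows how a post might be promoted once its click count reaches a minimum threshold. The class name, the default threshold of 5, and the ordering of promoted posts by click count are assumptions made for this example.

```python
from collections import Counter


class ClickRanker:
    """Promote posts whose click count has reached a minimum threshold."""

    def __init__(self, threshold: int = 5):
        self.threshold = threshold
        self.clicks = Counter()

    def record_click(self, post: str) -> None:
        """Increase the click count of a post by one."""
        self.clicks[post] += 1

    def rank(self, posts: list[str]) -> list[str]:
        """Promoted posts first (most clicked first), then the rest in original order."""
        promoted = [p for p in posts if self.clicks[p] >= self.threshold]
        promoted.sort(key=lambda p: -self.clicks[p])  # stable sort keeps ties in order
        rest = [p for p in posts if self.clicks[p] < self.threshold]
        return promoted + rest
```

A low threshold makes the order change after almost every click, while a high one delays any reordering, which mirrors the trade-off the threshold is meant to balance.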


The Spy Naive Bayes (SpyNB) algorithm is used to collect the clickthrough data. It maintains two sets, the positive set $P_s$ and the negative set $N_s$, where

* $P_s$ = {links that are clicked by the user}
* $N_s$ = {links not clicked by the user}

Algorithm 1: CLOSE (Ui, q, L)

Input: User identity Ui, query q, and current location L of the user.

Output: Results for the query according to user preferences.

1. Accept the query q from the user, where q ∈ {A-Z, a-z, 0-9}.
2. Filter the posts (documents) using the keyword q: If (∀ Post(di) == compare(q))
3. If (check user profile Ui for previous records)

Return Result

The next algorithm searches for the keyword based on the content ontology.

Algorithm 2: Content-Ontology (Ui, q)

Input: User identity Ui and the corresponding query q.

Output: Results returned to CLOSE.

The next algorithm searches for the keyword based on the location ontology.

Algorithm 3: Location-Ontology (Ui, q, L)

Algorithm 4: SpyNB (S)

Input: Posts matched for the query q.

Output: Feature vector for each post.

1. Compare S with the user record.
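The filtering step of Algorithm 1 and the two sets kept for SpyNB can be illustrated with the short Python sketch below. It covers only the bookkeeping (keyword filtering and the split into clicked and unclicked links); the spy-based voting of SpyNB [9] is not implemented, and the function names and the post dictionary are hypothetical.

```python
def filter_posts(posts: dict[str, str], query: str) -> list[str]:
    """Step 2 of Algorithm 1: keep the ids of posts whose text contains the keyword."""
    keyword = query.lower()
    return [post_id for post_id, text in posts.items() if keyword in text.lower()]


def split_clicks(results: list[str], clicked: set[str]) -> tuple[list[str], list[str]]:
    """Build the positive set (clicked links) and the negative set (unclicked links)."""
    positive = [link for link in results if link in clicked]
    negative = [link for link in results if link not in clicked]
    return positive, negative
```

Treating every unclicked link as a plain negative is a simplification; a user may simply not have seen a relevant link, which is the situation the spy technique is designed to handle.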


# Experimental Evaluation

Table 1 gives the dataset used to construct the content ontology for some keywords. The table consists of a unique code for each keyword, the name of the keyword, and the parent of the keyword [17]. In the experimental evaluation, "Hotel" is a root word with four descendants, "Reservation", "Facilities", "Meeting Room", and "Party Hall"; the entries for the other root words are constructed similarly.
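The rows of Table 1 form a tree through the parent column, and that tree is what lets the system separate the two senses of a keyword such as "Jaguar". The Python sketch below loads the Table 1 rows into a dictionary and walks the tree; the dictionary layout and the function names are assumptions made for illustration.

```python
# Rows of Table 1 as code -> (keyword, parent code); parent 0 marks a root word.
CONTENT_ONTOLOGY = {
    101: ("Hotel", 0), 102: ("Reservation", 101), 103: ("Facilities", 101),
    104: ("Meeting Room", 103), 105: ("Party Hall", 103), 106: ("Animal", 0),
    107: ("Jaguar", 106), 108: ("Lion", 106), 109: ("Car", 0),
    110: ("Jaguar", 109), 111: ("BMW", 109), 112: ("Black Jaguar", 107),
    113: ("Elephant", 106),
}


def descendants(code: int, table: dict = CONTENT_ONTOLOGY) -> list[str]:
    """Breadth-first list of all keywords below the given code."""
    found = []
    frontier = [code]
    while frontier:
        current = frontier.pop(0)
        for child, (name, parent) in table.items():
            if parent == current:
                found.append(name)
                frontier.append(child)
    return found


def senses(keyword: str, table: dict = CONTENT_ONTOLOGY) -> list[str]:
    """Names of the parent concepts under which a keyword appears."""
    return [table[parent][0] for name, parent in table.values()
            if name == keyword and parent in table]
```

Here `senses("Jaguar")` returns both "Animal" and "Car", which is the ambiguity that the per-user clickthrough data is then used to resolve.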

Similarly, Table 2 gives the dataset used to construct the location ontology for some locations. The table consists of the location code, location name, latitude, longitude, and parent of each location. For the location ontology, a boundary of 11 location values is taken into consideration.
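Because Table 2 stores a latitude and longitude for each location, nearby locations can be ordered by distance from the user. The text does not state how nearness is computed, so the great-circle (haversine) distance used below is an assumption, as are the function names; the coordinates are the Table 2 values for three of the locations.

```python
from math import asin, cos, radians, sin, sqrt

# (latitude, longitude) in degrees, taken from Table 2.
LOCATIONS = {
    "Jaynagar": (12.93, 77.6),
    "Koramangala": (12.933881, 77.622343),
    "Mysore": (12.303106, 76.640228),
}

EARTH_RADIUS_KM = 6371.0


def haversine_km(a: tuple, b: tuple) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1 = map(radians, a)
    lat2, lon2 = map(radians, b)
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def nearest_first(user: tuple, locations: dict = LOCATIONS) -> list[str]:
    """Location names ordered by increasing distance from the user."""
    return sorted(locations, key=lambda name: haversine_km(user, locations[name]))
```

For a user standing in Jaynagar, this places Jaynagar first, Koramangala (a few kilometres away) second, and Mysore last, which matches the preference order described for the restaurant example.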


# Table 2 : Statistic of Location Ontology

When documents are posted, the related information is stored by entering the root and the location to which the document belongs. For example, the hotel "comfort" comes under Bangalore city, for which India is the root; other documents are posted in the same way.

When the user enters the query q, the search is carried out as described in the implementation section. Once the matching documents are found and the user's previous records are analyzed, the ranking support vector machine is applied to the posts matched by the keyword or query q.

Table 3 gives the RSVM calculation for the keyword "jaguar" for two different users. It can be observed from the table that the two users have their own preferences in choosing links.

Later, when the two users search for the same keyword, the threshold value changes and the ranking of their search results is altered.




# Conclusion and Future Enhancement

We conclude that the CLOSE system provides better search results than other search engines by considering the user's content and location concepts. CLOSE uses user preferences to minimize the time needed to retrieve search results. RSVM training is performed for each individual user profile, so that the system learns what the user is really interested in.

As a future enhancement, time can be considered as an additional parameter to further optimize the search results. Sessions can also be considered as a parameter, so that when a user stops working at a particular point and later returns to the system, possibly from a different machine, the session resumes from the moment where the user stopped working or viewing the content of a document.


# VII.


![Figure 1: Overall Architecture of the CLOSE system](image-2.png "Figure 1")
![Fig 1 shows the complete architecture of the CLOSE system.](image-3.png "Fig 1")


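Figure: Concept space for q = "Nokia", with the nodes N 1100, E Series, 6600, N Lumina, Features, E 5630, E 5631, and Price arranged on Levels 0 to 2; the legend distinguishes parent-child relationships from similarity.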
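Table 1

| Unique Code | Keywords | Parent |
|---|---|---|
| 101 | Hotel | 0 |
| 102 | Reservation | 101 |
| 103 | Facilities | 101 |
| 104 | Meeting Room | 103 |
| 105 | Party Hall | 103 |
| 106 | Animal | 0 |
| 107 | Jaguar | 106 |
| 108 | Lion | 106 |
| 109 | Car | 0 |
| 110 | Jaguar | 109 |
| 111 | BMW | 109 |
| 112 | Black Jaguar | 107 |
| 113 | Elephant | 106 |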
Algorithm 5: RSVM (count, post_code)

Input: The count for each click.

Output: Ranking order of the posts.

1. For i ← 0 to total_post - 1 do
2. Content_weight_count ← count.
3. Calculate the content weight for the particular keyword. P_code ← Post_code
4. Content_weight (%)
5. Final_content_weight
6. P1
7. P2 ← P1 - 100
8. location_weight_parameter
9. Final_rank ← Final_content_weight + location_weight_parameter

Table 2: Statistic of Location Ontology

| Location Code | Location Name | Parent | Latitude | Longitude |
|---|---|---|---|---|
| 1 | India | 0 | 21.0 | 78.0 |
| 12 | Karnataka | 200 | 12.97 | 77.56 |
| 123 | Bangalore | 201 | 12.97 | 77.57 |
| 124 | Mysore | 201 | 12.303106 | 76.640228 |
| 1231 | Jaynagar | 202 | 12.93 | 77.6 |
| 1232 | Koramangala | 202 | 12.933881 | 77.622343 |
| 13 | Tamil Nadu | 200 | 13.08 | 80.27 |
| 2 | London | 0 | 51.51 | -0.12 |
| 21 | Barking and Dagenham | 207 | 51.545268 | 0.147575 |
| 22 | Barnet | 207 | 51.650194 | -0.200897 |
| 23 | Bexley | 207 | 51.441811 | 0.154297 |
© 2014 Global Journals Inc. (US) Information Retrieval Based on Content and Location Ontology for Search Engine (CLOSE)
		
		
## Acknowledgement

Foremost, I would like to express my sincere gratitude to my guide, Mr. S G Raghavendra Prasad, Assistant Professor, ISE Dept, RVCE, for his continuous support of my M.Tech study and for his patience, motivation, enthusiasm, and immense knowledge. His guidance helped me throughout the writing of this technical paper.

Besides my guide, I would like to thank the rest of my M.Tech committee: Dr. Jitendranath Mungara, PG Dean, ISE Dept, RVCE, and Dr. Cauvery N K, HOD, ISE Dept, RVCE.

			
## References

1. Hele-Mai Haav, Tanel-Lauri Lubi. A Survey of Concept based Information Retrieval Tools on the Web. White paper.
2. Deepika Bhatia. Context-aware Personalized Mobile Web Search Techniques-A Review. (IJCSIT) International Journal of Computer Science and Information Technologies, 2(5), 2011.
3. R. Baeza-Yates, B. Ribeiro-Neto. Modern Information Retrieval: The Concepts and Technology behind Search. Pearson Edition, 2013.
4. M. Rami Ghorab. Personalised information retrieval: survey and classification. Centre for Next Generation Localisation, Knowledge & Data Engineering Group.
5. Kanika Arora, Kamal Kant. Techniques for Adaptive websites and Web Personalization without any user effort. IEEE Students conference on Electrical, Electronics and Computer Science, 2012.
6. Athanasios Papagelis, Christos Zaroliagis. A Collaborative Decentralized Approach to Web Search. IEEE Transaction on Systems, Man, and Cybernetics, 42(5), 2012.
7. Dou Shen. Query Enrichment for Web-query Classification. ACM Transactions on Information Systems, 24(3), 2006.
8. Bamshad Mobasher. Discovery and Evaluation of Aggregate Usage Profiles for Web Personalization. ACM Transaction on Data Mining and Knowledge Discovery, 6(1), 2002.
9. Wilfred Ng. Mining User Preference Using Spy Voting for Search Engine Personalization. ACM Transaction on Internet Technologies, 7(3), 2007.
10. R. K. Veningston, Shanmugalakshmi. Enhancing personalized web search re-ranking algorithm by incorporating user profile. IEEE, 2012.
11. Li Qing-Shan. Ontology based User Personalization Mechanism in Meta Search Engine. IEEE, 2012.
12. Abdelkrim Bouramoul. An ontology-based approach for semantics ranking of the web search engines results. IEEE, 2012.
13. Varun Mishra. Improving Mobile Search through Location Based Context and Personalization. IEEE, 2012.
14. Sandeep Jain, Aakanksha Mahajan. Data Mining Based on Semantic Similarity to Mine New Association Rules. Global Journal of Computer Science and Technology Software & Data Engineering, 12, Issue 12 Version 1.2, 2012.
15. Mingyang Sun. FoSSicker: A Personalized Search Engine by Location-Awareness. IEEE, 2012.
16. Al Sharji, Safiya. Enhancing the Degree of Personalization through Vector Space Model and Profile Ontology. IEEE Computing and Communication Technologies, Research, Innovation, and Vision for the Future (RIVF), 2013.
17. Shikha Goel. Search Engine Evaluation Based on Page Level Keywords. IEEE, 2013.
18. Vishwas Raval, Padam Kumar. SEReleC (Search Engine Result Refinement and Classification) - A Meta Search Engine based on Combinatorial Search and Search Keyword based Link Classification. IEEE, 2012.