# I. Introduction
The 21st century is the age of the Internet and the World Wide Web. The Web has revolutionized the way we gather, process, and use information. At the same time, it redefines the meanings and processes of business, commerce, marketing, finance, publishing, education, research, and development, as well as other aspects of our daily life [1].

Modified collection selection is the selection of an optimally weighted subset of collections from a large set of collections, for the purpose of reducing the costs associated with Distributed Information Retrieval. The goal of modified collection selection is to make searching multiple collections appear as seamless as searching a single collection. A further requirement of modified collection selection using an optimal term weighting system is to learn which collections contain relevant information and which do not. This reduces the overall number of search requests needed. If only a small, high-quality subset of the available collections is searched, savings can be made in time, bandwidth, and computation [4].

Web-based collection selection is significant because, as the Internet grows, so does the number of Internet-based collections. It is now impossible to manually track and index all collections, as they number in the thousands. The method will enable users to choose the best collections for their needs without having to sift through irrelevant ones. The optimal term method for collection selection aims at reducing expenses, increasing search speed, adapting to changes in the search environment, using ontology to increase precision, and adapting to the user's preferences.

The paper is organized as follows. Section 2 discusses the main differences between the traditional method of Web-based collection selection and the optimal term weight method for collection selection. Section 3 presents the application of the approach.
The Conclusion presents the main features of the system that help fulfill the fundamental demands of the intelligent Web's design and development.

# II. Modified Collection Selection
Modified collection selection using optimal term weighting is the selection of an optimal set of information sources from a large set of information sources. An information source can be a Web interface, a standard relational collection, a file, a search engine, or any other textual representation of information. Collection selection aims to be efficient with respect to bandwidth and computation, and to decrease both resource usage and the time taken to return a set of results for a query. Well-planned collection selection can have a large influence on the efficiency of a query.

Collection selection differs significantly from document selection in a number of areas. First, it uses different methods for scoring the relevance of items [3]. Document selection commonly uses a binary relevance value, which collection selection cannot use; instead, collection selection must use a floating point number to represent relevance. Second, it calculates term weights differently: terms distributed across all documents in a collection are worth more than terms clustered in one document of a collection. Third, different content selection methods are needed, with Web-based collection selection commonly using partial collection sampling and document selection using full document indexing. These differences mean that collection selection using optimal term weighting requires a significantly different approach from document selection.

# III. Modified Collection Selection Algorithm
In this section, we give the details of our collection selection algorithm.
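
As orientation before the step-by-step description, the overall flow of this section (a term-collection matrix with the query as an extra column, a singular value decomposition, and a thresholded ranking) can be sketched in Python with NumPy. This is only a minimal sketch under stated assumptions, not the authors' implementation: the function names `term_collection_matrix` and `rank_collections`, the document-share term weight, the cosine scoring in the latent space, and the parameters `k` and `threshold` are all illustrative choices that the text does not specify. The final optimal term weighting step is not covered.

```python
import numpy as np

def term_collection_matrix(samples, terms):
    """Build a (terms x collections) weight matrix from sampled documents.

    samples: one entry per collection, each a list of token lists.
    The weight used here (share of a collection's sampled documents that
    contain the term) is an illustrative assumption, not the paper's formula.
    """
    A = np.zeros((len(terms), len(samples)))
    for j, docs in enumerate(samples):
        for i, term in enumerate(terms):
            A[i, j] = sum(term in doc for doc in docs) / max(len(docs), 1)
    return A

def rank_collections(A, query, k=2, threshold=0.5):
    """Rank collections against a query in a k-dimensional latent space.

    k and threshold are hypothetical parameters; the text does not fix them.
    Returns (collection_index, score) pairs above the threshold, best first.
    """
    # Step 1: treat the query as one more collection (an extra column).
    M = np.column_stack([A, query])
    # Step 2: singular value decomposition, M = U @ diag(s) @ Vt.
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    k = min(k, len(s))
    # Latent coordinates of every column; the last row belongs to the query.
    coords = (s[:k, None] * Vt[:k, :]).T
    coll, q = coords[:-1], coords[-1]
    # Step 3 (assumed scoring): cosine similarity between collection and query.
    norms = np.linalg.norm(coll, axis=1) * np.linalg.norm(q)
    scores = np.divide(coll @ q, norms, out=np.zeros(len(coll)), where=norms > 0)
    # Step 4: drop collections at or below the threshold, sort the rest.
    order = np.argsort(-scores)
    return [(int(i), float(scores[i])) for i in order if scores[i] > threshold]

terms = ["svd", "matrix", "football", "goal"]
samples = [
    [["svd", "matrix"], ["svd"], ["matrix", "svd"]],   # collection 0: linear algebra
    [["football", "goal"], ["football"]],              # collection 1: sport
    [["svd", "football"], ["matrix", "goal"]],         # collection 2: mixed
]
query = np.array([1.0, 1.0, 0.0, 0.0])                 # asks for "svd" and "matrix"
ranked = rank_collections(term_collection_matrix(samples, terms), query)
print(ranked)  # collection 0 ranks first, then the mixed collection 2
```

Cosine similarity in the truncated space is a common choice in latent semantic indexing and is swapped in here for the sorting rule of the algorithm; a negative entry in `query` would play the role of a term that should not be returned.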
The inputs of the algorithm include a query, a selected set of terms (keywords), and a set of sample documents from each collection.

a) Algorithm
1. Calculate the term-collection matrix $A$, where we view the query as a new collection.
2. Use singular value decomposition: $U S V^{T} = A$.
3. Sort the collections according to the values in the query row of the matrix $V^{T}$.
4. Use the threshold to calculate a rank of collections.
5. After ranking the collections, find the optimal term weight in order to find the most appropriate relevant pages.

The term-collection matrix is created by adding the query to the matrix in the form of a new (small) document column. Negative weights can be given to terms that are not to be returned in the query. Applying singular value decomposition to the matrix returns a term matrix ($U$), a singular value matrix ($S$), and a collection matrix ($V$).

For every search performed, the user gives the top n collections (n is currently 10) a floating point precision ranking in the range 0 to 1; the higher the ranking, the more precise the results. After a training run of (say) twenty searches, the collection matrix and the latent statistical relationships between collections are computed [5]. The returned values are a score for each collection, with zero being not relevant and one being most relevant. This finds relationships between collections that are not immediately obvious, and results in a more personalized search that learns the user's preferences over time.

# IV. Conclusions
A solution to the Web-based collection selection problem has been presented, and preliminary results indicate that the technique is suited to the task of selecting the most relevant collections and learning user preferences in collections. The approach uses short queries and is thus suitable for use on the Web. It also reduces the need for ontologies and thesauri.
With some modification, this collection selection method is suitable for traditional information retrieval systems across servers and databases. One problem is that these systems do not rank the data before returning it. This could be solved using simple sampling techniques that grab a representative sample of the collection, rank it, and then compare it across collections. As the number of collections indexed grows, so do the number of terms and the size of the matrix. However, in this research only the top n most representative documents from each collection are sampled, so it is possible to compare hundreds of collections in a reasonable time if n is small.

Due to the time expense of writing screen-scraping applications for Web-based collections and comparing the results to human rankings of the documents in the collections, the researchers were unable to perform large-scale tests of the methods presented in this research. Work still needs to be done on the optimal sample size taken from each collection.

# References
* M. W. Berry, Z. Drmac, E. R. Jessup. Matrices, vector spaces, and information retrieval. Society for Industrial and Applied Mathematics, vol. 41, 1999.
* J. Callan, A. L. Powell, J. C. French, M. Connell. The effects of query-based sampling on automatic database selection algorithms. Technical Report CMU-LTI-00-162, Language Technologies Institute, School of Computer Science, Carnegie Mellon University, 2000.
* N. Craswell, P. Bailey, D. Hawking. Server selection on the World Wide Web. Proceedings of the fifth ACM conference on Digital libraries, San Antonio, Texas, United States. ACM Press, 2000.
* John King, Yuefeng Li. Web Based Collection Selection Using Singular Value Decomposition. School of Software Engineering and Data Communications, Queensland University of Technology, QLD 4001, Australia.
* N. Zhong, J. Liu, Y. Yao. In Search of the Wisdom Web. vol. 35, November 2002.