# Introduction t has been observed that in most typical data analysis and data mining applications, timeliness and interactivity are more important considerations than accuracy; thus, data analysts are often willing to overlook small inaccuracies in the answer, provided that the answer can be obtained fast enough. This observation has been the primary driving force behind the recent development of approximate query processing techniques for aggregation queries in traditional databases and decision support systems [4], [5]. Numerous approximate query processing techniques have been developed: The most popular ones are based on random sampling, where a small random sample of the rows of the database is drawn, the query is executed on this small sample, and the results are extrapolated to the whole database. In addition to simplicity of implementation, random sampling has the compelling advantage that, in addition to an estimate of the aggregate, one can also provide confidence intervals of the error, with high probability. Broadly, two types of sampling-based approaches have been investigated: 1) pre-computed samples, where a random sample is pre-computed by scanning the database and the same sample is reused for several queries and 2) online samples, where the sample is drawn "on the fly" upon encountering a query. So the selection of these random samples in distributed environments for query processing is addressed in [6]. Data warehouses (DW) [6] are built by gathering information from data sources and integrating it into one virtual repository customized to users' needs. One important task of a Data Warehouse Management System (DWMS) is to maintain the materialized view upon changes of the data sources, since frequent updates are common for most data sources. In addition, the requirements of a data source are likely to change during its life-cycle, which may force schema changes for the data source. A schema change could occur for numerous other reasons, including design errors, the addition of new functionalities and even new developments in the modeled application domain. Even in fairly standard business applications, rapid schema changes have been observed. In [10], significant changes (about 59% of attributes on the average) were reported for seven different applications over relational databases. A similar report can also be found in [15]. These applications ranged from project tracking, sales management, to government administration. In situations that real-time refreshment of the data ware-house content is not critical; changes to the sources are usually buffered and propagated periodically such as once a day to refresh the view extent. Two benefits are possible. One is to gain better maintenance performance. The other is that there are less conflicts with DW read sessions. In a data update only environment, most view maintenance (VM) algorithms proposed in the literature [17,1,14] group the updates from the same relation and maintain such a large delta change in a batch fashion. However, these algorithms would fail whenever source schema changes occur, which are also common as stated above. One obvious reason is that the data updates in this group may be schema inconsistent with each other if there are some schema changes in between. On the other hand, work has begun on incorporating source schema changes into the data warehouse, namely, view synchronization (VS) [8] aims at rewriting the DW view definition when the source schema has been changed. To handle the delete of any schema information of a data source, VS tries to locate an alternative source for replacement to keep the new view semantically as close to the original view as possible. Thereafter, view adaptation (VA) [12] incrementally adapts the view extent to keep the new view consistent. Such algorithms are also not sufficient to batch a group of mixed data updates and schema changes, since there could be a number of schema changes interleaved with some data updates. In this paper, we propose a solution strategy that is capable of batching a mixture of both source data updates II. # Definition of Terms View evaluation can be represented by a tree, called an expression tree [5,9]. An expression tree is a tree, where the leaf nodes represent base relations and non-leaf nodes represent binary expressions in the relational algebra. The unary relational algebraic expressions are associated along the edges. A view or a query is optimized by the query optimizer before executing it. A query optimizer takes an expression tree as input and produces an output, called an optimized expression tree, which determines the internal sequence of operations for executing a query. Thus, an optimized expression tree defines a partial order in which operations must be performed in order to produce the result of the view. # Depth: The depth of leaf nodes, that is data base relations is 0. The depth d of a node is defined as max(depth of descendents)+1. # Height: The height of the optimized expression tree is defined as the maximum depth of any node in the tree. Given a node i in the expression tree, its parent is denoted by i, and op(i) and op( i) are the expressions associated with i and i, respectively. The children of node i are denoted by i' and i'' where i' is a sibling of i'' and vice versa. IR i denotes the intermediate result of node i. The auxiliary relation associated with node I is denoted AR i in the case where only one relation is needed, and by AR 1 i and AR 2 i when two are needed. The key of IR i is denoted by K i , and the keys of IR i' and IR i'' are denoted by K i' and K i'' , respectively. Insertion and deletion of tuples are denoted by and respectively. The symbol ? either an inserted set or a deleted set of tuples. The instance of a relation, say Ri, before and after an update is denoted by Ri old an d Ri n ew respectively, similiary for an auxiliary relation AR and a materialized view V. # III. # Example & Simplification Consider a data warehouse for a large research organization which has got many departments and each department has many research groups. Suppose this data warehouse is collecting data from four base relations whose schemas are as follows: # R1: emp_rschr(rschr_id,rname,deptno,major) This relation gives the researchers id, name, department and major. # R2:emp_paperpublish(rschrid,paper_id,paper_title,sour ce_of_publiscation, year_of_publish) This gives researchers id,paper id, paper title, source of publication and year of publish. # R3: emp_manager(rschr_id,deptno) This relation contain one record for each manager and his department. Assume that each department has one manager. Since a manager is also a researcher, relation emp_rschr has a tuple for each manager. # R4: emp_groupleader(rschr_id,deptno) This relation contains information about th research group name and who is leading this group. Since a group leader is also a researcher, relation emp_rshcr has a tuple for each group leader. Suppose a user of the organization is interested in materializing and maintaining the following view: 'Researchers other than managers and group leaders along with their departments who have published more than 10 papers in the year 2010.' In # C to the auxiliary relations incrementally. Deferring the changes allows analysts that query the warehouse to see a consistent snapshot of the data throughout the day, and can make the maintenance more efficient. Figure 1 shows the optimized expression tree for the above view. Here, the nodes at leaf level are base relations and non-leaf nodes are expressions. Each nonleaf node in the tree corresponds to a relational algebraic expression given above. # Figure 1: Expression tree Suppose Researchers or Paper_Public relations are updated. In this case we materialize the two auxiliary relations View2 and View3. The contents of these views are derived while computing the view first time. By materializing these two auxiliary relations in the warehouse, the view is self-maintainable along with these auxiliary relations. Suppose new researchers joined the organization, therefore, one tuple for each new researcher in emp_rschr relation has to be inserted. These insertions will led to generate tuples that to be inserted in rschr_ex_manager_groupleader. Since these new researchers have not published any paper at the time of joining, these tuples cannot join with any tuples of emp_paper_publish, thus there will no change in the materialized views. Therefore, all auxiliary relations and materialized views are self_maintainable. Now consider another case where a set of tuples is inserted in emp_paper_publish relation, say R. Then, we first compute the research paper those are published in year 2010 and then it is join with rschr_ex_managergroupleader view. Lastly the intermediate result is grouped in the final auxiliary relation by performing count operation. In this case also, the view and auxiliary relations are self-maintainable. # IV. # Procedure of Materialize Views Maintenance The materialize view maintenance process can be divided into two functions: 1. Propagate and 2. Refresh. The work of computing the auxiliary relations happens within the propagate function, which can take place without locking materialize views so that the warehouse can continue to be made available for querying by analysts. Materialize views are not locked until the refresh function, during which time the materialize views are updated from the auxiliary relations. The propagate function involves updating the auxiliary views incrementally from deferred set of changes. The final auxiliary view represents the net changes to the materialize views due to the changes in the underlying data sources. The refresh function applies the net changes represented in the final auxiliary relation to the materialize views. This process carried out after a specific time interval or when the system has free cycles. So none of the data warehouse users or operations are affected by the view maintenance process. None of the query has to pay for view maintenance. The materialize view maintenance process totally hidden by users and running transactions. Whenever an interested change happens in the underlying data source, simply this desire change is stored in the auxiliary relations by comparing and joining it with others relations if required. This change is passed to the higher level auxiliary relations. Again the change is integrated and circulated to final auxiliary relation. Lastly the change is refreshed into the data warehouse when the refresh trigger is occur. # a) Analytical Cost Model In this section we show the performance results of our materialize view maintenance method. The results are based on the following cost model. # Global Journal of Computer Science and Technology Volume XVIII Issue III Version I # i. Cost Model The overall view maintenance cost of materialized views includes the cost of propagate the changes and the cost of refresh operations. Let V 1 ,V 2 ,?.,V m be the m materialized views. Let B 1 , B 2 ,?,B n be the n base relations and A 1 ,A2,?A i be the i auxiliary relations. Let f u1 B1 ,?..,f un Bn be the update frequency to the base relations. Let C ij B->A be the cost of propagating an update on base relation B i to auxiliary relation A j and C jk A->V be the cost of refresh of auxiliary A j to materialized view V k . The overall cost of maintaining the views when keeping both the materialized views and the auxiliary relations is: C MV +AR=? ?f ???? ???? ? * (? C?> ??? ?? ?? ??? ?? ?? ????? ?? =1 ?? =1 ?? ?? ?? ??? ??=?? ??=1 The total view maintenance cost with no auxiliary relations is: C MV =? ?f ???? ???? ? * ??=?? ??=1 (? C ??=?? ??=1 It is obvious that the cost of maintaining the materialized views directly from base relations is much more than the cost of maintaining materialized views through auxiliary relations. V. # Evaluation To verify the feasibility and effectiveness of our view maintenance strategies and corresponding optimization framework, we have implemented the proposed techniques using Oracle 9i. All experiments were performed on a workstation with Pentium D 3.2 GHz processor, 1 GB of memory and 160 GB disks, running Windows XP. Relation R1 contain 500000 records, R2 contains 25000 records, where as in R3 there are records of individual manager of a department and in R4 holds the records of group leaders. We considered two types of changes: Update-Generating changes: Insertions and deletions of an equal number of tuples over existing researchers and paper publishers. These changes mostly cause updates amongst the existing tuples in materialized view. Insertion-Generating changes: Insertions over new researchers those who published certain number of research papers. These changes cause only insert into paper publish table. The insertion-generating changes are very meaningful since in many data warehousing applications the only changes to the fact tables are insertions of tuples for new dates, which leads to insertions into materialized views. Figure 2 shows four graphs illustrating the performance advantage of using incremental materialized view maintenance method which uses auxiliary views to store intermediate results. The view maintenance time is split into two functions propogate and refresh. While computing the intermediate result the data warehouse is remain free to the user. Figure 2 (a) and (b) plot the variation in elapsed time as the size of the change set changes(delta relation), for a fixed size 500000 records in emp_rschr relation and 250000 records in emp_paperpublish relation. We found that the incremental materialize view maintenance using auxiliary relations wins for both types of changes, but it wins with a greater margin for the update generating changes. The refresh time is going down by 20% in figure 2(b). # Conclusions We have investigated one of the significant problems of a data warehouse, that is, materialized view maintenance and how to make warehouse materialized views self maintainable without accessing the data from underlying data sources. The study shows that it is possible to make warehouse views self maintainable by materializing additional auxiliary relations, which contain intermediate results, at a data warehouse site. Using efficient incremental materialize view maintenance technique it is possible to reduce the cost of view maintenance. Proposed materialize view maintenance technique using auxiliary relation and dividing the maintenance process into two steps: propagate and refresh require less maintenance time as compared to counting algorithm. Here the propagate function works implicitly and whenever the data warehouse is ideal the refresh function integrate the data into data warehouse views. The entire maintenance process is hidden from the data warehouse users. Year 201812Volume XVIII Issue III Version ICreate view mngr_or_groupleader (rschr_id, deptno) as select rschr_id, deptno from emp_rschrUNION // This view is for finding manager and group leader Create view rschr_ex_ manager_or_groupleader (rschr_id, deptno) as select reshr_id, deptno from emp_rschr where NOT EXISTS (select *from mngr_or_grouple ader where emp_rschr.id=mngr_ or_groupleader.id) //This view is for finding researcher, those are not manager or group leader. Create view rschrpaperview2010 (rschr_id, paper_id, deptno) as select emp_paperpublish.rschr_id, paper_id, deptno from rschr_ex_manager_or_group leader, emp_paperpublish where rschr_ex_ manager_or_ group leader.rschr_id=emp_paperpublish.rschr_id and year= '2010'. //This view gives the researcher those who have published paper in the year 2010. Create view rschrpaperview(rschr_id,deptno) as Select rschr_id, deptno from rschrpaper view2010 group by rschr_id having count(*)>10; // Global Journal of Computer Science and Technology (select rschr_id, deptno from emp_groupleader) ( ) © 2018 Global JournalsIncremental Maintenance of a Materialized View in Data Warehousing: An Effective Approach * Maintaining Materialised Views in Distributed databases ASegev JPark Proceedings of the IEEE International Conference on Data Engineering the IEEE International Conference on Data Engineering 1989 * Currency based updates to distributed materialized Views WSegev Fang proceedings of the IEEE International Conference on Data Engineering the IEEE International Conference on Data Engineering 1990 * Incremental View Maintenance: An Algorithmic Approach SAbdulaziz Almazyad & Mohammad Khubeb Siddiqui Internatioinal Journal of Electrical & Computer Sciences IJECS-IJENS 10 03 2009 * Optimizing Cyclic Join View Maintenance over Distributed Data Sources BinLiu & Elke ARundensteiner IEEE Transactions on Knowledge and Data Engineering 18 3 March 2006 * Efficient View Maintenance at Data Warehouses DAgarwal AEAbbadi ASingh TYurek Proc. ACM SIGMOD ACM SIGMOD 1997 * Special Issue on Materialized Views and Data Warehousing DLomet JWidom IEEE Data Engineering Bulletin 18 2 June 1995 * A performance analysis of view materialization strategies ENHanson SIGMOD pages 1987 * Incremental View Maintenance on Multi-Source GianlucaMoro ClaudioSartori 2001 In proceedings of IEEE * Multi agent Immediate Incremental View Maintenance for Data 10. Warehouses CHGray WilliamAYeung Gruver IEEE Transaction on Systems, Man & Cybermetics-Part A: Systems & Human March 2005 35 * Asymmetric Batch Incremental View Maintenance JunyiHao He JunXie HaiYang Yu the Proceedings of the 21 st International Conference on Data Engineering 1084-4627/05, 2005 * Database System Concepts EHeney AbrahamKorth Silberschatz 1986 McGraw Hill * Efficient Updating Materialized Views JABlakeley PALarson FWTompa Proc. ACM SIGMOD ACM SIGMOD May 1986 * DyDa: Data Warehouse Maintenance under Fully Concurrent Environments JChen XZhang SChen KAndreas EARundensteiner Proc. ACM SIGMOD Demo Session ACM SIGMOD Demo Session 2001 619 * The Stanford Data Ware housing Project JHammer HGarcia-Molina JWidom WLabio & Zhuge IEEE Data Engineering Bulletin June1995 * Lazy Maintenance of Materialized Views JingrenZhou PerakeLarson HichamGElmongui Proceddings of 33 rd International conference on VLDB eddings of 33 rd International conference on VLDBVienna, Austria 2007 * Supporting Multiple View Maintenance Policies LSColby AKawaguchi DFLieuwen ISMumick KARoss Proceeding ACM SIGMOD International Conference on Management of Data eeding ACM SIGMOD International Conference on Management of Data 1977 * Algorithms for Defered View Maintenance SLatha TimothyColby LeonidGriffin SinghLibkin HowardMumick Tricky proceedings of ACM SIGMOD ACM SIGMOD 1996 * Database Snapshots MAdiba & BLine\dsay Proceedings of the sixth International Conference on Very Large Databases the sixth International Conference on Very Large DatabasesMontreal, Canada October 1980 * Avoiding re-computation: View Adaptation in Data Warehouses MMohnia Proc. Of 8 th International Database Workshop Of 8 th International Database WorkshopHong Kong 1997 * Efficient View Self-Maintenance NHyun Proceeding of ACM workshop on Materialized views: Techniques & Applications eeding of ACM workshop on Materialized views: Techniques & Applications June 7, 1996 * An Incremental Access Method for Viewcache: Concept, Algorithms and Cost Analysis NRoussopoulos ACM Trans. On Database Systems 16 3 1991 * Incremental Evaluation of Rules & Its Relationship to Parallesim OWolfson HMDewan SJStolfo Yemini Proceedings ACM IGMOD, International Conference on Management of Data ACM IGMOD, International Conference on Management of Data 1991 * A framework for supporting data integration using the materialized & virtual approaches RHull & GZhou SIGMOD Int'l Conference June 4 -6,1996 * Efficient Incremental Evaluation of Queries with Aggregation RRamakrishan KARoss DSrivastava SSudarshan International Logic Programming Symposium 1994 * An Overview of Data Warehousing and OLAP Technology SChaudhuri UDayal ACM SIGMOD Record, volume26 1974 * Multiversion Based View Maintenance over Distributed Data Sources SChen BLiu EARundensteiner ACM Trans. Database Systems (TODS) 29 4 2004 * A Comparative Study of Materialized View Maintenance Techniques in Data Warehousing SSolanki Dr AjayKumar IJRIME 1 2 August 2011 * A Comprehensive Study of Data Warehousing SSolanki Dr AjayKumar January 2012 IJMR, Vol1, Issue 1