# INTRODUCTION

One of the hottest topics in the industry today is data warehousing and on-line analytical processing (OLAP). Although data warehousing has been around in some form or another since the inception of data storage, people were never able to exploit the information that sat wastefully on a tape somewhere in a back room. Today, however, technology has advanced to the point where access to this information is an interactive reality. Organizations across the country and around the world are seeking expertise in this exploding field of data organization and manipulation.

It is not a surprise, really, that business users want a better look at their data. Today, business opportunities are measured in days instead of months or years, and the more information empowering an entrepreneur or other business person, the better the chances of beating a competitor to the punch with a new product or service.

The task of transitioning from a procedural mindset to an object-oriented paradigm can seem overwhelming; however, the transition does not require developers to step into another dimension or go to Mars to grasp a new way of doing things. In many ways, the object-oriented approach to development closely mirrors the world we have been living in all along: we each already know quite a bit about objects. It is that knowledge we must discover and leverage in transitioning to object-oriented tools and methodologies.

A data warehouse is a mechanism for data storage and data retrieval. Data can be stored and retrieved using a multidimensional (hypercube) structure, a relational star schema structure, or several other data storage techniques.

# II. DATA COMPRESSION

Data compression is of interest in business data warehousing, both because of the cost savings it offers and because of the large volume of data manipulated in many business applications. The types of local redundancy present in business data files include runs of zeros in numeric fields, sequences of blanks in alphanumeric fields, and fields which are present in some records and null in others. Run-length encoding can be used to compress sequences of zeros or blanks, and null suppression may be accomplished through the use of presence bits. Another class of methods exploits cases in which only a limited set of attribute values exists: dictionary substitution replaces alphanumeric representations of information such as bank account type, insurance policy type, sex, month, etc. with the few bits necessary to represent the limited number of possible attribute values.

The problem of compressing digital data can be decoupled into two subproblems: modeling and entropy coding. Whatever the given data may represent in the real world, in digital form it exists as a sequence of symbols, such as bits. The modeling problem is to choose a suitable symbolic representation for the data and to predict, for each symbol of the representation, the probability that it takes each of the allowable values for that symbol. The entropy-coding problem is to code each symbol as compactly as possible, given this knowledge of probabilities. (In the realm of lossy compression, there is a third subproblem: evaluating the relative importance of various kinds of errors.)
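To make the modeling step concrete before the four-letter example that follows, here is a minimal C++ sketch, written for this discussion rather than taken from the study. It estimates a probability for each byte value from the data itself and computes the zero-order entropy, the average number of bits per symbol that an ideal entropy coder would need; the function name and sample message are illustrative assumptions.

```cpp
// Minimal sketch of the "modeling" step of compression: estimate a probability
// for each byte value from the data, and compute the entropy bound that any
// entropy coder (Huffman, arithmetic, ELS) tries to approach.
#include <cmath>
#include <iostream>
#include <string>
#include <vector>

// Returns the zero-order entropy of the data in bits per symbol.
double zeroOrderEntropy(const std::string& data) {
    std::vector<std::size_t> count(256, 0);
    for (unsigned char c : data) ++count[c];

    double entropy = 0.0;
    for (std::size_t n : count) {
        if (n == 0) continue;
        double p = static_cast<double>(n) / data.size();  // modeled probability
        entropy -= p * std::log2(p);                      // -sum of p * log2(p)
    }
    return entropy;
}

int main() {
    // A message over the four-letter alphabet discussed in the text.
    std::string message = "aaaabbcd";
    double h = zeroOrderEntropy(message);
    std::cout << "entropy: " << h << " bits/symbol\n";
    std::cout << "lower bound: " << h * message.size() << " bits total\n";
    return 0;
}
```

For the sample message in main(), the four letters occur with probabilities .5, .25, .125, and .125, so the computed bound is 1.75 bits per symbol, which matches the average length of the optimal code given in the example below.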
For example, suppose it is required to transmit messages composed of the four letters a, b, c, and d. A straightforward scheme for coding these messages in bits would be to represent a by "00", b by "01", c by "10", and d by "11". However, suppose it is known that for any letter of the message (independent of all other letters), a occurs with probability .5, b with probability .25, and c and d with probability .125 each. Then a shorter representation might be chosen for a, at the necessary cost of accepting longer representations for the other letters: a could be represented by "0", b by "10", c by "110", and d by "111". This representation is more compact on average than the first one; indeed, it is the most compact representation possible (though not uniquely so). In this simple example, the modeling part of the problem is determining the probabilities for each symbol, and the entropy-coding part of the problem is determining the representations in bits from those probabilities; the probabilities associated with the symbols play a fundamental role in entropy coding.

One well-known method of entropy coding is Huffman coding, which yields an optimal coding provided all symbol probabilities are integer powers of .5. Another method, yielding optimal compression performance for any set of probabilities, is arithmetic coding. In spite of the superior compression given by arithmetic coding, it has so far not been a dominant presence in real data-compression applications. This is most likely due to concerns over speed and complexity, as well as patent issues; a rapid, simple algorithm for arithmetic coding is therefore potentially very useful. An algorithm which allows rapid encoding and decoding in a fashion akin to arithmetic coding is the Q-coder; the QM-coder is a subsequent variant. However, because these algorithms are protected by patents, new algorithms with competitive performance continue to be of interest. The ELS algorithm is one such algorithm.

The ELS-coder works only with an alphabet of two symbols (0 and 1). Symbols from larger alphabets can certainly be encoded, but they must be converted to a two-symbol format first. The necessity for this conversion is a disadvantage, but the restriction to a two-symbol alphabet facilitates rapid coding and rapid probability estimation.

The ELS-coder decoding algorithm has already been described. The encoder must use its knowledge of the decoder's inner workings to create a data stream which will manipulate the decoder into producing the desired sequence of decoded symbols. As a practical matter, the encoder need not actually consider the entire coded data stream at one time. The coded data stream can be partitioned at any time into three portions; from end to beginning of the data stream they are: preactive bytes, which as yet exert no influence over the current state of the decoder; active bytes, which affect the current state of the decoder and have more than one consistent value; and postactive bytes, which affect the current state of the decoder and have converged to a single consistent value. Each byte of the coded data stream goes from preactive to active to postactive; the earlier a byte's position in the stream, the earlier these transitions occur. A byte is not actually moved to the external file until it becomes postactive. Only the active portion of the data stream need be considered at any time. Since the internal buffer of the decoder contains two bytes, there are always at least two active bytes. The variable backlog counts the number of active bytes in excess of two. In theory backlog can take arbitrarily high values, but higher values become exponentially less likely [13].
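As a concrete instance of the entropy coding discussed above, the following is a minimal C++ sketch of Huffman coding; it is an illustration for this section, not the ELS-coder and not the study's implementation. It builds a prefix code from symbol frequencies; when the probabilities are integer powers of .5, as in the a, b, c, d example, the resulting code lengths are optimal.

```cpp
// Minimal sketch of Huffman coding: build an optimal prefix code from symbol
// frequencies. Illustrative only; the ELS-coder discussed in the text is a
// different (binary, arithmetic-style) algorithm.
#include <iostream>
#include <map>
#include <memory>
#include <queue>
#include <string>
#include <vector>

struct Node {
    char symbol;                 // meaningful only for leaves
    std::size_t freq;
    Node* left = nullptr;
    Node* right = nullptr;
};

struct Greater {
    bool operator()(const Node* a, const Node* b) const { return a->freq > b->freq; }
};

// Walk the tree and record the bit string for each leaf.
void assignCodes(const Node* n, const std::string& prefix,
                 std::map<char, std::string>& codes) {
    if (!n->left && !n->right) { codes[n->symbol] = prefix.empty() ? "0" : prefix; return; }
    if (n->left)  assignCodes(n->left,  prefix + "0", codes);
    if (n->right) assignCodes(n->right, prefix + "1", codes);
}

std::map<char, std::string> huffmanCodes(const std::map<char, std::size_t>& freq) {
    std::priority_queue<Node*, std::vector<Node*>, Greater> pq;
    std::vector<std::unique_ptr<Node>> pool;  // owns all nodes
    for (auto& [sym, f] : freq) {
        pool.push_back(std::unique_ptr<Node>(new Node{sym, f}));
        pq.push(pool.back().get());
    }
    while (pq.size() > 1) {
        Node* a = pq.top(); pq.pop();   // merge the two least frequent nodes
        Node* b = pq.top(); pq.pop();
        pool.push_back(std::unique_ptr<Node>(new Node{'\0', a->freq + b->freq, a, b}));
        pq.push(pool.back().get());
    }
    std::map<char, std::string> codes;
    assignCodes(pq.top(), "", codes);
    return codes;
}

int main() {
    // Frequencies matching the probabilities .5, .25, .125, .125 in the example.
    std::map<char, std::size_t> freq = {{'a', 4}, {'b', 2}, {'c', 1}, {'d', 1}};
    for (auto& [sym, code] : huffmanCodes(freq))
        std::cout << sym << " -> " << code << "\n";
    return 0;
}
```

With the frequencies in main(), the generated code assigns one bit to a, two bits to b, and three bits each to c and d, reproducing the 1.75 bits-per-symbol average of the earlier example (the exact 0/1 labels may differ, since the optimal code is not unique).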
Kim and Park, in their paper "A novel approach to scene change detection using a cross entropy," have shown that an effective video indexing method is required for huge video databases. While manual indexing is the most effective approach to this goal, it is slow and expensive; thus automatic indexing is desirable, and various indexing tools for video databases have previously been developed. For efficient video indexing and retrieval, the similarity measure is an important factor. Their paper presents new similarity measures between frames and proposes a new algorithm to detect scene changes using a cross entropy defined between two histograms. Experimental results show that the proposed algorithm is fast and effective compared with several conventional algorithms at detecting abrupt scene changes and gradual transitions, including fade in/out and flash-light scenes [12].

# III. RELATED WORK

# IV. OBJECTIVE

The objectives of the present study are to:

1. Develop data compression for object-oriented data warehousing.
2. Devise efficient compression algorithms in data warehousing to enhance the efficiency of data warehousing packages, so that less CPU time and less memory are consumed.
3. Implement a compressor and an expander using an entropy algorithm and test their effectiveness on databases of different sizes.

Figure: Comparison of Time Taken and Compression Efficiency for Different Sizes of Databases

## CONCLUSION

In this paper we have discussed data compression and how data is compressed in Oracle 10g using an object-oriented language. Data compression is of interest in business data warehousing, both because of the cost savings it offers and because of the large volume of data manipulated in many business applications. Entropy is used in many areas, such as image processing and document images, but in our research we have applied entropy to object-oriented data warehousing. The work comprised the creation of databases of different sizes in Oracle, the employment of object-oriented programming for compression in data warehousing, further compression of the database .csv files using C++, and a comparison of the time taken and the compression efficiency for the different database sizes.
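As an illustration only of the ".csv compression using C++" step summarized above, the sketch below applies the dictionary-substitution idea from Section II to a single low-cardinality column; the column name, sample values, and function names are assumptions, and this is not the study's actual compressor.

```cpp
// Hypothetical sketch: dictionary substitution for a low-cardinality column
// (e.g. account type, insurance policy type, sex, month) of a warehouse .csv
// export. Each distinct value is stored once and replaced by a small integer code.
#include <iostream>
#include <map>
#include <string>
#include <vector>

struct DictionaryColumn {
    std::vector<std::string> values;  // code -> original value
    std::vector<std::size_t> codes;   // one code per row
};

// Compressor: replace each field by its code, building the dictionary on the fly.
DictionaryColumn compressColumn(const std::vector<std::string>& column) {
    DictionaryColumn out;
    std::map<std::string, std::size_t> lookup;
    for (const std::string& field : column) {
        auto it = lookup.find(field);
        if (it == lookup.end()) {
            it = lookup.emplace(field, out.values.size()).first;
            out.values.push_back(field);
        }
        out.codes.push_back(it->second);
    }
    return out;
}

// Expander: map codes back to the original fields.
std::vector<std::string> expandColumn(const DictionaryColumn& col) {
    std::vector<std::string> out;
    for (std::size_t code : col.codes) out.push_back(col.values[code]);
    return out;
}

int main() {
    // A low-cardinality column as it might appear in a warehouse .csv export.
    std::vector<std::string> policyType = {"AUTO", "HOME", "AUTO", "AUTO", "LIFE", "HOME"};
    DictionaryColumn packed = compressColumn(policyType);

    std::cout << "distinct values stored: " << packed.values.size() << "\n";
    std::cout << "round trip ok: "
              << (expandColumn(packed) == policyType ? "yes" : "no") << "\n";
    return 0;
}
```

A full compressor for such exports would combine this with run-length encoding of blanks and zeros and an entropy coder for the resulting codes, but the round trip shown here captures the basic compress/expand contract the objectives describe.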
## REFERENCES

1. Wei-Chou Chen, Tzung-Pei Hong, Wen-Yang Lin, "Using the compressed data model in object-oriented data warehousing," IEEE SMC '99 Conference Proceedings, 1999 IEEE International Conference on Systems, Man, and Cybernetics, vol. 5, 1999.
2. Wei-Chou Chen, Tzung-Pei Hong, Wen-Yang Lin, "A composite data model in object-oriented data warehousing," Proceedings of Technology of Object-Oriented Languages and Systems (TOOLS 31), 1999.
3. J. C. Shieh, H. W. Lin, "The Novel Model of Object-Oriented Data Warehouses," Workshop on Databases and Software Engineering, 2006.
4. Wei-Chou Chen, Tzung-Pei Hong, Wen-Yang Lin, "Three maintenance algorithms for compressed object-oriented data warehousing."
5. Boqiang Huang, Yuanyuan Wang, Jianhua Chen, IEEE Transactions on, vol. 56, April 2009.
6. Y. Gong, M. K. H. Fan, C.-M. Huang, "On entropy-constrained residual vector quantization design," Data Compression Conference (DCC '99) Proceedings, p. 526, March 1999.
7. I. De, J. Sil, "Wavelet entropy based no-reference quality prediction of distorted/decompressed images," 2nd International Conference on, vol. 3, April 2010.
8. I. De, J. Sil, "ANFIS tuned no-reference quality prediction of distorted/decompressed images featuring wavelet entropy," Computer Information Systems and Industrial Management Applications (CISIM), 2010 International Conference on, 8-10 Oct. 2010.
9. L. Liu, Y. Dong, X. Song, G. Fan, "An entropy based segmentation algorithm for computer-generated document images," Proceedings, 2003 International Conference on, vol. 1, Sept. 2003.
10. C. Tu, T. D. Tran, "Context-based entropy coding of block transform coefficients for image compression," IEEE Transactions on Image Processing, vol. 11, no. 11, Nov. 2002.
11. S. Chen, J. H. Reif, "Using difficulty of prediction to decrease computation: fast sort, priority queue and convex hull on entropy bounded inputs," Proceedings, 34th Annual Symposium on Foundations of Computer Science, 1993.
12. Sang Hyun Kim, Rae-Hong Park, "A novel approach to scene change detection using a cross entropy," Proceedings, 2000 International Conference on Image Processing, vol. 3, 2000.
13. A. Scales, W. Roark, F. Kossentini, M. J. T. Smith, "Lossless Compression Using Conditional Entropy-Constrained Subband Quantization," Data Compression Conference (DCC '95) Proceedings, p. 498, March 1995.
14. H. Jegou, C. Guillemot, "Entropy coding with variable length re-writing systems," Proceedings, International Symposium on Information Theory (ISIT 2005), Sept. 2005.
15. Hua Xie, A. Ortega, "Entropy- and complexity-constrained classified quantizer design for distributed image classification," IEEE Workshop on Multimedia Signal Processing, 9-11 Dec. 2002.
16. P. B. Ambhore, "An Implementation of Object Oriented Database Security," 5th ACIS International Conference on Software Engineering Research, IEEE Computer Society, August 20-22, 2007.
17. M. Park, "Data Warehouse Designing on Relational Database Systems," Informix Co., Stanford, 1996.
18. G. Molina, "Maintenance in Data Warehousing Environment," San Jose, California, 1995.
19. N. Roussopoulos, "Data Warehouses and Materialized Views," Leander Press, Greece, 1997.
20. R. Swift, "Building Advanced Data Warehouse," NCR Corporation, California, 1996.
21. E. Marlin, "ODBMS vs. Relational Object-Oriented Programming," SAGE, London, 1992.
22. Jae Jin Koh, "Relational database schema integration by overlay and redundancy elimination methods," in International Forum on Strategic Technology 2007, IEEE Computer Society, 3-6 October, 2007.
23. S. Michael, "The next generation DBMS," Pearson Education, New York, 1991.
24. E. Bertino, "Method precomputation in object-oriented databases," Proceedings of ACM-SIGOIS and IEEE-TC-OA International Conference on Organizational Computing Systems, 1991.
25. J. Eder, H. Frank, W. Liebhart, "Optimization of Object-Oriented Queries by Inverse Methods," Proceedings of East/West Database Workshop, Austria, 1994.