# I. INTRODUCTION
Data mining is the analysis of large volumes of transaction data for the purpose of making business decisions.
Today it is more important than ever for vendors to understand their customers and their buying patterns; vendors who do not risk losing those customers. To gain a competitive advantage, it is necessary to understand the relationships that prevail among data items across millions of transactions. The amount of data available for studying buying patterns is extensive and increases rapidly year by year, so the need for reliable and scalable techniques to explore millions of transactions for customer buying patterns continues to be important. Moreover, the increasing volume of data demands huge amounts of storage space and computation time. Since it is not feasible to store this explosively growing data in a single location, it is kept in distributed databases and data warehouses at different geographical locations. Data that is inherently distributed over a network with limited bandwidth and computational resources motivated the development of distributed data mining (DDM).
Although the mining process in DDM is carried out in parallel at distributed locations and generates results locally, these local patterns must be analyzed together to obtain the global data model. The knowledge derived from each distributed location is therefore moved to the central site, where the local results are combined to obtain the final result. This approach is less expensive but may produce ambiguous and incorrect global results. Conversely, even though communication is a bottleneck problem for a central data repository, it guarantees accurate results of data analysis. To address the bottleneck problem in the central learning strategy, this work proposes a dimension reduction method which uses the sum-of-subset concept and is scalable to very large databases. In this work, the site which requests data from the different geographical locations is treated as the central site.

Author ? : Research Scholar, Bharathiar University, Coimbatore, India. Telephone: 09751149851. E-mail: anbarasi2@gmail.com
Author ? : Professor, Bharathiar University, School of Management and Studies, Coimbatore, India. E-mail: vivekbsmed@gmail.com
# II. EXISTING WORK
Principal component analysis (PCA) is a mathematical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of uncorrelated variables called principal components. The number of principal components is less than or equal to the number of original variables. The transformation is defined in such a way that the first principal component has as high a variance as possible, and each succeeding component in turn has the highest variance possible under the constraint that it be orthogonal to (uncorrelated with) the preceding components. Principal components are guaranteed to be independent only if the data set is jointly normally distributed.
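As a brief illustration (not part of the proposed method), the PCA procedure described above can be sketched with NumPy: center the data, eigendecompose the covariance matrix, and project onto the top-k eigenvectors. The data and names here are purely illustrative.

```python
import numpy as np

def pca(X, k):
    """Project X (n_samples x n_features) onto its top-k principal components."""
    Xc = X - X.mean(axis=0)            # center each variable
    cov = np.cov(Xc, rowvar=False)     # covariance of the (possibly correlated) variables
    vals, vecs = np.linalg.eigh(cov)   # eigh returns eigenvalues in ascending order
    order = np.argsort(vals)[::-1]     # sort components by variance, descending
    components = vecs[:, order[:k]]    # orthogonal directions of maximal variance
    return Xc @ components

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))          # synthetic data for demonstration
Z = pca(X, 2)
print(Z.shape)                         # (100, 2)
```

By construction, the variance of the first projected coordinate is at least that of the second, matching the ordering constraint stated above.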
Linear Discriminant Analysis (LDA) attempts to maximize the linear separability between data points belonging to different classes. In contrast to most other dimensionality reduction techniques, LDA is a supervised technique. LDA finds a linear mapping M that maximizes the linear class separability in the low-dimensional representation of the data.
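A minimal sketch of the two-class (Fisher) form of LDA, again using only NumPy; the synthetic data and variable names are assumptions for illustration, not the paper's own experiment:

```python
import numpy as np

def fisher_lda_direction(X, y):
    """Return the unit Fisher discriminant direction for binary labels y in {0, 1}."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # within-class scatter matrix
    Sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    w = np.linalg.solve(Sw, m1 - m0)   # direction maximizing class separability
    return w / np.linalg.norm(w)

rng = np.random.default_rng(1)
X = np.vstack([rng.normal([0, 0], 1.0, (50, 2)),    # class 0
               rng.normal([3, 0], 1.0, (50, 2))])   # class 1, shifted along x
y = np.array([0] * 50 + [1] * 50)
w = fisher_lda_direction(X, y)
# projections of the two classes along w should be well separated
gap = (X[y == 1] @ w).mean() - (X[y == 0] @ w).mean()
```

Projecting onto `w` gives the one-dimensional representation in which the class means are maximally separated relative to the within-class scatter.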
# III. PROBLEM DESCRIPTION
The data storage conversion algorithm transforms a transaction into a single-dimension transaction containing all the attributes that appear in its original form. The encoded transaction is represented by a sequence of numbers using the sum-of-subset approach: any combination of 2^1, 2^2, 2^3, ..., 2^n sums to a unique value, and this uniqueness property motivated this research work. In this way the new transaction is smaller than the original form, so the cost of storage is reduced, offering a highly specialized solution for a small part of the general problem.
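The uniqueness property can be checked directly: every subset of {2^1, ..., 2^n} has a distinct sum, so encoding a transaction as such a sum is lossless. A small sketch with a hypothetical four-item set:

```python
from itertools import combinations

items = ["milk", "bread", "eggs", "jam"]                       # hypothetical item set
codes = {item: 2 ** (j + 1) for j, item in enumerate(items)}   # assign 2^1 .. 2^n

def encode(transaction):
    """Encode a transaction (a subset of the items) as a sum of distinct powers of two."""
    return sum(codes[item] for item in transaction)

# enumerate every possible subset and collect its code
sums = set()
for r in range(len(items) + 1):
    for subset in combinations(items, r):
        sums.add(encode(subset))
print(len(sums))  # 16 distinct codes for the 16 possible subsets
```

Because the 16 subsets map to 16 distinct sums, each encoded value can be decoded back to exactly one transaction.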
# IV. ENCODING AND DECODING TECHNIQUES FOR DATA STORAGE
A matrix is constructed from the given set of data items, as shown in Table 1. The order and dimensions of the matrix are user defined. The only constraint is that the number of columns should not exceed 14, as a row value involving 2^15 would exceed the range of a 16-bit 'int'.
# Table 1: Display of item set
Of the entire set of data items, a transaction is always a subset. This subset of data items is encoded into a reduced database, which minimizes the memory area. Table 2 explains the encoding process. For each data item in the transaction, its row 'i' and column 'j' are noted; the value 2^j is calculated and added to the i-th value of the transaction encoding E. The process is repeated for each data item one by one, producing the final 'n'-value encoding shown in the table for the transaction T1 = Chicken, Wheat bread, Dry fruits, Jam, Soft drink, Sugar, Pizza.
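The encoding steps of Table 2 can be sketched as follows. The (row, column) positions are those read from the Table 2 trace; the function itself is an illustrative reconstruction, not the authors' implementation.

```python
# item -> (row i, column j) positions in the item matrix, as read from Table 2
positions = {
    "Wheat bread": (1, 1), "Chicken": (1, 5),
    "Soft drink":  (2, 1), "Jam": (2, 4), "Dry fruits": (2, 5),
    "Pizza":       (3, 1), "Sugar": (3, 3),
}

def encode(transaction, n_rows=3):
    """Encode a transaction as one value per matrix row: E[i] accumulates 2^j."""
    E = [0] * n_rows
    for item in transaction:
        i, j = positions[item]
        E[i - 1] += 2 ** j      # add 2^j to the i-th row's running value
    return E

T1 = ["Chicken", "Wheat bread", "Dry fruits", "Jam", "Soft drink", "Sugar", "Pizza"]
print(encode(T1))  # [34, 50, 10]
```

The result [34, 50, 10] matches the final column of the Table 2 trace: 2^1 + 2^5 = 34 for row 1, 2^1 + 2^4 + 2^5 = 50 for row 2, and 2^1 + 2^3 = 10 for row 3.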
In Table 4a, the items chosen in row 1 are found by decoding the number '34'. Since the given matrix has 5 columns, the values 2^5, 2^4, 2^3, 2^2 and 2^1 are subtracted from the value '34' one by one, cumulatively. Each time the subtraction gives a non-negative remainder, the corresponding column's data item is chosen. The final list of chosen data items in this table indicates the original items from the transaction. The process is repeated on the other values of the reduced transaction form, i.e., 50 and 10, in Tables 4b and 4c respectively. Thus the remaining data items from the transaction are also decoded.
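The cumulative-subtraction decoding can likewise be sketched; a non-negative remainder marks a chosen column (the trace in the tables shows that a remainder of exactly 0, as in 02 - 02 = 00, also selects the column). This is an illustrative reconstruction under that reading.

```python
def decode_row(value, n_cols=5):
    """Recover the chosen columns from a row value by subtracting 2^j, j = n_cols..1."""
    cols = []
    for j in range(n_cols, 0, -1):
        if value - 2 ** j >= 0:   # non-negative remainder: column j was present
            value -= 2 ** j
            cols.append(j)
    return cols

print(decode_row(34))  # [5, 1] -> columns 5 and 1 (Chicken, Wheat bread)
print(decode_row(50))  # [5, 4, 1]
print(decode_row(10))  # [3, 1]
```

Applying the function to each value of the encoded transaction [34, 50, 10] recovers exactly the column positions of the seven items of T1.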
# V. CONCLUSION
The technique described above appears to be a fitting and effective method for distributed data as well as for the distributed data mining process, in terms of speed and efficiency, when measured against the older methods. Another useful characteristic of this technique is that it can be updated continuously whenever necessary, since the data is maintained at the remote sites. The huge quantity of data does not need to be stored at multiple locations, so the available storage space is used most favorably. To make complete use of the technique, the customary client-server distributed data mining scheme must be entirely replaced by it. Extensions of the methodology for merging the accumulated information from different sites are in progress. The purpose of this paper was to provide an overview of this type of approach, which can be employed for dimension reduction when processing high-dimensional data.
![Tables 4b and 4c: cumulative-subtraction decoding of the remaining values 50 (i = 2) and 10 (i = 3, n = 3), recovering Wheat bread, Dry fruits, Jam, Soft drink, Sugar, and Pizza](image-2.png "")
Table 2: Encoding of transaction T1

| Data Item | Matches with ith Row | Matches with jth Column | E after adding 2^j to the existing value in the ith row |
|---|---|---|---|
| Jam | 2 | 4 | 0, 0+16, 0 |
| Wheat bread | 1 | 1 | 0+2, 16, 0 |
| Chicken | 1 | 5 | 2+32, 16, 0 |
| Soft drink | 2 | 1 | 34, 16+2, 0 |
| Dry fruits | 2 | 5 | 34, 18+32, 0 |
| Sugar | 3 | 3 | 34, 50, 0+8 |
| Pizza | 3 | 1 | 34, 50, 8+2 |
Table 3 contains all the transactions in the encoded form (e.g., T1 is stored as 34, 50, 10). It will be these encoded values that will be transferred across the network between the client and the server.
Table 1 (the item matrix) lists the full item set; the recoverable item names include pizza, sauce, sugar, sweet bun, soft drink, wheat bread, fruit bread, honey, jam, dry fruits, wheat bun, burger, butter and chicken.
Table 4a: i = 2; d1 = 50; n = 5
Table 4b
Table 4c
© 2011 Global Journals Inc. (US)
# REFERENCES

1. Wu-Shan Jiang, Ji-Hui Yu, "Distributed Data Mining on the Grid," Proceedings of the Fourth International Conference on Machine Learning and Cybernetics, Guangzhou, August 2005.
2. U. P. Kulkarni, K. K. Tangod, S. R. Mangalwede, A. R. Yardi, "Exploring the Capabilities of Mobile Agents in Distributed Data Mining," 10th International Database Engineering and Applications Symposium (IDEAS'06), IEEE, 2006.
3. J. E. Jackson, A User's Guide to Principal Components, John Wiley and Sons, New York, 1991.
4. K. Fukunaga, Introduction to Statistical Pattern Recognition, Academic Press Professional, Inc., San Diego, CA, USA, 1990.
5. L. O. Jimenez, D. A. Landgrebe, "Supervised classification in high-dimensional space: geometrical, statistical, and asymptotical properties of multivariate data," IEEE Transactions on Systems, Man and Cybernetics, 28(1), 1997.