# INTRODUCTION

Author: Senior System Architect, USA; Research Scholar, Department of CSE, ANU, India. e-mail: ramprakash.kota@gmail.com
Author: Director, Sri Prakash College of Engineering (SPCE), Tuni, India. e-mail: dr.amjanshaik@gmail.com

The tremendous growth in cloud storage services has reached a point where loss of data due to failure is expected. The real challenge for the designers of cloud storage solutions is to protect data from loss during failures. The core technology behind this protection is Erasure coding. Previous works demonstrate the use of Erasure coding over the last two decades. However, a clear understanding of Erasure coding and its effective use has never been discussed across different cloud service providers, which leads to confusion in the solution designer and developer community. Hence in this work we focus on a fundamental understanding of Erasure coding and on comparison and analysis of Erasure performance on multiple cloud storage service providers [1].

Storage systems on the cloud have come a long way in terms of capacity and latency improvements. Nevertheless, all types of storage hardware can fail to protect data during failures and are unable to prevent data loss. The failure can range from uncontrollable corruption of a few disk sectors to the entire disk becoming unusable. Storage services carry a self-protecting mechanism in the form of extra corrective information that can detect a change of a few bits and still retrieve the originally stored data. However, when multiple bits change unexpectedly, the self-protecting mechanism reports a hardware failure and the storage device becomes unusable; such situations lead to loss of data [1] [2].

To handle these types of anomalies, storage systems depend on Erasure codes. An Erasure code deploys assured redundancy to overcome failures. The most general way of implementing this mechanism is replication of data over multiple locations. The most popular and simplest is the Redundant Array of Independent Disks, or RAID. Its most basic version is RAID-1, where every data byte is stored on at least two parallel disks, so a failure does not lead to data loss as long as a replicated copy is available. This mechanism is easy to achieve, but it introduces overheads such as storage cost: the raw storage must be at least double the amount of user data. Moreover, if both storage devices fail, the complete solution becomes unusable. On the other hand, there are more complex solutions under the Erasure umbrella, such as the well-known Reed-Solomon codes. Reed-Solomon codes can overcome high-level failures with considerably less extra storage, providing a high level of failure tolerance at reduced cost [3]; a small storage-overhead comparison is sketched at the end of this section.

In communication systems, Erasure coding is similar to Error Correcting Codes, or ECC. Erasure coding solves a related but different problem: in message communication, an error is caused by changed bits whose locations are unknown, whereas a storage erasure corresponds to a known failed location. Hence the direct application of Erasure codes to message errors is restricted [3] [11] [12].

The rest of the work is organized as follows: in Section II we discuss fault tolerance mechanisms for non-cloud distributed storage systems, in Section III we review the Reed-Solomon fault tolerance mechanism, in Section IV we propose the novel Erasure coding scheme based on Reed-Solomon fault tolerance for cloud-based storage, in Section V we discuss the Erasure coding mechanisms of cloud storage service providers, in Section VI we present the results obtained for the proposed scheme, and in Section VII we conclude.
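To make the storage-overhead trade-off mentioned above concrete, the following minimal sketch compares n-way replication with a (k+m, k) erasure code. The concrete numbers (3 copies, k = 6, m = 2) are assumptions chosen only for illustration; both configurations survive the loss of any two devices.

```python
# Illustrative comparison of raw storage overhead:
# n-way replication versus a (k+m, k) erasure code.

def replication_overhead(copies: int) -> float:
    """Bytes of raw storage consumed per byte of user data with n-way replication."""
    return float(copies)

def erasure_overhead(k: int, m: int) -> float:
    """Bytes of raw storage consumed per byte of user data with a (k+m, k) code."""
    return (k + m) / k

if __name__ == "__main__":
    # Both setups tolerate the loss of any two devices.
    print("3-way replication :", replication_overhead(3), "x raw storage")
    print("(8,6) erasure code:", round(erasure_overhead(6, 2), 2), "x raw storage")
```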
# II. FAULT TOLERANCE MECHANISMS FOR NON-CLOUD DISTRIBUTED STORAGE

The standard fault tolerance mechanisms depend on erasure codes [4]. The basic mechanism can be understood by assuming a collection of n disks, of which k disks hold the customer data. The remaining m disks hold the coding information:

$$m = n - k, \qquad k < n \qquad \text{(Eq. 1)}$$

(the parameter r, the number of words per disk, is used later for array codes). The basic interpretation of erasure codes is that each disk holds a z-bit word of customer data. If we denote the data words by z, the words on the k data disks are

$$z_1, z_2, z_3, \ldots, z_k \qquad \text{(Eq. 2)}$$

Similarly, denoting the coding words stored on the m coding disks by c, the full set is

$$c_1, c_2, c_3, \ldots, c_m \qquad \text{(Eq. 3)}$$

Each coding word is a linear combination of the customer data words:

$$\begin{aligned}
c_1 &= a_{1,1} z_1 + a_{1,2} z_2 + \cdots + a_{1,k} z_k \\
c_2 &= a_{2,1} z_1 + a_{2,2} z_2 + \cdots + a_{2,k} z_k \\
&\;\;\vdots \\
c_m &= a_{m,1} z_1 + a_{m,2} z_2 + \cdots + a_{m,k} z_k
\end{aligned} \qquad \text{(Eq. 4)}$$

The coefficients a are also z-bit words. Encoding therefore simply requires multiplying and adding words, and decoding involves solving a set of linear equations with Gaussian elimination or matrix inversion. Below we review the most popular coding techniques.

# a) RAID-4 and RAID-5

RAID-4 and RAID-5 [5] are the simplest forms of the erasure codes described above. They differ from the basic framework in the arrangement of the replicated data. RAID here is a modification of the MDS code with m = 1 and z = 1. The coding depends on a single parity bit p, where

$$p = z_1 \oplus z_2 \oplus \cdots \oplus z_k \qquad \text{(Eq. 5)}$$

If any single disk is lost, XOR-ing the surviving words together with p reconstructs the missing word (a short sketch of this single-parity idea appears at the end of this section).

# b) Linux RAID-6

Linux RAID-6 [6] [9] can be considered an extension of RAID-4 and RAID-5, as it uses an additional disk under the framework. It modifies the MDS setting by storing the code on two disks, i.e. m = 2. The formulation uses an XOR-based code:

$$\begin{aligned}
p &= z_1 \oplus z_2 \oplus \cdots \oplus z_k \\
q &= z_1 \oplus 2(z_2) \oplus \cdots \oplus 2^{k-1}(z_k)
\end{aligned} \qquad \text{(Eq. 6)}$$

where the multiplications by powers of 2 are carried out in a Galois field. The two codes p and q are stored on separate disks so that the erasure code can protect against data loss.

# c) Array Codes

This framework is called an array code because it is implemented over an r x n array of customer data. The customer data is stored with the arrangement shown in Figure 1, using the parameters k = 4, m = 2 (RAID-6), n = k + m = 6, r = 4, z = 1.

![Figure 1: Array Code Storage](image-2.png "Figure 1")

# d) Non-MDS Codes

Non-MDS codes do not achieve the optimal fault tolerance of MDS codes for a given amount of redundancy: the number of storage devices holding coding information is higher than in the other frameworks. However, the performance efficiency provided by Non-MDS codes is higher than that of the other frameworks. Hence all of these code families are compared in this work.
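The single-parity idea of Eq. 5 (RAID-4/5) can be made concrete with a short sketch. This is only a byte-level illustration, not a RAID implementation; the block contents and function names are invented for the example.

```python
# Minimal sketch of RAID-4/5 style single parity (Eq. 5): the parity block is
# the bitwise XOR of the k data blocks, and any single lost block can be
# rebuilt by XOR-ing the parity with the surviving blocks.

from functools import reduce

def xor_blocks(blocks):
    """Bitwise XOR of equal-length byte blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def encode_parity(data_blocks):
    return xor_blocks(data_blocks)

def recover_block(surviving_blocks, parity):
    """Rebuild the one missing data block from the survivors and the parity."""
    return xor_blocks(surviving_blocks + [parity])

if __name__ == "__main__":
    data = [b"ABCD", b"EFGH", b"IJKL"]            # k = 3 toy data blocks
    p = encode_parity(data)
    lost = data[1]                                 # pretend disk 2 fails
    rebuilt = recover_block([data[0], data[2]], p)
    assert rebuilt == lost
    print("recovered:", rebuilt)
```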
# III. UNDERSTANDING REED-SOLOMON ERASURE

The most important factor that makes the Reed-Solomon framework attractive to implement is its simplicity. In this work we consider a scenario to compare the performance of Reed-Solomon and the proposed encoding technique [7] [8]. We consider K storage devices, each holding n bytes of data, such that

$$D = D_1, D_2, D_3, \ldots, D_K \qquad \text{(Eq. 7)}$$

where D is the collection of storage devices. There are also L storage devices, each holding n bytes of checksum data, such that

$$C = C_1, C_2, C_3, \ldots, C_L \qquad \text{(Eq. 8)}$$

where C is the collection of checksum devices. The checksum devices hold values calculated from the respective data storage devices. The goal is to restore the values of any failed device from the collection using the non-failed devices. Reed-Solomon deploys a function G to calculate the checksum for every device in C. For this study we consider an example with K = 8 and L = 2, where the devices C1 and C2 are computed with G1 and G2 respectively.

The core idea of Reed-Solomon is to break the contents of the storage devices into a number of words. In this example each word is u bits. Hence the number of words v in each device is

$$v = n \text{ bytes} \times \frac{8 \text{ bits}}{\text{byte}} \times \frac{1 \text{ word}}{u \text{ bits}} \qquad \text{(Eq. 9)}$$

which reduces to

$$v = \frac{8n}{u} \qquad \text{(Eq. 10)}$$

The checksum for each checksum device is then formulated as

$$C_i = W_i(D_1, D_2, D_3, \ldots, D_K) \qquad \text{(Eq. 11)}$$

where the coding function W_i is defined to operate on each word.

After this detailed understanding of the Erasure fault tolerance scheme, we identify the limitations of its applicability to cloud storage services and propose a novel fault tolerance scheme in the next section.

# IV. PROPOSED NOVEL FAULT TOLERANCE SCHEME

The limitation of applying existing erasure codes to cloud-based storage systems is that their complex calculations significantly reduce the performance of the availability measures. We therefore attempt to reduce the calculation complexity by using simple mathematical operations within the standard erasure scheme. The checksums for the storage devices are the C_i of Eq. 11. We propose the following enhancement to the checksum calculation:

$$C_i = W_i(D_1, D_2, D_3, \ldots, D_K) = W_i(D_1 \oplus D_2 \oplus D_3 \oplus \cdots \oplus D_K) \qquad \text{(Eq. 12)}$$

Here the XOR operation, being a standard operation well suited to the logic circuits used in all standard hardware, is fast to compute. We also redefine the function applied to each word of the storage devices D as the matrix

$$W = \begin{bmatrix} w_{1,1} & \cdots & w_{1,L} \\ \vdots & \ddots & \vdots \\ w_{K,1} & \cdots & w_{K,L} \end{bmatrix}_{K \times L} \qquad \text{(Eq. 13)}$$

The proposed matrix is stored on one of the devices and is calculated only once. As the modified checksum formulation is an XOR operation, any change in the data is automatically flagged when the checksum is recomputed. Comparative simulations are also performed in this work and the enhancement in performance is exhibited.
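A minimal sketch of the checksum of Eq. 12 follows, under the assumption that the word-level function W_i can be reduced to a simple per-word operation. The placeholder W_i used here (XOR with a per-device constant) and all names are illustrative only; they are not the implementation of the coefficient matrix in Eq. 13.

```python
# Minimal sketch of the proposed checksum of Eq. 12: all K data blocks are
# first combined with a single XOR pass, and each checksum device C_i then
# stores a word-wise function W_i of that combined block.

from functools import reduce

def xor_blocks(blocks):
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def checksum(data_blocks, device_index):
    """C_i = W_i(D_1 xor D_2 xor ... xor D_K) for checksum device i."""
    combined = xor_blocks(data_blocks)
    # Placeholder word-wise function W_i: XOR each word with a device constant.
    return bytes(word ^ device_index for word in combined)

def verify(data_blocks, device_index, stored_checksum):
    """Recomputing the XOR-based checksum exposes any change in the data."""
    return checksum(data_blocks, device_index) == stored_checksum

if __name__ == "__main__":
    devices = [b"\x10\x20", b"\x30\x40", b"\x01\x02"]   # K = 3 toy data devices
    c1 = checksum(devices, 1)
    print(verify(devices, 1, c1))                        # True
    devices[0] = b"\x11\x20"                             # a bit flips on device 1
    print(verify(devices, 1, c1))                        # False -> change detected
```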
# V. ERASURE CODING MECHANISMS FOR CLOUD STORAGE SERVICE PROVIDERS

As the most noted fault tolerance framework is erasure coding, we now examine how erasure codes are applied by various cloud storage service providers [10].

# a) Erasure on Microsoft Windows Azure

Microsoft Windows Azure employs a Local Reconstruction Code, or LRC, implemented on top of a Reed-Solomon code. The LRC is a shorter code, which is robust and inexpensive to implement and store. The application framework can be understood as follows. Assume there are 6 data segments and 3 parity segments, where the 3 parity segments are computed from the 6 data segments and all 9 segments are stored on distinct disks. During a failure any segment can be used for reconstruction; since the data and code are distributed over 9 segments, the surviving segments must be read. Azure defines the reconstruction cost as the number of data segments required for reconstruction, which in this case is 6. The main purpose of LRC is to reduce this reconstruction cost by computing some of the parity codes from local groups of data segments. Following this logic, there are now 4 parity codes: two of them are generated from all the data segments and are kept as global parities, while the remaining two are computed from each group of data segments and are kept locally [Figure 2]. A small numeric sketch of this cost difference is given at the end of this section.

# Figure 2: LRC Computation

The construction of LRC adds an additional parity code to the Reed-Solomon code. This may appear to add computational load; however, this computation does not execute during conventional data transactions.

# b) Erasure on Amazon S3

The basic implementation of fault tolerance in Amazon Simple Storage Service, or S3, depends on the RAID framework. However, rather than depending only on the storage provider, Amazon also recommends employing application-level fault tolerance mechanisms. Hence this framework can be considered a RAID plus application-based framework, very similar to a Service Oriented Architecture (SOA) model for RAID. The fault tolerance mechanism for Amazon S3 has three major components in the framework, shown in Figure 3.

![Figure 3: RAID SOA](image-3.png "Figure 3")

# c) Erasure on Google File Systems

The Google File System provides essential high-load data processing and storage solutions on public storage systems. Its most crucial recovery capability relies on Google-specific algorithms using constant monitoring, replication management, and automatic chunk recovery.

Hence we observe that most of the cloud service providers use erasure codes in their storage solutions, with modifications that lead to service and cost benefits.
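Returning to the Azure LRC discussion in subsection (a), the reconstruction-cost argument can be illustrated with a toy calculation. The 6-data / 2-local / 2-global layout and the assumption that a lost data fragment is rebuilt from its local group alone follow the description above; the function names and group sizes are assumptions for the example, not Azure's implementation.

```python
# Toy illustration of the reconstruction-cost argument for an LRC-style code
# with 6 data fragments, 2 local parities and 2 global parities.

def rs_reconstruction_cost(k: int) -> int:
    """Reed-Solomon: rebuilding one fragment reads k surviving fragments."""
    return k

def lrc_reconstruction_cost(k: int, local_groups: int) -> int:
    """LRC: a lost data fragment is rebuilt from its local group only,
    i.e. the other data fragments in the group plus the group's local parity."""
    group_size = k // local_groups
    return group_size  # (group_size - 1) data fragments + 1 local parity

if __name__ == "__main__":
    k = 6
    print("Reed-Solomon (6+3) :", rs_reconstruction_cost(k), "fragments read")
    print("LRC (6 data, 2+2)  :", lrc_reconstruction_cost(k, local_groups=2), "fragments read")
```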
# VI. RESULTS

The proposed fault tolerance scheme has been simulated and tested against the basic erasure fault tolerance scheme in terms of bit error rate versus signal to noise ratio. The first simulation result, for the basic erasure fault tolerance code [Table I], shows the bit error rate for each signal to noise ratio ranging from 0 to 15 decibels. The simulation results are generated using MATLAB in order to observe the improvement.

Table I: Bit Error Rate of the Basic Erasure Scheme

| Signal to Noise Ratio | Bit Error Rate |
| --- | --- |
| 0 Decibel | 0.3645 % |
| 1 Decibel | 0.3362 % |
| 2 Decibel | 0.3037 % |
| 3 Decibel | 0.2674 % |
| 4 Decibel | 0.2280 % |
| 5 Decibel | 0.1868 % |
| 6 Decibel | 0.1458 % |
| 7 Decibel | 0.1070 % |
| 8 Decibel | 0.0728 % |
| 9 Decibel | 0.0452 % |
| 10 Decibel | 0.0250 % |
| 11 Decibel | 0.0120 % |
| 12 Decibel | 0.0049 % |
| 13 Decibel | 0.0016 % |
| 14 Decibel | 0.0004 % |
| 15 Decibel | 0.0001 % |

The second simulation result, for the proposed erasure-based fault tolerance scheme [Table II], shows the bit error rate for each signal to noise ratio ranging from 0 to 15 decibels.

Table II: Bit Error Rate of the Proposed Scheme

| Signal to Noise Ratio | Bit Error Rate |
| --- | --- |
| 0 Decibel | 0.17310 % |
| 1 Decibel | 0.16220 % |
| 2 Decibel | 0.14940 % |
| 3 Decibel | 0.13490 % |
| 4 Decibel | 0.11850 % |
| 5 Decibel | 0.10060 % |
| 6 Decibel | 0.08160 % |
| 7 Decibel | 0.06210 % |
| 8 Decibel | 0.04290 % |
| 9 Decibel | 0.02530 % |
| 10 Decibel | 0.01190 % |
| 11 Decibel | 0.00410 % |
| 12 Decibel | 0.00100 % |
| 13 Decibel | 0.00010 % |
| 14 Decibel | 0.00000 % |
| 15 Decibel | 0.00000 % |

Table III compares the bit error rates of the basic and proposed schemes and lists the improvement percentage at each signal to noise ratio.

Table III: Comparison of the Basic and Proposed Schemes

| Basic Erasure Scheme Bit Error Rate (%) | Proposed Scheme Bit Error Rate (%) | Improvement Percentage (%) |
| --- | --- | --- |
| 0.3645 | 0.17310 | 47.5 % |
| 0.3362 | 0.16220 | 48.2 % |
| 0.3037 | 0.14940 | 49.2 % |
| 0.2674 | 0.13490 | 50.4 % |
| 0.2280 | 0.11850 | 52.0 % |
| 0.1868 | 0.10060 | 53.9 % |
| 0.1458 | 0.08160 | 56.0 % |
| 0.1070 | 0.06210 | 58.0 % |
| 0.0728 | 0.04290 | 58.9 % |
| 0.0452 | 0.02530 | 56.0 % |
| 0.0250 | 0.01190 | 47.6 % |
| 0.0120 | 0.00410 | 34.2 % |
| 0.0049 | 0.00100 | 20.4 % |
| 0.0016 | 0.00010 | 6.3 % |
| 0.0004 | 0.00000 | 100.0 % |
| 0.0001 | 0.00000 | 100.0 % |

![Figure 4: Comparative Simulation](image-4.png "Figure 4")

# VII. CONCLUSION

This work compares the standard fault tolerance mechanisms for non-cloud distributed storage solutions [9] [11] [12]. The work focuses mainly on RAID-4, RAID-5, Linux RAID-6, Array Codes and finally the Non-MDS codes, and establishes the need for Erasure-based codes for optimal performance. This work also defines in detail the parameters influencing the performance of Erasure codes. Furthermore, the work proposes an optimal cloud-based fault tolerance code based on Erasure coding and evaluates its behaviour with respect to multiple commercial cloud storage solutions, namely Microsoft Azure, Amazon S3 and finally the Google File System. The simulation of the proposed fault tolerance scheme demonstrates up to 59% improvement in Bit Error Rate using the MATLAB simulation.

# REFERENCES

1. E. Pinheiro, W. D. Weber, and L. A. Barroso, "Failure Trends in a Large Disk Drive Population," Proc. USENIX Conf. File and Storage Technologies (FAST), Feb. 2007.
2. B. Schroeder and G. A. Gibson, "Disk Failures in the Real World: What Does an MTTF of 1,000,000 Hours Mean to You?" Proc. USENIX Conf. File and Storage Technologies (FAST '07), Feb. 2007.
3. C. Huang, H. Simitci, Y. Xu, A. Ogus, B. Calder, P. Gopalan, J. Li, and S. Yekhanin, "Erasure coding in Windows Azure storage," Proc. USENIX Annu. Tech. Conf., 2012.
4. M. Chowdhury, S. Kandula, and I. Stoica, "Leveraging endpoint flexibility in data-intensive clusters," Proc. ACM SIGCOMM Conf., 2013.
5. A. Dholakia, E. Eleftheriou, X.-Y. Hu, I. Iliadis, J. Menon, and K. K. Rao, "A New Intra-Disk Redundancy Scheme for High-Reliability RAID Storage Systems in the Presence of Unrecoverable Errors," ACM Trans. Storage, vol. 4, no. 1, May 2008.
6. J. Plank, A. Buchsbaum, and B. Vander Zanden, "Minimum density RAID-6 codes," ACM Trans. Storage, vol. 6, no. 4, 2011.
7. O. Khan, R. Burns, J. S. Plank, W. Pierce, and C. Huang, "Rethinking erasure codes for cloud file systems: Minimizing I/O for recovery and degraded reads," Proc. 10th USENIX Conf. File and Storage Technologies, 2012.
8. M. Sathiamoorthy, M. Asteris, D. Papailiopoulos, A. G. Dimakis, R. Vadali, S. Chen, and D. Borthakur, "XORing elephants: Novel erasure codes for big data," Proc. VLDB Endowment, vol. 6, 2013.
9. Y. Zhu, P. Lee, L. Xiang, Y. Xu, and L. Gao, "A cost-based heterogeneous recovery scheme for distributed storage systems with RAID-6 codes," Proc. IEEE 42nd Annu. Int. Conf. Dependable Syst. Netw., 2012.
10. B. Calder, J. Wang, A. Ogus, N. Nilakantan, A. Skjolsvold, S. McKelvie, Y. Xu, S. Srivastav, J. Wu, H. Simitci, J. Haridas, C. Uddaraju, H. Khatri, A. Edwards, V. Bedekar, S. Mainali, R. Abbasi, A. Agarwal, M. F. Ul Haq, M. I. Ul Haq, D. Bhardwaj, S. Dayanand, A. Adusumilli, M. McNett, S. Sankaran, K. Manivannan, and L. Rigas, "Windows Azure storage: A highly available cloud storage service with strong consistency," Proc. 23rd ACM Symp. Operating Syst. Principles, 2011.
11. R. Li, J. Lin, and P. P. Lee, "CORE: Augmenting regenerating-coding-based recovery for single and concurrent failures in distributed storage systems," Proc. IEEE 29th Conf. Mass Storage Syst. Technol., 2013.
12. S. Xu, R. Li, P. Lee, Y. Zhu, L. Xiang, Y. Xu, and J. Lui, "Single disk failure recovery for X-code-based parallel storage systems," IEEE Trans. Comput., vol. 63, no. 4, 2014.