# I. Introduction

Cloud computing is a platform consisting of a commonly very large number of computers responsible for data processing and storage. Such large computing resources are combined to serve multiple consumers using a multi-tenant model, with different physical and virtual resources dynamically assigned and reassigned according to users' demands. Consider a cloud computing environment as a set of virtual machines (VMs) which may be assigned to different physical machines (PMs); different VMs may deliver different performance due to many factors, e.g., concurrency with other processes on the PMs. Cloud elasticity enables the system to provide and remove resources according to the application's needs in real time. Providing suitable cloud elasticity on demand is not a trivial matter: a cloud computing environment is subject to several factors that may influence its performance, including the different types of virtual systems provided by the service, different time zones, and demand variation. Providing cloud computing elasticity requires closely monitoring the system's demand for resources in order to decide when to add or remove resources.

When users purchase compute time from a cloud provider, the user and the provider commonly agree on the quality of service via a service level agreement (SLA), which may be composed of the following parameters:

- Revenue: monetary value paid by the user to the provider for the computing time.
- Operating Costs: monetary value spent by the provider on the computing resources allocated for processing the user's workload.
- Service Level Objective (SLO): a user-defined metric which must be satisfied by the provider, e.g., response time, throughput, availability.
- Penalty: monetary value paid by the provider to the user for not satisfying the SLO.

It is therefore in the cloud provider's interest to monitor and scale its resources, e.g., VMs, in real time as a function of the current workload, in order to lower its computing cost while satisfying all current SLAs and minimizing penalties as much as possible. This is the very problem we address. In this paper we aim at continuously monitoring a DBMS's performance and automatically minimizing the number of VMs used for query processing while minimizing potential SLO violations. We use query processing time as the contracted SLO metric. Our resource provisioning approach does not assume that the number of VMs is fixed, or that all VMs yield the same performance, nor do we assume that the workload is known beforehand. Since our elastic solution makes none of these assumptions, migrating applications to the cloud can be done directly. Our approach uses the relational data model and works with full replication: each allocated VM runs a DBMS with a complete copy of the database. Our solution does not partition the data, but the query; we apply virtual partitioning to divide the query into sub-queries to be processed on the allocated virtual machines. In a preliminary version of this work, only select-range queries based on key attributes were investigated. This paper extends it by showing how to address select-range queries on non-key attributes as well as aggregation queries.

The main contributions are:

- a non-intrusive, automatic and adaptive performance monitoring technique for DBMSs on the currently allocated VMs;
- a pragmatic approach which dynamically aims at providing the smallest set of VMs capable of satisfying each query's SLO and thus the user's SLA in general.
One of the main advantages of the approach we propose is that it can easily be applied when the user already has applications built on a relational database and wishes to deploy them on a cloud infrastructure, without the need to re-architect applications that are based on RDBMS technology.

# II. Related Work

An adaptive approach has been proposed for provisioning VMs for a distributed stream processing system (DSPS) in the cloud. The provisioning algorithm uses a black-box approach, i.e., it is independent of the specifics of the queries running in the DSPS, and it scales the number of VMs based entirely on measurements of input stream rates. It detects an overload condition when the processing rate of input data decreases because data tuples are discarded due to load shedding. The algorithm is invoked periodically and calculates the new number of VMs needed to support the current workload demand; however, the paper does not specify how often the algorithm is invoked. We do not focus only on resource provisioning: our contribution also includes adaptive monitoring, and re-provisioning if necessary, during query processing. [1]

Kingfisher is a cost-aware system that tries to minimize the customer-centric cost, i.e., the cost of renting servers, while meeting the application's SLA. It solves an integer linear program that accounts for both infrastructure and transition costs, deriving appropriate elasticity decisions under each workload change. Kingfisher uses a proactive approach to decide when to provision, together with an ideal workload predictor that uses statistics gathered by the monitoring engine to derive estimates of future workload. Our approach differs in several ways: we take a cloud-provider-centric view, and we do not assume a workload predictor; rather, the system's performance is continuously monitored. [2]

The problem of provisioning resources in a public cloud to execute data analytic workloads has also been studied. The algorithm presented analyzes the space of possible configurations for the input workload based on the estimated cost of each configuration, and tries to find a configuration where resource costs are minimized while the SLA associated with the workload is met. The cost model presented is similar to ours. However, the performance of a particular configuration may degrade over time and SLAs may be violated, in which case it becomes necessary to change the resources allocated to the application; that paper does not present dynamic provisioning as we do in our work. [3]

Another related work presents an adaptive method to optimize the response time of range queries in a distributed database. The algorithm partitions each query and adaptively identifies the best level of parallelism for it, since choosing the maximum level of parallelism is not necessarily the best strategy to optimize a query's performance: if a query is sent to too many storage hosts, it can saturate a single client by returning results faster than the client can consume them. That work, like ours, proposes an adaptive provisioning algorithm for range queries and considers possible variation in VM performance, but it differs from ours in that it has no SLA to observe and does not specify how often the algorithm is invoked. [4]
# III. Proposed Work

# Problem Formulation

Let TR_Q denote the estimated total time needed to process a query Q and let SLO_Q be the agreed time to process Q as per the SLA. If p is the cost, per unit of time, of failing to satisfy SLO_Q, we define the provider's penalty (cost) as:

pp_Q = max{(TR_Q - SLO_Q) * p, 0}    (1)

Let c_vm be the cost, per unit of time, of using a VM which holds a full copy of the relational database, and let n_Q(t) be the number of VMs allocated to execute Q at time t, where time is discretized in billable units. Then we define the computing cost for running Q as:

cc_Q = Σ_{t=1}^{TR_Q} n_Q(t) * c_vm

Allocating as many VMs as possible yields the minimum penalty cost, but increases the computing cost. Likewise, allocating a single VM minimizes the provider's computing cost, but will very likely yield a potentially large penalty cost. Given a query Q, we need to obtain, for each time point t, the number of VMs n_Q(t) minimizing Q's total cost (pp_Q + cc_Q). This problem definition captures the elasticity of the cloud environment, namely that at different points in time a different number of VMs may be sufficient to process a query with respect to its SLO.

# a) Query Definition

A preliminary version of this paper investigated only select-range queries based on key attributes. A select-range query is defined as follows:

SELECT *
FROM table T1
WHERE T1.pk >= Vs and T1.pk < Vf;

where T1.pk denotes the primary key attribute of T1 and [Vs, Vf) is the requested key range. For instance, scan queries of the form SELECT * FROM table T1 can commonly be rewritten in this way by first obtaining the key bounds with SELECT MIN(T1.pk) INTO Vs, MAX(T1.pk) INTO Vf FROM table T1; and then issuing the select-range query above over [Vs, Vf).

We can also handle aggregation queries and aggregation over a range:

SELECT OPER(T1.attr)
FROM table T1;

where OPER is an aggregate operator (SUM, for example) and T1.attr denotes an attribute of table T1.

# b) Motivating Example

i. Select-Range Query

Assume that the following single select-range query Q, with an agreed SLO_Q, is received by the cloud provider:

SELECT *                     // <--- Q
FROM table T1
WHERE T1.pk >= 0 and T1.pk < 4000;

where T1.pk is the primary key of table T1. Assuming that T1's primary key has no gaps, the query's selectivity is Q_s = 4000 tuples. Further, let us assume that SLO_Q is 100 seconds and that our initial provisioning is one single machine vm_0 such that its reading rate is RR_0 = 30 and consequently NT_0^Q = 3000, the number of Q's tuples vm_0 can read within SLO_Q. Clearly, using only vm_0 to read all 4000 tuples will yield a penalty to be paid by the provider, namely (TR_Q - SLO_Q) * p = (150 - 100) * p = 50 * p. It is wise, then, to bring another VM (vm_1) up to help. Let us assume that RR_1 = 10 and NT_1^Q = 2000. At this point it may seem that these two VMs are enough to process Q and satisfy its SLO by rewriting Q into the following two (sub-)queries, Q1 and Q2, running the first on vm_0 and the second on vm_1, respectively. Note that we use virtual partitioning (i.e., partitioning the range over the primary key) to divide Q into Q1 and Q2.

SELECT *                     // <--- Q1
FROM table T1
WHERE T1.pk >= 0 and T1.pk < 3000;

SELECT *                     // <--- Q2
FROM table T1
WHERE T1.pk >= 3000 and T1.pk < 4000;

However, a VM's performance may change while the query is running, so the initial provisioning may become insufficient to meet SLO_Q. Our proposal to deal with this issue is to further partition the queries, so that the executing VMs' reading rates can be monitored often enough for other VMs to be added as needed in order to enforce SLO_Q. If monitoring is too frequent, meaning the original queries have to be partitioned into too many sub-queries, the added overhead may hurt more than it helps. If it is done too rarely, it may be too late to make any corrections and to avoid potential penalties.
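Before turning to how the partitions are chosen, the following sketch makes the cost trade-off of the problem formulation concrete. It is an illustrative Python fragment, not the authors' implementation: the function names and the prices p and c_vm are our own assumptions, chosen only to mirror formula (1) and the computing cost cc_Q for the single-VM and two-VM allocations of the motivating example.

```python
# Illustrative sketch of the cost model above; names and prices are assumptions.

def penalty_cost(tr_q: float, slo_q: float, p: float) -> float:
    """Formula (1): pp_Q = max{(TR_Q - SLO_Q) * p, 0}."""
    return max((tr_q - slo_q) * p, 0.0)

def computing_cost(n_q: list, c_vm: float) -> float:
    """cc_Q: sum over billable time units t of n_Q(t) * c_vm."""
    return sum(n_t * c_vm for n_t in n_q)

def total_cost(n_q: list, slo_q: float, p: float, c_vm: float) -> float:
    """Provider's total cost pp_Q + cc_Q, with TR_Q taken as the number of
    billable units during which at least one VM is allocated to Q."""
    tr_q = len(n_q)
    return penalty_cost(tr_q, slo_q, p) + computing_cost(n_q, c_vm)

# Hypothetical prices: p = 2.0 per late time unit, c_vm = 0.5 per VM-time unit.
slo_q, p, c_vm = 100, 2.0, 0.5
one_vm_for_150 = [1] * 150     # single VM finishes 50 units after the SLO
two_vms_for_100 = [2] * 100    # two VMs finish exactly at the SLO
print(total_cost(one_vm_for_150, slo_q, p, c_vm))   # 75 + 100 = 175.0
print(total_cost(two_vms_for_100, slo_q, p, c_vm))  # 100 + 0  = 100.0
```

Under these assumed prices, paying for the second VM is cheaper than paying the penalty, which is precisely the decision the provisioning algorithms described below make at each point in time.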
To decide how finely to partition, we build on historical data, i.e., how long it took to process a select-range query with a given selectivity using a certain number of partitions. We assume the existence of a table H for each VM in which each entry is a 4-tuple: (partitioning attribute, query selectivity, number of partitions, average processing time). Using H we can find the maximum number of partitions we can divide a query into, thus allowing as-frequent-as-possible monitoring while still satisfying SLO_Q. For the sake of argument, let us assume that H has enough information when the query's selectivity is 4000 and the partitioning attribute is pk, and that the largest number of partitions we can use for Q1 is six, maximizing the chance to monitor Q's performance while still satisfying SLO_Q. Thus, we divide query Q1 into six sub-queries (Q1,1, ..., Q1,6), each covering a contiguous 500-tuple sub-range of the primary key. Even though it is not discussed here for the sake of brevity, a similar reasoning would be applied to Q2, to be processed at vm_1. This partitioning methodology, using the primary key as the partitioning attribute, is the same for both select-range queries and the aggregation queries presented in the next subsection.

Let us assume that vm_1's performance remains stable and that it is able to finish its workload as planned. When Q1,1 finishes, we have the first opportunity to monitor, in a non-invasive manner, vm_0's performance. Let us assume that Q1,1 actually took 40 seconds to finish. This leads us to revise that VM's reading rate down to RR_0 = 20, which leads to an expected completion time, for the five remaining sub-queries, of 200 seconds. This brings the expected completion time above SLO_Q and triggers a revision of the initial provisioning, so that SLO_Q can still be satisfied. Note that before the initial provisioning is revised, the five remaining partitions (Q1,2, Q1,3, Q1,4, Q1,5 and Q1,6) are gathered into a single query. Our elastic system would then process Q as follows: vm_0 would be used from time 0 to 110 to retrieve the tuples in the primary key range [0, 2000]; vm_1 would be used from time 0 to 100 to retrieve the tuples in the range [3000, 4000]; and vm_2 would be used from time 50 to 70 to retrieve the tuples in the range [2000, 3000]. The number of VMs used as a function of time to execute Q would thus be two VMs between time [0, 40], three VMs between time [40, 60] and again two VMs between time [60, 90], as shown in Fig. 1. At the 110th second, all VMs are deallocated.

ii. Aggregation Query

Now assume the cloud provider receives the aggregation query

SELECT OPER(T1.attr)         // <--- Q
FROM table T1;

The difference between this example and the previous one is how to partition an aggregation query. To partition Q, we add a range over the primary key of T1, which is used as the partitioning attribute; the range spans the minimum and maximum values of the primary key. To simplify, let the minimum value be 0 and the maximum be 3999:

SELECT OPER(T1.attr)         // <--- Q
FROM table T1
WHERE T1.pk >= 0 and T1.pk < 4000;

We assume that the query's selectivity in this case is the selectivity of the range over T1.pk (Q_s = 4000 tuples). If the aggregation operator is a distributive function, such as COUNT, the result of Q is simply the summation of the results collected for each partition. However, if the aggregation operator is an algebraic function, such as AVG, the original query result cannot be obtained directly from the partitions; we first transform it into distributive functions. In the case of AVG, we rewrite it in terms of SUM and COUNT, so that the original query result can be obtained by combining the partitions' results, as sketched below.
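The following fragment sketches, under our own naming, how such an algebraic aggregate can be decomposed over virtual partitions: the AVG query is rewritten as per-partition SUM/COUNT sub-queries over primary-key sub-ranges, and the partial results are recombined. It is only an illustration of the rewriting described above, not the paper's partition engine.

```python
# Illustrative decomposition of AVG(T1.attr) over virtual partitions.
# Table and column names follow the running example; helper names are ours.

def partition_ranges(lo: int, hi: int, parts: int) -> list:
    """Split the half-open key range [lo, hi) into `parts` contiguous sub-ranges."""
    width = (hi - lo + parts - 1) // parts
    return [(s, min(s + width, hi)) for s in range(lo, hi, width)]

def avg_subqueries(lo: int, hi: int, parts: int) -> list:
    """One SUM/COUNT sub-query per sub-range (AVG rewritten as distributive parts)."""
    return [
        f"SELECT SUM(T1.attr) AS s, COUNT(*) AS c "
        f"FROM T1 WHERE T1.pk >= {a} AND T1.pk < {b};"
        for a, b in partition_ranges(lo, hi, parts)
    ]

def combine_avg(partials: list) -> float:
    """Recombine per-partition (SUM, COUNT) pairs into the global AVG."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count

# Q1's share of the example query, pk in [0, 3000), split into six partitions:
for sub_query in avg_subqueries(0, 3000, 6):
    print(sub_query)
print(combine_avg([(120.0, 40), (80.0, 20)]))   # (120 + 80) / (40 + 20) = 3.33...
```

Distributive operators such as SUM or COUNT need no transformation; only the recombination step changes.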
If we consider the same scenario as in the previous example, vm_0 has a reading rate RR_0 = 30 and consequently NT_0^Q = 3000. Using only vm_0 would again yield a penalty of (TR_Q - SLO_Q) * p = (150 - 100) * p = 50 * p to be paid by the provider. It is wise, then, to bring another VM (vm_1) up to help. Let us assume that RR_1 = 10 and NT_1^Q = 2000. We rewrite Q into the two (sub-)queries Q1 and Q2, running the first on vm_0 and the second on vm_1. As before, table H gives the largest number of partitions for Q1 that still satisfies SLO_Q, which maximizes the chance to monitor Q's performance; Q1 is therefore divided into the following sub-queries over contiguous primary-key sub-ranges:

SELECT OPER(T1.attr)         // <--- Q1,1
FROM table T1
WHERE T1.pk >= 0 and T1.pk < 500;

SELECT OPER(T1.attr)         // <--- Q1,2
FROM table T1
WHERE T1.pk >= 500 and T1.pk < 1000;

...

SELECT OPER(T1.attr)         // <--- Q1,6
FROM table T1
WHERE T1.pk >= 2500 and T1.pk < 3000;

A similar reasoning would be applied with respect to Q2, to be run at vm_1. As in the previous example, we assume that vm_1's performance remains stable and that it finishes its workload as planned. After Q1,1 finishes we have the first opportunity to monitor the VM's performance in a non-invasive manner. For the sake of brevity, suppose the same thing happens as in the previous example: Q1,1 takes 40 seconds to finish, the VM's reading rate is revised down to RR_0 = 20, and the expected completion time for the five remaining partitions (Q1,2, Q1,3, Q1,4, Q1,5 and Q1,6) becomes 200 seconds. Let us allocate a new VM (vm_2) with RR_2 = 40 to satisfy SLO_Q and to offload some query processing. Then the remaining 1000 tuples can be read by vm_2 in 25 seconds in the best case, which does not lead to a violation of SLO_Q (recall that at this point 40 seconds have already been spent on Q1,1). We have the same elastic behaviour (two VMs between time [0, 40], three VMs between time [40, 60] and again two VMs between time [60, 90]) presented in Fig. 1. The fact that a single basic approach can easily be adapted and applied to different types of queries is a significant advantage: any optimization that can be done to the basic underlying (select-range) query can benefit the other query types as well.

# IV. Proposed Algorithms

# a) Partition Query Algorithm

Step 1: Start/begin.
Step 2: Read the input elements H, Q, SLO_Q and V (the set of VMs allocated to Q).
Step 3: For each vm_i that belongs to V, do the following four steps:
  i. compute Q_i, the sub-query of Q with selectivity NT_i^Q;
  ii. compute n_i^Q by looking up in H the maximum number of partitions for Q_i that satisfies SLO_Q;
  iii. compute P_i, i.e., divide Q_i into n_i^Q partitions;
  iv. call the monitoring algorithm, passing P_i and SLO_Q as input.
Step 4: Stop/end.

The Partition Query algorithm (Algorithm 1) describes how partitions are created from a query Q, taking into account SLO_Q, the performance of every vm_i that belongs to V, and the historical data in table H. It works as follows: for each vm_i allocated to process Q, the partition engine rewrites Q into a sub-query Q_i whose selectivity equals the number of tuples that vm_i can read without violating SLO_Q, i.e., NT_i^Q. Q_i is then partitioned using table H by constructing the largest set of partitions P_i to be processed by vm_i such that the average processing time of P_i does not exceed SLO_Q. How the value NT_i^Q is obtained for each vm_i is discussed in the following subsections. After the partitions are built, the monitoring algorithm (Algorithm 2) is applied inside each vm_i.

# b) VM's Monitoring Algorithm

Step 1: Start/begin.
Step 2: Read the input elements P_i and SLO_Q, which are obtained from the partitioning algorithm.
Step 3: P_s := selectivity(P_i);
Step 4: Until P_i = ∅, do the following steps:
  i. Q := P_i.remove();
  ii.-iv. execute Q on the local DBMS, recording its start time T_start, its end time T_end and the set of retrieved tuples TP;
  v. T_q := T_end - T_start;
  vi. T_spent := T_spent + T_q;
  vii. P_s := P_s - |TP|;
  viii. T_estimated := P_s / RR_i;
  ix. TR_Q := T_spent + T_estimated + TM_i;
  x. if TR_Q - SLO_Q > 0 then do the following two steps:
     a. T_remain := SLO_Q - (T_spent + TM_i);
     b. return makeDecision(vm_i, P_i, T_remain);
     else, if TR_Q - SLO_Q < 0, then do the following two steps:
     a. ST_i := SLO_Q - TR_Q;
     b. hasSlackTime(ST_i);
  xi. removeVM(vm_i);
Step 5: Stop/end.

The monitoring algorithm (Algorithm 2), which monitors query partition processing, is carried out at each virtual machine allocated to the query. It works as follows. At each vm_i, the monitoring algorithm tracks the VM's reading rate and estimates how long it will take to finish all partitions in P_i. For each partition, it measures the time spent processing that partition; it is therefore possible to know how many tuples of P_i remain to be retrieved and also the estimated time to do so, based on the VM's reading rate RR_i. If the estimated total time needed to process Q is greater than SLO_Q, this indicates that SLO_Q may be violated; note that we also account for the time TM_i spent monitoring vm_i's performance. In that case, continuing to process P_i only with vm_i will yield a penalty, so in order to use the remaining time to process the remaining partitions of P_i without violating SLO_Q, a decision has to be made: P_i is recomputed for vm_i using the decision making algorithm. The monitoring algorithm also handles the situation in which SLO_Q can be satisfied faster than expected: in such a case we may use the slack time to process other incoming query partitions, or reduce the number of VMs allocated. ST_i represents the slack time of vm_i that can be allocated further work without violating SLO_Q; VMs with slack time are used by the provisioning algorithms, allowing already-provisioned machines to be reused. A sketch of this monitoring loop is shown below.
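The sketch follows our reading of Algorithm 2; the VM interface (`run_partition`, `make_decision`, `has_slack_time`) and the attribute names are hypothetical stand-ins, not the authors' code.

```python
import time

def monitor_partitions(vm, partitions, slo_q, tm_i=0.0):
    """Sketch of Algorithm 2 for one VM: process its partitions one by one,
    re-estimate the total completion time TR_Q after each partition, and react
    when an SLO violation is predicted or slack time appears.
    `vm` is assumed to expose rr (reading rate), run_partition(), make_decision()
    and has_slack_time(); `partitions` is the ordered list P_i."""
    p_s = sum(p.selectivity for p in partitions)    # tuples still to retrieve
    t_spent = 0.0
    while partitions:
        q = partitions.pop(0)
        t_start = time.monotonic()
        retrieved = vm.run_partition(q)             # execute one sub-query
        t_q = time.monotonic() - t_start
        t_spent += t_q
        p_s -= len(retrieved)
        vm.rr = len(retrieved) / t_q                # revised reading rate RR_i
        t_estimated = p_s / vm.rr                   # time for the remaining tuples
        tr_q = t_spent + t_estimated + tm_i         # new estimate of TR_Q
        if tr_q > slo_q:                            # predicted SLO violation
            t_remain = slo_q - (t_spent + tm_i)
            return vm.make_decision(partitions, t_remain)   # Algorithm 4
        if tr_q < slo_q:                            # slack time available
            vm.has_slack_time(slo_q - tr_q)
    return None                                     # all partitions met the SLO
```

Because the check runs only between partitions, the monitoring overhead TM_i stays bounded by the partition granularity chosen from table H.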
The selectivity of each partition is calculated according to the range selectivity over the partitioning attribute; recall that in all queries we chose the primary key as the partitioning attribute. Two components form the core of our approach (Fig. 2):

- The partition engine uses table H and is responsible for partitioning the query, aiming at respecting the query's SLO.
- The monitoring engine is executed within each VM vm_i allocated to process a query Q and aims at making sure each VM keeps within the expected SLO; VMs can "request" and "offer" help.

# c) Initial Dynamic Provisioning Algorithm

Step 1: Start/begin.
Step 2: Read the input elements Q and SLO_Q.
Step 3: V := ∅;
Step 4: Q_s := selectivity(Q);
Step 5: Until Q_s > 0 ∧ S ≠ ∅, do the following steps:
  i. (vm_i, ST_i) := S.remove();
  ii. check:
     a. if ST_i > SLO_Q then NT_i^Q := SLO_Q * RR_i;
     b. else NT_i^Q := ST_i * RR_i;
  iii. Q_s := Q_s - NT_i^Q;
  iv. V := V ∪ {vm_i};
Step 6: Until Q_s > 0, do the following steps:
  i. vm_i := new();
  ii. NT_i^Q := SLO_Q * RR_i;
  iii. Q_s := Q_s - NT_i^Q;
  iv. V := V ∪ {vm_i};
Step 7: Call the Partition Query algorithm, passing the variables H, Q, SLO_Q and V as input.
Step 8: Stop/end.

This algorithm (Algorithm 3) implements the initial provisioning approach. Its purpose is to compute the smallest set V of virtual machines (V = {vm_0, ..., vm_n}) that should initially be dedicated to processing Q while satisfying SLO_Q. For each vm_i allocated, we need to know the amount of Q's tuples that vm_i can process without violating SLO_Q, i.e., NT_i^Q, which is computed according to vm_i's reading rate capacity RR_i within SLO_Q. The algorithm computes Q's selectivity and the adequate amount of tuples NT_i^Q that should be assigned to each vm_i in V; the query's selectivity can be obtained from the query plan statistics provided by the database management system. The algorithm also takes advantage of the VMs' slack time. Let S be an ordered set of pairs (vm_i, ST_i) containing the slack time ST_i of each vm_i, such that there is only one element per vm_i and ST_i is greater than 0 for all elements; S is ordered by descending ST_i. The algorithm finds in S the largest set of vm_i whose slack time can be devoted to processing Q; choosing the largest set allows the provider to serve more clients using fewer computing resources (VMs) while meeting the SLA, and the choice is made greedily, following the order of S. While S ≠ ∅, the algorithm removes the first vm_i from S and computes the number of tuples vm_i can retrieve: if ST_i > SLO_Q, then NT_i^Q is calculated from SLO_Q and RR_i; otherwise, NT_i^Q is calculated from ST_i and RR_i. The second loop (Step 6) is responsible for distributing the remaining tuples of Q to new VMs: when instantiating a new virtual machine vm_i, the algorithm calculates its NT_i^Q using the VM's reading rate RR_i and the value of SLO_Q. Algorithm 3 terminates when there are no tuples of Q left to be distributed. After that, our partitioning strategy (Algorithm 1, discussed above) is called, so that the performance of each provisioned VM is monitored during Q's execution. A sketch of the initial provisioning step is shown after this description.

# d) Decision Making Algorithm

Recall that in Algorithm 2, when a violation of SLO_Q is predicted, a decision has to be made in order to continue query processing while still satisfying the SLO. We therefore propose an elastic solution (described in Algorithm 4) which dynamically re-provisions VMs. At the monitoring stage, Algorithm 4 is called for each vm_i that may violate SLO_Q. Suppose that vm_i has a set of remaining partitions P_i which should be processed within T_remain units of time without violating SLO_Q. Algorithm 4 recalculates the number of VMs needed to help vm_i finish processing P_i within T_remain. First, the remaining partitions of P_i are gathered into a new query Q', and NT_i^{Q'}, the amount of Q''s tuples that vm_i itself can still retrieve while satisfying SLO_Q, is recomputed. After that, the algorithm provisions a new set of VMs and allocates to each of them an amount of tuples to be processed. Its final steps, in which Q'_s denotes the selectivity of Q', are:

Step 8: V := V ∪ {vm_i}; while Q'_s > 0 ∧ S ≠ ∅, do the following steps:
  i. (vm_j, ST_j) := S.remove();
  ii. check:
     a. if ST_j > SLO_Q then NT_j^{Q'} := T_remain * RR_j;
     b. else NT_j^{Q'} := ST_j * RR_j;
  iii. Q'_s := Q'_s - NT_j^{Q'};
  iv. V := V ∪ {vm_j};
Step 9: Until Q'_s > 0, do the following steps:
  i. vm_j := new();
  ii. NT_j^{Q'} := T_remain * RR_j;
  iii. Q'_s := Q'_s - NT_j^{Q'};
  iv. V := V ∪ {vm_j};
Step 10: Call the Partition Query algorithm, passing the variables H, Q', T_remain and V as input.
Step 11: Stop/end.

As in the initial provisioning, the algorithm first reuses slack time: while there are tuples of Q' to be distributed and S is not empty, the next vm_j with slack time is taken from S and assigned NT_j^{Q'} tuples. If there are still tuples to be processed in Q' and S = ∅, new VMs have to be allocated: when instantiating a new machine vm_j, the algorithm calculates its NT_j^{Q'} using the VM's reading rate RR_j and the value of T_remain. Algorithm 4 terminates when there are no more tuples of Q' to be distributed. After that, the Partition Query algorithm (Algorithm 1, discussed in the previous subsection) is called to partition Q' and to monitor the performance of each provisioned VM during Q''s execution, trying to ensure that Q' completes within T_remain.

# Conclusion

We presented a cloud-based, non-intrusive, automatic and adaptive performance monitoring technique for DBMSs, covering select-range queries and aggregation queries, as well as an approach that dynamically minimizes the number of VMs needed to satisfy the queries' SLO.
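The sketch below restates Algorithm 3's greedy structure in Python; the `VM` record, the `new_vm` factory and the example figures are illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass

@dataclass
class VM:
    name: str
    rr: float           # reading rate RR_i (tuples per time unit)
    slack: float = 0.0  # slack time ST_i

def initial_provisioning(q_selectivity, slo_q, slack_vms, new_vm):
    """Sketch of Algorithm 3: cover Q's tuples first with slack-time VMs
    (slack_vms plays the role of S, sorted by descending slack), then with
    newly instantiated VMs. Returns (vm, NT_i^Q) pairs."""
    q_s, allocation = q_selectivity, []
    for vm in slack_vms:                     # greedy pass over S (Step 5)
        if q_s <= 0:
            break
        budget = slo_q if vm.slack > slo_q else vm.slack
        nt = budget * vm.rr                  # tuples vm can read in its budget
        allocation.append((vm, nt))
        q_s -= nt
    while q_s > 0:                           # new VMs for the rest (Step 6)
        vm = new_vm()
        nt = slo_q * vm.rr
        allocation.append((vm, nt))
        q_s -= nt
    return allocation                        # Algorithm 1 then partitions each share

# Example, loosely following the motivating numbers (no slack-time VMs available):
pool = iter([VM("vm0", rr=30), VM("vm1", rr=10)])
plan = initial_provisioning(4000, 100, [], lambda: next(pool))
# -> [(vm0, 3000), (vm1, 1000)]: the two VMs together cover the 4000 tuples
```

The decision making algorithm (Algorithm 4, below) follows the same pattern, but with T_remain as the time budget and the regrouped query Q' as the workload.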
# VII. Future Scope

Future work will focus on queries that use joins. We also intend to study parameters beyond each VM's reading rate, such as CPU usage and the effective memory available in each VM.

Figure 1: Variation in the number of nodes allocated by our approach.

Figure 2: Architecture.

# References

1. T. L. Coelho da Silva, M. A. Nascimento, J. A. F. de Macedo, F. R. C. Sousa, J. C. Machado. Towards non-intrusive elastic query processing in the cloud. In Proc. of the 4th International Workshop on Cloud Data Management, Oct. 29-Nov. 2, 2012.
2. U. Sharma, P. Shenoy, S. Sahu, A. Shaikh. A cost-aware elasticity provisioning system for the cloud. In Proc. of the 31st International Conference on Distributed Computing Systems, June 2011.
3. R. Mian, P. Martin, J. Vazquez-Poletti. Provisioning data analytic workloads in a cloud. Future Generation Computer Systems, 2013.
4. J. Rogers, O. Papaemmanouil, U. Cetintemel. A generic auto provisioning framework for cloud databases. In Proc. of the 26th IEEE International.