With the advent of cloud computing [6], parallel processing, the method of having many small tasks solve one large problem, has been receiving attention amid ever increasing demand for higher performance, lower cost and sustained productivity. In general, solving a large distributed engineering problem has been facilitated by two major techniques: massively parallel processors (MPPs) and network distributed computing. Massively parallel processors [7] provide the most powerful environment for attaining high computational power, combining a few hundred to a few thousand CPUs connected to hundreds of gigabytes of memory. However, MPPs suffer from two drawbacks: economy and availability. Distributed computing over a networked cloud computing environment [3,10], on the other hand, is gaining an edge over MPPs because of the widespread availability and economic advantage of today's cloud computing. Distributed computing provides well defined APIs that are easily applicable to various programs, as well as effective compilers over a network of workstations, resulting in cost effective solutions. Despite its widespread acceptance as one of the best candidates among distributed computing paradigms, it remains to be verified whether distributed virtual machines in a cloud computing environment can offer scalable solutions to distributed engineering problems in an efficient and economical way. While it is commonly believed that employing more virtual machines from the cloud expedites the execution of distributed problems, it is not clear whether the overall performance increase is proportional to the number of virtual machines employed in the cloud [2].

The objective of this study is not any specific engineering problem itself but a parametric numerical experiment on parallel virtual machines, solving a typical engineering differential equation whose solution can be obtained by way of PVM over a cloud computing network. PVM (Parallel Virtual Machine) [1,8] is one of the most promising distributed computing systems available and can be applied to a network of heterogeneous workstations, such as a network of cloud computing resources. PVM provides easy to program APIs through which complex and CPU intensive scientific problems, such as global climate modeling and new drug design, can be solved without relying on expensive MPPs, by decomposing them into a set of simple tasks manageable on a network of virtual machines in a cloud system. In this paper, the experience with PVM is presented in pursuit of a pseudo-optimal decomposition of PVM tasks assigned to individual virtual machines in a cloud system. It can only be pseudo-optimal because various types of overhead are affected by nondeterministic factors such as the actual data transfer rate and the network load at the time of execution. We will investigate how the total execution time of a finite difference program processed in parallel varies with the number of homogeneous virtual machines on the network of a cloud system, along with the effects of discretization size in both space and time.

Distributed computing using PVM may be approached from three fundamentally different viewpoints based on the organization of the computing tasks [4]: the crowd, the tree, and the hybrid model. In the crowd model, a collection of closely related processes performs computations on different portions of the workload; the master-slave structure used later in this paper is of this type. In the tree model, processes are spawned dynamically in a tree-like fashion, so the workload is not known a priori, for example as in recursive divide-and-conquer algorithms. The third model, termed hybrid, is a combination of the tree and the crowd models.
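For readers unfamiliar with PVM, the following minimal master program in PVM's C interface illustrates the crowd (master-slave) style used later in this paper. It is a sketch only: the slave executable name, message tags, and workload encoding are hypothetical placeholders, not the authors' actual code.

```c
/* master.c - minimal PVM master-slave sketch (crowd model).
 * Assumes a slave executable named "slave" installed where PVM can
 * find it; NSLAVE and the tag values are arbitrary illustrative choices. */
#include <stdio.h>
#include <pvm3.h>

#define NSLAVE     4
#define TAG_WORK   1
#define TAG_RESULT 2

int main(void)
{
    int tids[NSLAVE];   /* task ids of the spawned slaves */
    int i, n;

    /* Enroll in PVM and spawn NSLAVE copies of the slave program. */
    pvm_mytid();
    n = pvm_spawn("slave", NULL, PvmTaskDefault, "", NSLAVE, tids);
    if (n < NSLAVE)
        fprintf(stderr, "warning: spawned only %d slaves\n", n);

    /* Send each slave its portion of the work (here just its index). */
    for (i = 0; i < n; i++) {
        pvm_initsend(PvmDataDefault);
        pvm_pkint(&i, 1, 1);
        pvm_send(tids[i], TAG_WORK);
    }

    /* Collect one result from each slave. */
    for (i = 0; i < n; i++) {
        double result;
        pvm_recv(tids[i], TAG_RESULT);
        pvm_upkdouble(&result, 1, 1);
        printf("slave %d returned %g\n", i, result);
    }

    pvm_exit();
    return 0;
}
```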
As mentioned above, PVM supports several methods of task decomposition suited to modeling diverse scientific problems. The choice of model is application dependent and should be selected to best match the natural structure of the distributed program while taking the communication overhead into account. For the purpose of studying the scalability of distributed virtual machine solutions, we present a PVM program that calculates heat diffusion in a thin wire by solving the finite difference version of the one dimensional heat equation. For this aim, as mentioned in the introduction, it suffices to analyze a simple problem, preferably one for which an analytic solution is also available.

Consider a thin wire of length $L$, density $\rho$, specific heat $c$ and thermal conductivity $k$, with the ends of the wire maintained at a fixed temperature $T_e$ and an initial temperature profile of

$$T(z, \tau = 0) = T_e + T_0 \sin(\pi z / L) \quad (1)$$

With the conventional non-dimensionalization, where $A = (T - T_e)/T_0$, $t = k\tau/(\rho c L^2)$ and $x = z/L$, the temperature in the wire is described by the following heat equation:

$$\partial^2 A / \partial x^2 = \partial A / \partial t \quad (2)$$

with the initial and boundary conditions

$$A(x, t = 0) = \sin(\pi x) \quad (3a)$$
$$A(x = 0, t) = A(x = 1, t) = 0 \quad (3b)$$

The exact solution of Eq. (2) subject to (3a) and (3b) can be found by the method of separation of variables [5]:

$$A(x, t) = e^{-\pi^2 t} \sin(\pi x) \quad (4)$$

The finite-difference solution to the above problem will be sought via parallel processing, in particular through distributed computing using PVM. We adopt an explicit scheme with forward differencing in time and central differencing in space, so the solution is obtained by marching in time from the given initial temperature distribution. Denoting $A(x_i, t_j)$ by $A_{i,j}$, the temperature at position $x_i$ and time $t_{j+1}$ can be expressed as

$$A_{i,j+1} = r(A_{i+1,j} + A_{i-1,j}) + (1 - 2r)A_{i,j} \quad (5)$$

where $r = \Delta t/(\Delta x)^2$, and the stability criterion for the explicit scheme requires $r \le 1/2$.

For our problem, where the solution over the whole space is obtained through the same equation, Eq. (5), the natural choice of programming structure is the master-slave method of the crowd programming paradigm, in which the slaves, spawned by the master program, perform the actual computations. We divide the wire into 50 subsections, and the solution in each subsection is obtained separately by one slave program, although the leftmost and rightmost temperatures of each subsection must be exchanged with its left and right neighboring subsections. The workload of the slaves is allocated by the master through data decomposition, whereby the initial temperature distribution of each subsection is sent to the respective slave. The overall structure of the parallel processing adopted in this paper is as follows: the master program spawns 50 copies of the same slave program, each of which handles one subsection of the wire. After receiving its initial temperature distribution, each slave computes the heat diffusion in the corresponding wire subsection. At each time step, each slave program communicates boundary information with its left and right neighboring slaves. When the specified final time is reached, all 50 slave programs send their final temperature profiles to the master, which then terminates the spawned slaves and ends the program.
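The paper does not reproduce its source code, so the following slave-side sketch is only a plausible reconstruction of the scheme just described: each slave owns one subsection plus two ghost cells, exchanges boundary temperatures with its neighbors every step, and applies Eq. (5). The message tags, the subsection size M, and the packing order of the initialization message are assumptions, not the authors' actual program.

```c
/* slave.c - one plausible slave for the heat-equation decomposition.
 * Tags, M and the init-message layout are illustrative assumptions. */
#include <pvm3.h>

#define M         20    /* interior points per subsection (assumed) */
#define TAG_INIT  1
#define TAG_LEFT  2     /* value travelling toward the left neighbor  */
#define TAG_RIGHT 3     /* value travelling toward the right neighbor */
#define TAG_DONE  4

int main(void)
{
    double a[M + 2], anew[M + 2];  /* a[0], a[M+1] are ghost cells */
    double r;                      /* r = dt/dx^2, must satisfy r <= 1/2 */
    int left, right, steps, i, j;

    pvm_mytid();

    /* Receive r, neighbor task ids (negative if none), the step count
       and this subsection's initial temperatures from the master. */
    pvm_recv(pvm_parent(), TAG_INIT);
    pvm_upkdouble(&r, 1, 1);
    pvm_upkint(&left, 1, 1);
    pvm_upkint(&right, 1, 1);
    pvm_upkint(&steps, 1, 1);
    pvm_upkdouble(&a[1], M, 1);
    a[0] = a[M + 1] = 0.0;         /* fixed-end boundary, Eq. (3b) */

    for (j = 0; j < steps; j++) {
        /* Exchange boundary temperatures with the neighbors.
           pvm_send is asynchronous, so this ordering does not deadlock. */
        if (left >= 0) {
            pvm_initsend(PvmDataDefault);
            pvm_pkdouble(&a[1], 1, 1);
            pvm_send(left, TAG_LEFT);
            pvm_recv(left, TAG_RIGHT);
            pvm_upkdouble(&a[0], 1, 1);
        }
        if (right >= 0) {
            pvm_initsend(PvmDataDefault);
            pvm_pkdouble(&a[M], 1, 1);
            pvm_send(right, TAG_RIGHT);
            pvm_recv(right, TAG_LEFT);
            pvm_upkdouble(&a[M + 1], 1, 1);
        }
        /* Explicit update, Eq. (5). */
        for (i = 1; i <= M; i++)
            anew[i] = r * (a[i + 1] + a[i - 1]) + (1.0 - 2.0 * r) * a[i];
        for (i = 1; i <= M; i++)
            a[i] = anew[i];
    }

    /* Return the final profile of this subsection to the master. */
    pvm_initsend(PvmDataDefault);
    pvm_pkdouble(&a[1], M, 1);
    pvm_send(pvm_parent(), TAG_DONE);

    pvm_exit();
    return 0;
}
```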
In order to study the scalability aspects, the same program was executed on 1 to 60 separate processes on the virtual machines available over a network of 1 Gbps Ethernet. When more than one virtual machine is utilized, the master and slave programs are allocated to the machines so that an even distribution of workload is achieved. The finite difference PVM program was executed on six different configurations, with four values of Δx ranging from 1/50 to 1/1000 and, for each Δx, four values of Δt ranging from 2.0 × 10⁻⁴ to 6.25 × 10⁻⁸, until a preset final time was reached. These processing configurations were set up to find the pseudo-optimal number of virtual machines that preserves the scalability of the whole system. The final time was chosen such that, for each Δx, the largest Δt results in 750 iterations and the smallest in 6000. Hence, for a given Δx, the total number of iterations of Eq. (5) is inversely proportional to the size of Δt. The final temperature profile for all of the above parameter values was essentially the analytic solution of Eq. (4). The total computing time elapsed in carrying out this numerical experiment on a network of virtual machines was recorded and is analyzed from the perspective of whether the cloud computing environment can effectively provide scalable solutions to such highly distributed problems. The results analyzed in this section were obtained by running a group of PVM processes on a set of virtual machines running on VMware [9]. The total computing time, including the data communication overhead between machines, certainly depends on the degree of network traffic at the time of execution. The results used in the analysis below are one of many trials executed under similar working environments.

# a) Data Communication Overhead

To investigate the effect of data transfer over the network on the total computing time, the time taken solely by data transfer between machines was measured, and the results are presented in Fig. 1. This includes the time taken for PVM setup plus network latency and data transmission over the network of virtual machines. It can be seen that an increase in the amount of data does not necessarily result in a larger communication overhead; rather, the overhead depends strongly on the number of machines on the network. The results of Fig. 1 clearly show that the data communication time is bounded. One can also deduce that the PVM setup time is about 100 ms in our experimental environment, and that the data transfer time exhibits some randomness, reflecting uncontrollable glitches in the network during the repeated computations. The behavior of the communication overhead is inferred to depend strongly on how distributed the particular problem is and on how much data must be transferred between the machines over the network. What is important from the experimental observation, however, is that the communication overhead does not grow without bound but remains bounded once the distributed solution is fixed. This observation supports the claim that virtual machines in a cloud computing environment can provide scalable computational means for certain types of distributed problems, such as complicated engineering differential equations.

Fig. 1: Communication time for data transfer.

# b) Total Computation Time

Fig. 2 describes the total computation time versus the number of machines utilized by the PVM program.

Fig. 2: Total computation time versus the number of machines. (a) Δx = 0.02, with Δt = 2.0 × 10⁻⁴, 1.0 × 10⁻⁴, 5.0 × 10⁻⁵ and 2.5 × 10⁻⁵, respectively.

The most important result is that in all cases a pseudo-optimal configuration appears to exist. This pseudo-optimum becomes more pronounced as the computational load increases (see Fig. 2d). It can be explained by the fact that the total computation time is composed of the actual CPU time, which decreases as more virtual machines are utilized and the workload is more evenly distributed, and the communication overhead, which increases with the number of machines, NP, as more data transfers are required between more machines.
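The existence of a pseudo-optimum can be made concrete with a simple illustrative cost model (our own sketch, not fitted to the measured data): let $W$ be the total computational work, $N_P$ the number of virtual machines, and $c$ an effective per-machine communication cost, all assumed constant.

```latex
% Illustrative cost model (assumed form, not fitted to the data):
% perfectly divisible CPU work plus communication overhead linear in N_P.
\[
  T_{\text{total}}(N_P) = \frac{W}{N_P} + c\,N_P,
  \qquad
  \frac{dT_{\text{total}}}{dN_P} = -\frac{W}{N_P^{2}} + c = 0
  \;\Longrightarrow\;
  N_P^{*} = \sqrt{W/c}.
\]
```

Under this model the optimum $N_P^{*}$ grows with the ratio of computation to communication cost, which is consistent with the observation that the pseudo-optimum becomes more pronounced as the computational load increases.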
Hence the total computation time is minimized when the gain obtained from distributing the work, minus the increase in communication overhead, is largest. For our problem it can be concluded that the optimum configuration is obtained when 45 to 55 machines are utilized. That is, about 50 virtual machines is the pseudo-optimal number that our cloud system can provide for this specific problem in a scalable way; the total computing time decreases essentially in proportion to the number of virtual machines as it is increased to about 50. It should be noted, however, that the pseudo-optimal numbers of scalable virtual machines deduced from our experiments will differ for other types of problems, since different problems demand different amounts of resources such as CPU time and data communicated among the virtual machines. It should also be remembered that this analysis rests on the fact that the programs were executed while the data transfer time was almost constant. In addition, we can conclude that in our problem the communication overhead is of comparable order to the actual CPU time, especially when the computational burden is not low (e.g., for small Δx), and hence different configurations do play an important role in the total computation time. The observations from the experiments reinforce the common belief that scalability is more prominent in a network of virtual machines where most of the computational power of the machines is required to solve the problem than in a network where less power is required.

In this paper, a numerical computational experiment was performed over a virtual machine cluster connected by 1 Gbps Ethernet, using PVM to solve a one dimensional differential equation, with the purpose of investigating how scalable the network of virtual machines is. When the data transfer time is relatively constant, there exists a pseudo-optimal task decomposition, set by the decrease in the CPU time of individual virtual machines and the increase in communication overhead as more machines are utilized, thus preserving the scalability of the system. However, when the traffic is heavy and the data transfer is more random, the total computation time shows a certain degree of randomness, with less systematic improvement as the workload is distributed among more machines. Nevertheless, in most of the cases studied, distributed computation over a cloud of virtual machines is no worse than serial computation in the worst case and shows highly scalable improvement over serial computation in many other cases.
Author: Department of Computer Engineering, Hongik University. This work was supported by the 2011 Hongik Research Fund. E-mail: hgkim@hongik.ac.kr

References

Geist, A., et al., PVM: Parallel Virtual Machine - A User's Guide and Tutorial for Networked Parallel Computing, MIT Press, Cambridge, MA, 1994.
Hayes, B., "Cloud Computing," Communications of the ACM, 51(7), July 2008.
Hwang, K., Fox, G. C., and Dongarra, J., Distributed and Cloud Computing, 1st ed., Morgan Kaufmann, October 2011.
Kim, H. G., Seong, K. J., and Kim, S. H., "A Numerical on Network Parallel Computing Using PVM," Proc. of KISS, 23(2), 1996.
Meyers, G. E., Analytical Methods in Conduction Heat Transfer, McGraw-Hill, New York, 1971.
Miller, M., Cloud Computing, QUE, 2008.
Noda, H., et al., "The Design and Implementation of the Massively Parallel Processors based on the Matrix Architecture," IEEE Journal of Solid-State Circuits, 42(1), 2007.
Zhang, Q., and Boutaba, R., "Cloud Computing: State-of-the-Art and Research Challenges," Journal of Internet Services and Applications.