# Introduction arallel processing is the simultaneous use of more than one processor to execute a program in order to get faster results. Given an directed acyclic graph (DAG), also called task graph, in which the nodes represent the tasks and edges represent the communication costs as well as the dependencies among the tasks. The problem deals with the scheduling of the tasks onto a set of homogenous processors to minimize the completion time. DAG is generic model of a parallel program consisting of a set of processes. Each process is an indivisible unit of execution, expressed by node. A node has one or more inputs and can have one or more output to various nodes. The paper is organized as follows. In the next section, we describe the generic DAG model and its suitability to different situations. In section 3, basic scheduling attributes are being discussed. Classification of BNP scheduling algorithms is given in section 4. Section 5, presents a performance comparison of various BNP scheduling algorithms and results are derived. Last section concludes the paper and presents the scope of this work in future. # II. Dag model The DAG [Kaur et al, 2011][Ahmad and Kwok,1998] is generic model of a parallel program consisting of a set of processes among which there are dependencies. Each process is an indivisible unit of execution, expressed by node. A node has one or more inputs and can have one or more output to various nodes. When all inputs are available, the node is triggered to execute. After its execution, it generates its output. In this model, a set of nodes { n 1 , n 2 , n 3 ???. n n } are connected by a set of directed edges, which are represented by (n i , n j ) where n i is called the Parent node and n j is called the child node. A node without parent is called an Entry node and a node without child called an Exit node. The weight of a node, denoted by w (n i ), represents the process execution time of a process. Since each edge corresponds to a message transfer from one process to another, the communication time, denoted by c (n i , n j ) is equal to the message transmission time from node n i to n j . Thus c (n i , n j ) becomes zero when and are scheduled to the same processor because intraprocessor communication time is negligible compared with the interprocessor communication time. The node and edge weights are usually obtained by estimations. Some variations in the generic DAG model are described below: Accurate Model [Kaur et al, 2011]: In an accurate model, the weight of a node includes the computation time, the time to receive messages before the computation, and the time to send messages after the computation. The weight of an edge is a function of the distance between the source and destination nodes, and therefore, depends on the node allocation and network topology. It also depends on network contention which can be difficult to model. When two nodes are assigned to a single processor, the edge weight becomes zero, so as the message receiving time and sending time. Approximate Model 1 [Kaur et al, 2011]: In this model, the edge weight is approximated by a constant, independent of the message transmission distance and network contention. A completely connected network without contention fits this model. Approximate Model 2 [Kaur et al, 2011]: In this model, the message receiving time and sending time are ignored in addition to approximating the edge weight by a constant. These approximate models are best suited to the following situations; (i) the grain-size of the process is much larger than the message April n j n i receiving time and sending time; (ii) communication is handled by some dedicated hardware so that the processor spends insignificant amount of time on communication; (iii) the message transmission time varies little with the message transmission distance, e.g., in a wormhole or circuit switching network; and (iv) the network is not heavily loaded. In general, the approximate models can be used for medium to large granularity, since the larger the process grain-size, the less the communication, and consequently the network is not heavily loaded. The second reason for using the approximate models is that both the node and edge weights are obtained by estimation, which is hardly accurate. Thus, an accurate model is useless when the weights of nodes and edges are not accurate. # III. List scheduling Most scheduling algorithms are based on list scheduling technique [Kwok and Ahmad, 1999]. List scheduling is a class of scheduling heuristics in which the nodes are assigned priorities and placed in a list arranged in a descending order of priority. The node with higher priority will be examined for scheduling before a node with a lower priority. If more than one node has the same priority, ties are broken using some method. List scheduling consists of two phases: 1. Task prioritizing phase: -In this phase the priority of each node in DAG is computed and assigned. 2. Processor selection:-Each task is assigned processor with minimum execution time. The two main attributes [Hagras and Janeek, 2003] for assigning priority are the t-level (top level) and b-level (bottom level). Top level : The t-level of a node n i is the length of the longest path from an entry node to n i (excluding n i ). Here, the length of a path is the sum of all the node and edge weights along the path. The t-level is computed recursively by traversing the DAG downward starting from the entry node n entry . # t-level (n i ) = max (t-level (n m ) + w m + c m , i ) where n m is predecessors of n i , w m stands for computational cost, c m , i stands for communication cost and t-level (n entry ) = 0. The t-level of n i highly correlates with n i 's earliest start time, denoted by EST (n i ) , which is determined after n i is scheduled to a processor. Bottom level: The b-level of a node n i is the length of the longest path from node n i to an exit node. The b-level is computed recursively by traversing the DAG upward starting from the exit node n exit . # b-level (n i ) = w i + max(b-level (n m ) + c m , i ) where n m is successor of n i , w m stands for computational cost, c m , i stands for communication cost and b-level (n exit ) = w (n exit ). The b-level of a node is bounded by the length of the critical path. A critical path (CP) of a DAG, is the longest path from an entry node to an exit node. Static b-level: Some BNP scheduling algorithms do not consider the edge weights in computing the blevel. In that case, b-level does not change throughout the scheduling process, therefore it is called static blevel or static level (SL). SL(n i ) = w i + max (SL(n m ) ) where n m is successor of n i and SL(n exit ) = w (n exit ) ALAP start time: The ALAP (As-Late-As-Possible) start time of a node is measure of how far the node's start time can be delayed without increasing the schedule length. It is also known as latest start time (LST). LST(n i ) = min(LST(n m ) -c m , i ) -w i where is successor of n i , w m stands for computational cost, c m , i stands for communication cost and LST(n exit ) = EST(n exit ). Dynamic Level: It is the difference of Static level and Earliest Start Time. Some algorithms assign higher priority to a node with smaller t-level while some algorithms assign higher priority to a node with larger b-level. A priority table is designed for all the nodes in DAG. # IV. # Classification of bnp scheduling algorithms BNP refers to Bounded Number of Processor (BNP) Scheduling Algorithms [Hagras and Janeek, 2003] [Kaur et al, 2011]. These algorithms schedule the DAG to a bounded number of processors directly. The processors are assumed to be fully connected. BNP scheduling algorithms are based on the list scheduling technique in which nodes are assigned some priorities. To study these algorithms, homogeneous environment is considered in which processors having same configuration are used for execution. BNP class of algorithms is categorized into two categories: Static Algorithms: These algorithms use list scheduling approach. Therefore in static algorithms once the task prioritization phase is finished then and only then the processor selection phase begins. Following are static scheduling algorithms. Highest Level First with Estimated Times (HLFET) algorithm [Kwok and Ahmad, 1999]: It is one of the simplest list scheduling algorithms that uses static b-level as node priority and ignores the communication costs on the edges. Following steps describe the HLFET algorithm in detail: 1. Calculate the static b-level of each node. 2. Make a ready list in descending order of static blevel. The ready list contains only the entry nodes initially. Ties are broken randomly. Repeat. 3. Schedule the first node in the ready list to a processor that allows the earliest execution, using the non-insertion approach. 4. Update the ready list by inserting the nodes that are now ready. Until all nodes are scheduled. Modified Critical Path (MCP) algorithm [Kwok and Ahmad, 1999]: This algorithm uses an attribute called ALAP time of a node as a priority. The ALAP time of a node is computed by first computing the length of CP and then subtracting the b-level of the node from it. Therefore, the ALAP times of the nodes on the CP are just their t-levels. Following steps describe the algorithm. 1. Compute the ALAP time of each node. 2. For each node, create a list which consists of the ALAP times of the node itself and all its children in a descending order. 3. Sort these lists in an ascending order. Create a node list according to this order. Repeat. 4. Schedule the first node in the node list to a processor that allows the earliest execution, using the insertion approach. 5. Remove the node from the node list. Until the node list is empty. Dynamic Algorithms: These algorithms also use list scheduling approach. In Dynamic algorithms both the task prioritization phase and processor selection phase goes on side by side. Following are dynamic scheduling algorithms. The Earliest Time First (ETF) algorithm [Kwok and Ahmad,1999]: It computes, at each step, the earliest start times for all ready nodes and then selects the one with the smallest start time, which is computed by examining the start time of the node on all processors exhaustively. The algorithm is described below. 1. Compute the static b-level of each node. 2. Initially, the pool of ready nodes include only the entry nodes. Repeat. # Calculate the earliest start time on each processor for each node in the ready pool. Pick the nodeprocessor pair that gives the earliest time using the non insertion approach. Ties are broken by selecting the node with a higher static b-level. Schedule the node to the corresponding processor. 4. Add the newly ready nodes to the ready node pool. Until all nodes are scheduled. Dynamic Level Scheduling (DLS) algorithm [Kwok and Ahmad, 1999]: This algorithm uses as node priority an attribute called dynamic level (DL) which is the difference between the static b-level of a node and its earliest start time on a processor. The stepwise description of the algorithm is given below. 1. Calculate the b-level of each node. 2. Initially, the ready node pool includes only the entry nodes. Repeat. 3. Calculate the earliest start time for every node on each processor. Hence, compute the DL of every node processor pair by subtracting the earliest start time from the node's static b-level. 4. Select the node processor pair that gives the largest DL. Schedule the node to the corresponding processor. 5. Add the newly ready nodes to the ready pool. Until all nodes are scheduled. V. # Performance comparison and results The performance is the most important factor in every algorithm [Kaur et al, 2011] [Hagras andJaneek, 2003]. In this section, we present performance comparison of above discussed BNP scheduling algorithm. The performance comparison is based upon various comparison metrics discussed below. Makespan: Makespan is defined as the completion time of the algorithm. It is calculated by measuring the finishing time of the exit task by the algorithm. Speed Up: The Speed Up value is computed by dividing the sequential execution time by the parallel execution time. Scheduled length ratio (SLR): It is defined as the ratio of the Makespan of the algorithm to Critical path values of the DAG. Processor Utilization: It means that how processors are being utilized by different processes. It is good when maximum number processors are utilized. Above metrices are compared for 10 nodes, 15 nodes, 20 nodes and 25 nodes in homogeneous environment and results are shown graphically. Case 1 : In first case, results are shown for 10 nodes and 5 processors. Makespan and SLR is same for HLFET, MCP and ETF, but DLS algorithm shows highest makespan value and lowest speed up value. All processors are best utilized in case of HLFET and MCP. # VI. Conclusion and future scope Makespan of MCP and ETF showed large increase in value while increasing the tasks from 20 to 25 compared to other algorithms. The average processor utilization remained same for HLFET and MCP algorithms with 10 tasks. MCP utilized processor efficiently than HLFET with 15 tasks. With 20 and 25 tasks, HLFET proved to be better than other algorithms. The SLR remained almost the same for HLFET, MCP and ETF with 10 tasks. It is maximum in case of DLS for 10 tasks. With 15 tasks MCP shows the lowest value. As the tasks are increased HLFET shows the lesser value in case of 20 and 25 tasks. Same is the case with Speed Up. With 10 tasks speedup of HLFET, MCP and DLS algorithms is same. As the tasks increase from 20 to 25, Speed Up value of HLFET hikes. It can be concluded from the above results, that HLFET is one of the efficient algorithms, considering the data gathered using the scenarios and the performance calculated from them. The thesis has vast future scope. A lot of work can be done considering more case scenarios. Heterogeneous environment can be considered, in which multiple processors having different configuration are used. The comparison of these algorithms can be done for any 1![Figure 1: Makespan values for 10 nodes](image-2.png "Figure 1 :") 2345![Figure 2 : SLR and SpeedUp for 10 nodes](image-3.png "Figure 2 :Figure 3 : 4 :Figure 5 :") 7![Figure 7 : Makespan for 20 nodes Figure 8 : SLR and SpeedUp for 20 nodes](image-4.png "Figure 7 :") 911![Figure 9 : Processor Utilization for 20 nodes Figure 10 : akespan for 25 nodes](image-5.png "Figure 9 :Figure 11 :") 12![Figure 12 : Processor Utilization for 25 nodes](image-6.png "Figure 12 :") © 2012 Global Journals Inc. (US) Global Journal of Computer Science and Technology Volume XII Issue VIII Version I © 2012 Global Journals Inc. (US) * Analysis, Comparison and Performance Evaluation of BNP Scheduling Algorithms in Parallel Processing ParneetKaur DheerendraSingh GurvinderSingh NavneetSingh International Journal of Information Technology and Knowledge Management 4 1 January-June 2011 * Performance Comparison of Algorithm for Static Scheduling of DAG to multiprocessor" citeseerx IshfaqAhmad Yu-KwongKwok ist.psu.edu/view doc/download?doi=10.1.1.42.8979...pdf * Benchmarking and comparison of the Task Graph Scheduling algorithms IshfaqAhmad Yu-KwongKwok 1998 IEEE * Analysis, Evaluation and Comparison Of algorithm for Scheduling Task Graph on Parallel Processor IshfaqAhmad Min-YouWu 1996 IEEE * Static Vs. Dynamic Listscheduling Performance Comparison THagras JJaneek Acta Polytechnica 43 6 2003 * Parallel Job Scheduling: Issues and Approaches GDror LarryFeitelson Rudolph * On parallelization of Static Scheduling Algorithm Min YouWu IEEE 23 Aug 1997 * A performance study of multiprocessor task scheduling algorithm ShyiyuanJin GuySchiavone DamlaTurgut Jan -2008 43 * A taxonomy of Scheduling in General-Purpose Distributed Computing Systems ThomasLCasavant IEEE Transactions on Software Engineering 14 2 February 1988 * A Comparison of List scheduling for Parallel TAdam KChandy JDickson * Static Scheduling Algorithms for Allocating Directed Task Graphs to Multiprocessors Yu-KwongKwok IsfaqAhmad ACM Computing Surveys 31 4 December 1999 * Fast Scheduling and Partitioning algorithm in the multiprocessor system with Redundant Communication Resources ErykLaskowski * Scheduling Algorithm and Support Tools for parallel System IgorGrudenic * An Approach for parallel job Scheduling Using supple Algorithm SSudha KTjanushkodi Asian Journal of Information Technology 7 2008 * A Comparison of General Approaches to Multiprocessor Scheduling Jing-ChiouLiou MichealAPalis IEEE 15 April 1997