# Introduction

arallel processing is the simultaneous use of more than one processor to execute a program in order to get faster results. Given an directed acyclic graph (DAG), also called task graph, in which the nodes represent the tasks and edges represent the communication costs as well as the dependencies among the tasks. The problem deals with the scheduling of the tasks onto a set of homogenous processors to minimize the completion time. DAG is generic model of a parallel program consisting of a set of processes. Each process is an indivisible unit of execution, expressed by node. A node has one or more inputs and can have one or more output to various nodes.

The paper is organized as follows. In the next section, we describe the generic DAG model and its suitability to different situations. In section 3, basic scheduling attributes are being discussed. Classification of BNP scheduling algorithms is given in section 4. Section 5, presents a performance comparison of various BNP scheduling algorithms and results are derived. Last section concludes the paper and presents the scope of this work in future.


# II. Dag model

The DAG [Kaur et al, 2011][Ahmad and Kwok,1998] is generic model of a parallel program consisting of a set of processes among which there are dependencies. Each process is an indivisible unit of execution, expressed by node. A node has one or more inputs and can have one or more output to various nodes. When all inputs are available, the node is triggered to execute. After its execution, it generates its output. In this model, a set of nodes { n 1 , n 2 , n 3 ???. n n } are connected by a set of directed edges, which are represented by (n i , n j ) where n i is called the Parent node and n j is called the child node. A node without parent is called an Entry node and a node without child called an Exit node. The weight of a node, denoted by w (n i ), represents the process execution time of a process. Since each edge corresponds to a message transfer from one process to another, the communication time, denoted by c (n i , n j ) is equal to the message transmission time from node n i to n j . Thus c (n i , n j ) becomes zero when and are scheduled to the same processor because intraprocessor communication time is negligible compared with the interprocessor communication time. The node and edge weights are usually obtained by estimations. Some variations in the generic DAG model are described below:

Accurate Model [Kaur et al, 2011]: In an accurate model, the weight of a node includes the computation time, the time to receive messages before the computation, and the time to send messages after the computation. The weight of an edge is a function of the distance between the source and destination nodes, and therefore, depends on the node allocation and network topology. It also depends on network contention which can be difficult to model. When two nodes are assigned to a single processor, the edge weight becomes zero, so as the message receiving time and sending time.

Approximate Model 1 [Kaur et al, 2011]: In this model, the edge weight is approximated by a constant, independent of the message transmission distance and network contention. A completely connected network without contention fits this model.

Approximate Model 2 [Kaur et al, 2011]: In this model, the message receiving time and sending time are ignored in addition to approximating the edge weight by a constant. These approximate models are best suited to the following situations; (i) the grain-size of the process is much larger than the message April n j n i receiving time and sending time; (ii) communication is handled by some dedicated hardware so that the processor spends insignificant amount of time on communication; (iii) the message transmission time varies little with the message transmission distance, e.g., in a wormhole or circuit switching network; and (iv) the network is not heavily loaded. In general, the approximate models can be used for medium to large granularity, since the larger the process grain-size, the less the communication, and consequently the network is not heavily loaded. The second reason for using the approximate models is that both the node and edge weights are obtained by estimation, which is hardly accurate. Thus, an accurate model is useless when the weights of nodes and edges are not accurate.


# III. List scheduling

Most scheduling algorithms are based on list scheduling technique [Kwok and Ahmad, 1999]. List scheduling is a class of scheduling heuristics in which the nodes are assigned priorities and placed in a list arranged in a descending order of priority. The node with higher priority will be examined for scheduling before a node with a lower priority. If more than one node has the same priority, ties are broken using some method. List scheduling consists of two phases: 1. Task prioritizing phase: -In this phase the priority of each node in DAG is computed and assigned. 2. Processor selection:-Each task is assigned processor with minimum execution time.

The two main attributes [Hagras and Janeek, 2003] for assigning priority are the t-level (top level) and b-level (bottom level).

Top level : The t-level of a node n i is the length of the longest path from an entry node to n i (excluding n i ). Here, the length of a path is the sum of all the node and edge weights along the path. The t-level is computed recursively by traversing the DAG downward starting from the entry node n entry .


# t-level (n
i ) = max (t-level (n m ) + w m + c m , i )
where n m is predecessors of n i , w m stands for computational cost, c m , i stands for communication cost and t-level (n entry ) = 0. The t-level of n i highly correlates with n i 's earliest start time, denoted by EST (n i ) , which is determined after n i is scheduled to a processor.

Bottom level: The b-level of a node n i is the length of the longest path from node n i to an exit node.

The b-level is computed recursively by traversing the DAG upward starting from the exit node n exit .


# b-level (n
i ) = w i + max(b-level (n m ) + c m , i )
where n m is successor of n i , w m stands for computational cost, c m , i stands for communication cost and b-level (n exit ) = w (n exit ).

The b-level of a node is bounded by the length of the critical path. A critical path (CP) of a DAG, is the longest path from an entry node to an exit node.

Static b-level: Some BNP scheduling algorithms do not consider the edge weights in computing the blevel. In that case, b-level does not change throughout the scheduling process, therefore it is called static blevel or static level (SL). SL(n i ) = w i + max (SL(n m ) )

where n m is successor of n i and SL(n exit ) = w (n exit ) ALAP start time: The ALAP (As-Late-As-Possible) start time of a node is measure of how far the node's start time can be delayed without increasing the schedule length. It is also known as latest start time (LST).

LST(n i ) = min(LST(n m ) -c m , i ) -w i where is successor of n i , w m stands for computational cost, c m , i stands for communication cost and LST(n exit ) = EST(n exit ).

Dynamic Level: It is the difference of Static level and Earliest Start Time.

Some algorithms assign higher priority to a node with smaller t-level while some algorithms assign higher priority to a node with larger b-level. A priority table is designed for all the nodes in DAG.


# IV.


# Classification of bnp scheduling algorithms

BNP refers to Bounded Number of Processor (BNP) Scheduling Algorithms [Hagras and Janeek, 2003] [Kaur et al, 2011]. These algorithms schedule the DAG to a bounded number of processors directly. The processors are assumed to be fully connected. BNP scheduling algorithms are based on the list scheduling technique in which nodes are assigned some priorities. To study these algorithms, homogeneous environment is considered in which processors having same configuration are used for execution. BNP class of algorithms is categorized into two categories:

Static Algorithms: These algorithms use list scheduling approach. Therefore in static algorithms once the task prioritization phase is finished then and only then the processor selection phase begins. Following are static scheduling algorithms.

Highest Level First with Estimated Times (HLFET) algorithm [Kwok and Ahmad, 1999]: It is one of the simplest list scheduling algorithms that uses static b-level as node priority and ignores the communication costs on the edges. Following steps describe the HLFET algorithm in detail: 1. Calculate the static b-level of each node. 2. Make a ready list in descending order of static blevel. The ready list contains only the entry nodes initially. Ties are broken randomly.

Repeat.

3. Schedule the first node in the ready list to a processor that allows the earliest execution, using the non-insertion approach. 4. Update the ready list by inserting the nodes that are now ready. Until all nodes are scheduled.

Modified Critical Path (MCP) algorithm [Kwok and Ahmad, 1999]: This algorithm uses an attribute called ALAP time of a node as a priority. The ALAP time of a node is computed by first computing the length of CP and then subtracting the b-level of the node from it. Therefore, the ALAP times of the nodes on the CP are just their t-levels. Following steps describe the algorithm. 1. Compute the ALAP time of each node. 2. For each node, create a list which consists of the ALAP times of the node itself and all its children in a descending order. 3. Sort these lists in an ascending order. Create a node list according to this order. Repeat. 4. Schedule the first node in the node list to a processor that allows the earliest execution, using the insertion approach. 5. Remove the node from the node list. Until the node list is empty.

Dynamic Algorithms: These algorithms also use list scheduling approach. In Dynamic algorithms both the task prioritization phase and processor selection phase goes on side by side. Following are dynamic scheduling algorithms.

The Earliest Time First (ETF) algorithm [Kwok and Ahmad,1999]: It computes, at each step, the earliest start times for all ready nodes and then selects the one with the smallest start time, which is computed by examining the start time of the node on all processors exhaustively. The algorithm is described below. 1. Compute the static b-level of each node. 2. Initially, the pool of ready nodes include only the entry nodes. Repeat.


# Calculate the earliest start time on each processor

for each node in the ready pool. Pick the nodeprocessor pair that gives the earliest time using the non insertion approach. Ties are broken by selecting the node with a higher static b-level. Schedule the node to the corresponding processor. 4. Add the newly ready nodes to the ready node pool. Until all nodes are scheduled. Dynamic Level Scheduling (DLS) algorithm [Kwok and Ahmad, 1999]: This algorithm uses as node priority an attribute called dynamic level (DL) which is the difference between the static b-level of a node and its earliest start time on a processor. The stepwise description of the algorithm is given below. 1. Calculate the b-level of each node. 2. Initially, the ready node pool includes only the entry nodes. Repeat. 3. Calculate the earliest start time for every node on each processor. Hence, compute the DL of every node processor pair by subtracting the earliest start time from the node's static b-level. 4. Select the node processor pair that gives the largest DL. Schedule the node to the corresponding processor. 5. Add the newly ready nodes to the ready pool. Until all nodes are scheduled.

V.


# Performance comparison and results

The performance is the most important factor in every algorithm [Kaur et al, 2011] [Hagras andJaneek, 2003]. In this section, we present performance comparison of above discussed BNP scheduling algorithm. The performance comparison is based upon various comparison metrics discussed below.

Makespan: Makespan is defined as the completion time of the algorithm. It is calculated by measuring the finishing time of the exit task by the algorithm.

Speed Up: The Speed Up value is computed by dividing the sequential execution time by the parallel execution time.

Scheduled length ratio (SLR): It is defined as the ratio of the Makespan of the algorithm to Critical path values of the DAG.

Processor Utilization: It means that how processors are being utilized by different processes. It is good when maximum number processors are utilized.

Above metrices are compared for 10 nodes, 15 nodes, 20 nodes and 25 nodes in homogeneous environment and results are shown graphically.

Case 1 : In first case, results are shown for 10 nodes and 5 processors. Makespan and SLR is same for HLFET, MCP and ETF, but DLS algorithm shows highest makespan value and lowest speed up value. All processors are best utilized in case of HLFET and MCP.     


# VI. Conclusion and future scope

Makespan of MCP and ETF showed large increase in value while increasing the tasks from 20 to 25 compared to other algorithms. The average processor utilization remained same for HLFET and MCP algorithms with 10 tasks. MCP utilized processor efficiently than HLFET with 15 tasks. With 20 and 25 tasks, HLFET proved to be better than other algorithms. The SLR remained almost the same for HLFET, MCP and ETF with 10 tasks. It is maximum in case of DLS for 10 tasks. With 15 tasks MCP shows the lowest value. As the tasks are increased HLFET shows the lesser value in case of 20 and 25 tasks. Same is the case with Speed Up. With 10 tasks speedup of HLFET, MCP and DLS algorithms is same. As the tasks increase from 20 to 25, Speed Up value of HLFET hikes. It can be concluded from the above results, that HLFET is one of the efficient algorithms, considering the data gathered using the scenarios and the performance calculated from them. The thesis has vast future scope. A lot of work can be done considering more case scenarios. Heterogeneous environment can be considered, in which multiple processors having different configuration are used. The comparison of these algorithms can be done for any
1![Figure 1: Makespan values for 10 nodes](image-2.png "Figure 1 :")
2345![Figure 2 : SLR and SpeedUp for 10 nodes](image-3.png "Figure 2 :Figure 3 : 4 :Figure 5 :")
7![Figure 7 : Makespan for 20 nodes Figure 8 : SLR and SpeedUp for 20 nodes](image-4.png "Figure 7 :")
911![Figure 9 : Processor Utilization for 20 nodes Figure 10 : akespan for 25 nodes](image-5.png "Figure 9 :Figure 11 :")
12![Figure 12 : Processor Utilization for 25 nodes](image-6.png "Figure 12 :")
			© 2012 Global Journals Inc. (US) Global Journal of Computer Science and Technology Volume XII Issue VIII Version I
			© 2012 Global Journals Inc. (US)
		
		
* 
	
		Analysis, Comparison and Performance Evaluation of BNP Scheduling Algorithms in Parallel Processing
		
			ParneetKaur
		
		
			DheerendraSingh
		
		
			GurvinderSingh
		
		
			NavneetSingh
		
	
		International Journal of Information Technology and Knowledge Management
		
			4
			1
			
			January-June 2011
		
	
* 
	
		Performance Comparison of Algorithm for Static Scheduling of DAG to multiprocessor" citeseerx
		
			IshfaqAhmad
		
		
			Yu-KwongKwok
		
		
	ist.psu.edu/view doc/download?doi=10.1.1.42.8979...pdf


* 
	
		Benchmarking and comparison of the Task Graph Scheduling algorithms
		
			IshfaqAhmad
		
		
			Yu-KwongKwok
		
		
			1998
			IEEE
			
		
* 
	
		Analysis, Evaluation and Comparison Of algorithm for Scheduling Task Graph on Parallel Processor
		
			IshfaqAhmad
		
		
			Min-YouWu
		
		
			1996
			IEEE
			
		
* 
	
		Static Vs. Dynamic Listscheduling Performance Comparison
		
			THagras
		
		
			JJaneek
		
	
		Acta Polytechnica
		
			43
			6
			2003
		
	
* 
	
		Parallel Job Scheduling: Issues and Approaches
		
			GDror
		
		
			LarryFeitelson
		
		
			Rudolph
		
		
* 
	
		On parallelization of Static Scheduling Algorithm
		
			Min YouWu
		
	
		IEEE
		
			23
			
			Aug 1997
		
	
* 
	
		A performance study of multiprocessor task scheduling algorithm
		
			ShyiyuanJin
		
		
			GuySchiavone
		
		
			DamlaTurgut
		
		
			Jan -2008
			43
			
		
* 
	
		A taxonomy of Scheduling in General-Purpose Distributed Computing Systems
		
			ThomasLCasavant
		
	
		IEEE Transactions on Software Engineering
		
			14
			2
			February 1988
		
	
* 
	
		A Comparison of List scheduling for Parallel
		
			TAdam
		
		
			KChandy
		
		
			JDickson
		
		
* 
	
		Static Scheduling Algorithms for Allocating Directed Task Graphs to Multiprocessors
		
			Yu-KwongKwok
		
		
			IsfaqAhmad
		
	
		ACM Computing Surveys
		
			31
			4
			December 1999
		
	
* 
	
		Fast Scheduling and Partitioning algorithm in the multiprocessor system with Redundant Communication Resources
		
			ErykLaskowski
		
		
* 
	
		Scheduling Algorithm and Support Tools for parallel System
		
			IgorGrudenic
		
		
* 
	
		An Approach for parallel job Scheduling Using supple Algorithm
		
			SSudha
		
		
			KTjanushkodi
		
		
		Asian Journal of Information Technology
		
			7
			
			2008
		
	
* 
	
		A Comparison of General Approaches to Multiprocessor Scheduling
		
			Jing-ChiouLiou
		
		
			MichealAPalis
		
	
		IEEE
		
			
			15 April 1997