# INTRODUCTION

Data mining techniques have been applied in many application domains such as banking, fraud detection, intrusion detection and communication. Recently, data mining techniques have been used to improve and evaluate engineering education tasks. Some authors have proposed techniques and architectures for using data warehousing and data mining in higher technical education. Data mining is the process of extracting previously unknown, valid, potentially useful and hidden patterns from large data sets. The amount of data stored in educational databases is increasing rapidly. To benefit from such large data and to find hidden relationships between variables, different data mining techniques have been developed and used.

Clustering and decision trees are among the most widely used techniques for prediction. The aim of clustering is to partition students into homogeneous groups according to their characteristics and abilities. These applications can help both instructors and students to improve the quality of education. This study analyzes the factors that affect a student's learning behavior and performance during an academic career, using the K-means clustering algorithm and decision trees in a higher educational institute. Decision tree analysis is a popular data mining technique that can be used to explain different variables such as attendance ratio and grade ratio. Clustering is one of the basic techniques often used in analyzing data sets. This study makes use of cluster analysis to segment students into groups according to their characteristics.

Academic decisions may require extensive analysis of student achievement levels. Statistical data can also be used to see the results of important academic decisions. On the one hand, measurements are needed to make appropriate academic decisions; on the other hand, the results of those decisions need to be observed by taking measurements.

The decision, implementation, measurement and evaluation mechanisms work like a chain, each one leading to the next. Their relationship is shown in Fig. 1.


[Figure 1 : Academic decision phases. Diagram labels: Academic Decisions, Implementation, Academic Activities, Measurement, Evaluation.]

# RELATED WORK


# a) Database

A database is a collection of data, usually associated with some organization or enterprise. Unlike a simple set, data in a database are usually viewed as having a particular structure, or schema, with which they are associated. For example, (ID, Name, Address, Salary, Job No) may be the schema for a personnel database.
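As a rough illustration of what a schema adds over a simple set, the example schema can be written as a typed record. This is only a sketch: the class name `Personnel`, the field types, and the two sample rows are assumptions made for illustration and do not come from the paper.

```python
from typing import NamedTuple


class Personnel(NamedTuple):
    """One tuple of the example schema (ID, Name, Address, Salary, Job No)."""
    id: int
    name: str
    address: str
    salary: float
    job_no: int


# Hypothetical rows, invented purely for illustration.
database = [
    Personnel(1, "A. Example", "12 Sample Street", 4200.0, 10),
    Personnel(2, "B. Example", "34 Sample Avenue", 3900.0, 20),
]

# Every tuple exposes the same attribute names; that shared structure is the schema.
schema = Personnel._fields
```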


# b) Data warehousing

A data warehouse is a database devoted to analytical processing. It is defined as a set of data that supports decision support systems (DSS) and is subject-oriented, integrated, time-variant, and non-volatile. It can also be described as a complete repository of historical corporate data, extracted from transaction systems, that is available for ad hoc access by knowledge workers. Data warehousing (DW) involves taking data from the legacy systems, together with the corresponding transactions of each system's database, and transforming the data into organized information in a user-friendly format. The data warehouse market supports such diverse industries as manufacturing, retail, telecommunications and health care. Access to a warehouse includes traditional querying, OLAP, and data mining. Since the warehouse is stored as a database, it can be accessed with traditional query languages.

An example of data warehousing can be found in almost any organization. Consider the case of a bank: a bank will typically have current accounts, savings accounts, foreign currency accounts and so on. The bank will have an MIS system for leasing, another system for managing credit cards, and another system for every other kind of business it is in. However, nowhere does it have a total view of the environment from the customer's perspective. The reason is that transaction processing systems are typically designed around functional areas within a business environment. Good decision making requires integrating data across the organization so as to cross the lines of business (LoB). The idea is therefore to provide, within the data warehouse, a total view of the organization, especially from the customer's perspective, as shown in Figure 3.
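The bank example can be sketched in a few lines: records held in separate line-of-business systems are merged into one row per customer. This is a minimal sketch, not the paper's implementation; the record layouts, the `customer_id` key, and the function `build_customer_view` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical extracts from three separate line-of-business systems.
savings_accounts = [
    {"customer_id": "C1", "balance": 1500.0},
    {"customer_id": "C2", "balance": 300.0},
]
leasing_contracts = [
    {"customer_id": "C1", "outstanding": 8000.0},
]
credit_cards = [
    {"customer_id": "C2", "outstanding": 450.0},
    {"customer_id": "C1", "outstanding": 120.0},
]


def build_customer_view(savings, leases, cards):
    """Merge per-system records into one summary row per customer."""
    view = defaultdict(lambda: {"deposits": 0.0, "liabilities": 0.0, "products": set()})
    for row in savings:
        entry = view[row["customer_id"]]
        entry["deposits"] += row["balance"]
        entry["products"].add("savings")
    for source_name, rows in (("leasing", leases), ("credit_card", cards)):
        for row in rows:
            entry = view[row["customer_id"]]
            entry["liabilities"] += row["outstanding"]
            entry["products"].add(source_name)
    return dict(view)


customer_view = build_customer_view(savings_accounts, leasing_contracts, credit_cards)
```

Each source system only knows its own product, while `customer_view` crosses the lines of business and exposes deposits, liabilities and products held per customer.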


# CLUSTERING

Clustering is a method of grouping data into classes of similar objects, such that intra-class similarity is maximized and inter-class similarity is minimized. Cluster analysis is used to segment a large set of data into subsets called clusters. Each cluster is a collection of data objects that are similar to one another and dissimilar to objects in other clusters. A cluster of data objects can be treated collectively as one group in many applications. Cluster analysis is an important human activity. It has been widely used in numerous applications, including pattern recognition, data analysis, image processing, and market research. Clustering is a descriptive task that seeks to identify homogeneous groups of objects based on the values of their attributes. Current clustering techniques can be broadly classified into three categories: partitional, hierarchical and locality-based algorithms.

Definition: Given a database $D = \{t_1, t_2, \ldots, t_n\}$ of tuples and an integer value $K$, the clustering problem is to define a mapping $f : D \to \{1, 2, \ldots, K\}$ where each $t_i$ is assigned to one cluster $K_j$, $1 \le j \le K$. A cluster $K_j$ contains precisely those tuples mapped to it; that is, $K_j = \{t_i \mid f(t_i) = K_j,\ 1 \le i \le n,\ \text{and}\ t_i \in D\}$.
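The definition can be made concrete with a small sketch that takes any mapping f and collects, for each j, the tuples mapped to cluster j. The helper `clusters_from_mapping`, the sample marks, and the hand-written thresholds in `assign_band` are assumptions for illustration only; a real clustering algorithm would learn the mapping from the data.

```python
from collections import defaultdict


def clusters_from_mapping(database, f, k):
    """Build K_1..K_k, where K_j holds exactly the tuples t with f(t) == j."""
    clusters = defaultdict(list)
    for t in database:
        j = f(t)
        if not 1 <= j <= k:
            raise ValueError(f"f mapped {t!r} outside 1..{k}")
        clusters[j].append(t)
    # Return every cluster index, including clusters that received no tuples.
    return {j: clusters[j] for j in range(1, k + 1)}


def assign_band(mark):
    """Hypothetical mapping f: the thresholds are invented, not from the paper."""
    if mark < 60:
        return 1
    if mark < 80:
        return 2
    return 3


# One-attribute tuples (marks) used purely as sample data.
marks = [45, 50, 55, 73, 75, 85, 86, 90, 95, 98]
groups = clusters_from_mapping(marks, assign_band, k=3)
```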


# a) K-Means Clustering

K-means is one of the simplest unsupervised learning algorithms used for clustering. It partitions n observations into k clusters in which each observation belongs to the cluster with the nearest mean. The algorithm aims at minimizing an objective function, in this case a squared error function.
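A minimal sketch of the procedure follows, assuming the usual alternation between an assignment step and a mean-update step (Lloyd-style iteration), which the text does not spell out. The function names, the fixed initial centroids, and the (attendance %, marks) sample pairs are hypothetical.

```python
import math


def kmeans(points, initial_centroids, max_iter=100):
    """Plain Lloyd-style K-means on tuples of numbers.

    Returns (centroids, assignments), where assignments[i] is the index of
    the centroid nearest to points[i].
    """
    centroids = [tuple(c) for c in initial_centroids]
    assignments = []
    for _ in range(max_iter):
        # Assignment step: each observation joins the cluster with the nearest mean.
        new_assignments = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no observation changed cluster
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids, assignments


def squared_error(points, centroids, assignments):
    """The objective K-means tries to minimize: sum of squared distances."""
    return sum(math.dist(p, centroids[a]) ** 2 for p, a in zip(points, assignments))


# Hypothetical (attendance %, marks) pairs, invented for illustration.
students = [(95, 90), (90, 86), (92, 98), (60, 55), (55, 45), (65, 50)]
centroids, labels = kmeans(students, initial_centroids=[students[0], students[3]])
```

Passing the initial centroids explicitly keeps the sketch deterministic; practical implementations usually choose them at random and rerun several times, since the result depends on the starting point.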




# DECISION TREE

Decision tree induction can be integrated with data warehousing techniques for data mining. A decision tree is a predictive modeling technique used in classification, clustering, and prediction tasks. Decision trees use a "divide and conquer" technique to split the problem search space into subsets.

A decision tree is a tree where the root and each internal node are labeled with a question. The arcs emanating from each node represent each possible answer to the associated question. Each leaf node represents a prediction of a solution to the problem under consideration.

Given a database $D = \{t_1, t_2, \ldots, t_n\}$ where $t_i = (t_{i1}, \ldots, t_{in})$ and the database schema contains the following attributes $\{A_1, A_2, \ldots, A_n\}$. Also given is a set of classes $C = \{C_1, C_2, \ldots, C_m\}$. A decision tree or classification tree is a tree associated with $D$ that has the following properties:

1. Each internal node is labeled with an attribute, $A_i$.
2. Each arc is labeled with a predicate that can be applied to the attribute associated with the parent.
3. Each leaf node is labeled with a class, $C_j$.
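The three properties map directly onto a small data structure. The sketch below is one possible representation, not the paper's: the classes `Node` and `Leaf`, the function `classify`, and the example tree with its thresholds and class labels are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple, Union


@dataclass
class Leaf:
    label: str  # the class C_j assigned at this leaf


@dataclass
class Node:
    attribute: str  # the attribute A_i tested at this internal node
    # Each arc is a (predicate, child) pair; the predicate is applied to the
    # value of `attribute` in the tuple being classified.
    arcs: List[Tuple[Callable[[object], bool], Union["Node", Leaf]]] = field(default_factory=list)


def classify(tree, record):
    """Follow the first arc whose predicate holds until a leaf is reached."""
    while isinstance(tree, Node):
        value = record[tree.attribute]
        for predicate, child in tree.arcs:
            if predicate(value):
                tree = child
                break
        else:
            raise ValueError(f"no arc matches {tree.attribute}={value!r}")
    return tree.label


# Hypothetical tree: thresholds and class names are invented for illustration.
tree = Node("marks", [
    (lambda m: m >= 80, Leaf("pass")),
    (lambda m: m < 80, Node("attendance", [
        (lambda a: a >= 75, Leaf("needs support")),
        (lambda a: a < 75, Leaf("at risk")),
    ])),
])
result = classify(tree, {"marks": 73, "attendance": 60})
```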

The basic algorithm for decision tree induction is a greedy algorithm that constructs decision trees in a top-down recursive divide-and-conquer manner.

Each internal node tests an attribute, each branch corresponds to an attribute value, and each leaf node assigns a classification. After the patterns have been classified by the decision tree, the discovered knowledge can be used to form a knowledge base system. The same data mining process can similarly be applied to professors to classify their performance, which helps to improve the technical education system.
![Figure 2 : Data warehouse](image-2.png "Figure 2 :")
![Figure 3 : A data warehouse crosses the Line of Business.](image-3.png "Figure 3 :")
![Figure 2 : The transition from raw data to valuable knowledge.](image-4.png "Figure 2 :")


| Student roll number | Marks | Effort |
|---|---|---|
| DEC-01 | 10-50 | More attention, conducting special classes, assignments, conducting more practical classes, daily tests, and parents and faculty meeting. |
| DEC-02 | 51-60 | Conducting special classes, assignments and conducting more practical classes. |
| DEC-03 | 61-75 | Assignments and conducting more practical classes. |
| DEC-04 | 76-85 | Assignments and conducting classes for interviews. |
| DEC-05 | 86-100 | Conducting classes for interviews and giving career-oriented exposure. |

Formation of decision tree from the students' marks table, applying effort depending on marks.
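The marks-to-effort table behaves like a one-level decision tree on the marks attribute, which can be sketched as a range lookup. The names `EFFORT_RULES` and `recommend_effort` are hypothetical, and treating marks outside the listed ranges (below 10) as an error is an assumption, since the table does not say how to handle them.

```python
# (low, high, group, efforts) rows transcribed from the marks/effort table.
EFFORT_RULES = [
    (10, 50, "DEC-01", ["more attention", "special classes", "assignments",
                        "more practical classes", "daily tests",
                        "parents and faculty meeting"]),
    (51, 60, "DEC-02", ["special classes", "assignments", "more practical classes"]),
    (61, 75, "DEC-03", ["assignments", "more practical classes"]),
    (76, 85, "DEC-04", ["assignments", "classes for interviews"]),
    (86, 100, "DEC-05", ["classes for interviews", "career-oriented exposure"]),
]


def recommend_effort(marks):
    """Return (group, efforts) for the range that contains `marks`."""
    for low, high, group, efforts in EFFORT_RULES:
        if low <= marks <= high:
            return group, efforts
    # Assumption: the table gives no rule for marks outside 10-100.
    raise ValueError(f"no rule covers marks={marks}")


group, efforts = recommend_effort(55)
```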
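| S No | Student | Course | Professor | Marks | Grade | Results |
|---|---|---|---|---|---|---|
| 1 | Ferede Adugna | Cryptography | Dr. Rao | 86 | A | YES |
| 2 | Rwibasira M | Interfacing Tech | Miss Maria | 75 | B | NO |
| 3 | Makuei Nyok | Operating Systems | Genetu Yohannes | 85 | A | YES |
| 4 | Daniel Tekif | Microprocessor | Michael | 55 | D | NO |
| 5 | Mesfin Dadi | Distributed Systems | Lea | 95 | A | YES |
| 6 | Debebe Shibeshi | Computer Networks | Genetu Yohannes | 90 | A | YES |
| 7 | Gidey Abrha | Network security | Dr. Srinivas | 98 | A | YES |
| 8 | Samuel Hagos | Web technology | Oliver | 45 | F | NO |
| 9 | Desta Desisa | Compiler Design | Melissa | 73 | B | NO |
| 10 | Tibabu Beza | Cloud computing | Praveen | 50 | D | NO |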
© 2012 Global Journals Inc. (US) Global Journal of Computer Science and Technology, Volume XII, Issue II, Version I, January 2012


## CONCLUSION

This study is an initial attempt to use data warehousing and data mining techniques to analyze student academic performance and to improve the quality of the engineering education system. Management can use these techniques to improve course outcomes according to the knowledge extracted. Such knowledge gives the faculty and managerial decision makers a good understanding of students' enrollment patterns in the course under study, so that they can take the necessary steps, such as providing extra classes. In addition, with this type of knowledge the management can enhance its policies, improve its strategies and improve the quality of the system.




## RESULTS

Both K-means clustering and the decision tree algorithm were applied to the data set.


Steps of the generate-decision-tree procedure:

* Return N as a leaf node labeled with the most common class in samples.
* Select test-attribute, the attribute among attribute-list with the highest information gain.
* Label node N with test-attribute.
* Grow a branch from node N for the condition test-attribute = $a_i$.
* Let $S_i$ be the set of samples for which test-attribute = $a_i$.
* If $S_i$ is empty, then attach a leaf labeled with the most common class in samples.
* Else attach the node returned by generate-decision-tree($S_i$, attribute-list minus test-attribute).
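The steps listed above can be sketched as a recursive function. This is a hedged sketch rather than the authors' code: the two stopping conditions at the top of `generate_decision_tree` follow the common ID3-style formulation and are assumptions, because the corresponding steps do not appear in the list. The `domains` parameter (the known values of each attribute), the nested-dict tree representation, and the sample records are likewise hypothetical.

```python
import math
from collections import Counter


def entropy(samples, target):
    """Shannon entropy of the class distribution in `samples`."""
    counts = Counter(s[target] for s in samples)
    total = len(samples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(samples, attribute, target):
    """Entropy reduction obtained by splitting `samples` on `attribute`."""
    total = len(samples)
    remainder = 0.0
    for value in {s[attribute] for s in samples}:
        subset = [s for s in samples if s[attribute] == value]
        remainder += len(subset) / total * entropy(subset, target)
    return entropy(samples, target) - remainder


def most_common_class(samples, target):
    return Counter(s[target] for s in samples).most_common(1)[0][0]


def generate_decision_tree(samples, attribute_list, target, domains):
    classes = {s[target] for s in samples}
    if len(classes) == 1:      # assumed stopping rule: all samples share a class
        return classes.pop()
    if not attribute_list:     # no attribute left: leaf with the most common class
        return most_common_class(samples, target)
    # Select the attribute with the highest information gain and label the node.
    test_attribute = max(attribute_list,
                         key=lambda a: information_gain(samples, a, target))
    node = {"attribute": test_attribute, "branches": {}}
    remaining = [a for a in attribute_list if a != test_attribute]
    # Grow one branch per known value a_i of the test attribute.
    for value in domains[test_attribute]:
        subset = [s for s in samples if s[test_attribute] == value]
        if not subset:         # empty S_i: leaf with the most common class
            node["branches"][value] = most_common_class(samples, target)
        else:
            node["branches"][value] = generate_decision_tree(
                subset, remaining, target, domains)
    return node


# Hypothetical training records, invented for illustration.
samples = [
    {"attendance": "high", "assignments": "done", "result": "YES"},
    {"attendance": "high", "assignments": "missed", "result": "YES"},
    {"attendance": "high", "assignments": "done", "result": "YES"},
    {"attendance": "low", "assignments": "done", "result": "NO"},
    {"attendance": "low", "assignments": "missed", "result": "NO"},
]
domains = {"attendance": ["high", "medium", "low"],
           "assignments": ["done", "missed"]}
tree = generate_decision_tree(samples, ["attendance", "assignments"],
                              "result", domains)
```

In this toy data, attendance separates the classes completely, so it is chosen at the root; the "medium" branch has no samples and therefore becomes a leaf labeled with the most common class.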
		
	