# I. INTRODUCTION

The human face is undoubtedly the most common characteristic used by humans to recognize and reflect facial expressions with speed and accuracy in a passive and non-intrusive manner. Facial expression analysis deals with the classification of facial muscle motion and facial feature deformation into abstract classes that are based purely on visual information, while human emotions result from many different factors and their state may or may not be revealed through a number of communication channels such as emotional voice, pose, gestures, gaze direction and facial expressions [1]. Consider a scenario where a person tries to download one of his favorite movies but becomes frustrated with the system's inability to load the program due to bandwidth or some other reason. In this case the person is likely to express some sort of emotional dissatisfaction, captured well via facial expressions, either intentionally or unintentionally. Existing works such as Chen et al. [2] and De Silva et al. [3] have studied the combined detection of facial and vocal expressions of emotion. However, the majority of studies treat the various human communication channels separately.

Automatic recognition of facial expressions through facial Action Units (AUs) has attracted much attention in recent years due to its potential applications in behavioral science, medicine, security and human-machine interaction. The Facial Action Coding System (FACS) developed by Ekman and Friesen [4] is the most commonly used system for facial behavior analysis. Based on image data, facial expression methods can be categorized by whether they operate on static images or dynamic image sequences. Static methods analyze a single still image based on the spatial information of a frame and the face geometry; they require less computation and are more suitable for real-time facial expression recognition. Dynamic image sequence methods, on the other hand, take into account the motion information of the expression images together with how the expression changes in time and space, so the recognition rate is high, but the amount of computation is correspondingly large. The current internationally standardized facial expression classification includes seven classes: neutral, anger, happiness, sadness, surprise, disgust and fear [5, 6].

Many researchers have proposed various methods to detect and recognize facial expressions. In general, facial expression representation can be categorized as holistic, analytic or hybrid [1][7]. In the holistic approach, the whole face region is taken as input to the facial expression recognition system; examples of holistic methods are eigenfaces, probabilistic eigenfaces, fisherfaces, support vector machines, nearest feature lines (NFL) and independent component analysis. In analytic approaches, local feature points on the face such as the nose, the mouth and the eyes are segmented and then used as input to the classifier, while hybrid approaches separately extract both local and global features and combine them for recognition. However, this field remains very challenging, especially in real-life applications. Feng in 2004 [8] used local binary patterns to extract facial appearance features and a two-stage classifier: at the first stage, two expression candidates were selected from the initial seven; at the second stage, one of the two candidate classes was verified as the final expression class.
In 2006, Tsai and Jan [9] used subspace model analysis to analyze the data and recognize facial expressions; they also investigated facial deformation problems such as pose and illumination variations. Nan and Youwei [10] used five classifiers combined through a Dempster-Shafer (DS) classifier combination approach and reportedly achieved a maximum accuracy of 95.7%. Wallhoff et al. [11] discussed innovative holistic and self-organizing approaches for efficient facial expression analysis; their experiments were based on the publicly available FEEDTUM database, and they achieved an accuracy of 61.67% using macro motion blocks for feature extraction and SVM-SFFS for classification. In 2008, Kotsia et al. [12] analyzed the effect of partial occlusion on facial expression recognition, using Gabor wavelets, Discriminant Non-negative Matrix Factorization and a shape-based method as feature extraction techniques. Whitehill et al. [13] explored facial expression recognition in connection with intelligent tutoring systems; their idea was to automatically estimate the difficulty level of a lecture as perceived by the student, as well as to determine the student's preferred viewing speed. In 2009, Tai and Huang [14] proposed a method for facial expression recognition in video sequences: they performed noise reduction with a median filter, used cross-correlation of optical flow and mathematical models built from facial points, and finally fed the features to an Elman neural network for expression classification.

Even though many researchers have used various methods to recognize facial expressions from images and videos, multi-resolution processing of image pixel values for facial expression recognition has not been exhausted. One of the most popular multi-resolution analysis techniques is the wavelet transform. A wavelet transform can be performed at every scale and translation, resulting in the Continuous Wavelet Transform (CWT), or at multiples of scale and translation intervals, resulting in the Discrete Wavelet Transform (DWT). Since the CWT provides redundant information and involves more computational effort, the DWT is normally preferred [15]. In [16], a Haar-like technique was used to extract features: six statistical features, namely variance, standard deviation, mean, power, energy and entropy, were derived from the approximation coefficients of the Haar-like decomposition and used as input to a neural network for classifying eight facial expressions.

Preprocessing is regarded as an important step in image processing, as it helps improve image quality by removing noise, highlighting features of interest and separating the object of interest from the background. In this work, we use morphological opening and closing operators to eliminate noise and its effects from the input image before using the Haar discrete wavelet transform to extract features for the neural network classifier.
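As a concrete illustration of this kind of feature derivation, the following minimal Python sketch computes the six statistics used in [16] from the approximation coefficients of a Haar decomposition. It assumes the NumPy and PyWavelets (`pywt`) packages; the decomposition level and the entropy estimator are illustrative choices, not taken from [16].

```python
import numpy as np
import pywt

def haar_statistics(image: np.ndarray, level: int = 2) -> np.ndarray:
    """Variance, standard deviation, mean, power, energy and entropy
    of the Haar approximation coefficients, in the spirit of [16]."""
    # Multi-level 2-D Haar decomposition; coeffs[0] is the approximation.
    approx = pywt.wavedec2(image.astype(float), "haar", level=level)[0].ravel()
    power = np.mean(approx ** 2)            # average squared coefficient
    energy = np.sum(approx ** 2)            # total squared coefficient
    # Shannon entropy of the normalized coefficient magnitudes (one of
    # several common definitions; assumed here for illustration).
    p = np.abs(approx) / (np.sum(np.abs(approx)) + 1e-12)
    entropy = -np.sum(p * np.log2(p + 1e-12))
    return np.array([approx.var(), approx.std(), approx.mean(),
                     power, energy, entropy])
```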
# a) Proposed Method

In this paper, a face is represented as a set of connected regions of similar texture and intensity levels that combine to form objects. Some of these objects are small and low in contrast while others are large and high in contrast, presenting the need to analyze them using multi-resolution processing. In order to address the curse of dimensionality, we map the face image model into a low-dimensional system that reflects the dynamics of the human facial expression system, then use binary image processing techniques to reject noise introduced by cropping facial images through low-resolution disparity calculations, which consequently leads to better results. Next, the discrete wavelet transform is used to extract features for a neural network classifier.

The rest of the paper is organized as follows: Section 2 describes the image pre-processing, Section 3 gives a detailed description of the discrete wavelet transform, Section 4 gives the details of the neural network, Section 5 presents the experimental results and analysis, and finally we conclude in Section 6.

# II. IMAGE PRE-PROCESSING

The first step is to acquire images from the sensor or from a database; in our experiments we use static images from the JAFFE database. Image preprocessing is a significant step: it transforms the image data until regions of interest better suited for analysis are found. First we crop the facial part of the image in order to remove hair, the neck and other background details that are not central to facial expression, followed by histogram equalization to enhance image quality. Next we use the morphological opening and closing operators to eliminate the noise and its effects that may have arisen during image acquisition, while distorting the image as little as possible.

Morphological processing compares each pixel to the pixels surrounding it. It changes the shape of particles by processing each pixel based on its number of neighbors and the values of those neighbors, where a neighbor is a pixel whose value affects the values of nearby pixels during certain image processing functions. Morphological transformations use a 2D binary mask (structuring element) to define the size and effect of the neighborhood on each pixel, controlling the effect of the binary morphological functions on the shape and the boundary of a particle.

The opening of an image $X$ by structuring element $B$ is the erosion of $X$ by $B$ followed by the dilation of the result by $B$:

$$X \circ B = (X \ominus B) \oplus B \tag{1}$$

Similarly, the closing of an image $X$ by structuring element $B$ is dilation followed by erosion:

$$X \bullet B = (X \oplus B) \ominus B \tag{2}$$

Opening removes the noise from the background but increases the size of noise elements (dark spots) contained in the image, because they are inner boundaries that grow as objects are eroded; the enlargement is countered by performing dilation on the resulting face image. Morphological opening also creates some gaps within the image, which are fixed by performing a closing operation on the result of the opening. Closing has the overall effect of smoothing the image and eliminating small holes. The results are given in Figure 1: Figure 1(a) shows the original face, Figure 1(b) the cropped image, Figure 1(c) the result of opening the image in (b) with a structuring element, and Figure 1(d) the result of performing a closing on Figure 1(c), giving as net result a smoothed image with noise eliminated both in the background and in the face region.

![Figure 1. (a) Original test image (b) cropped test image (c) morphologically opened test image (d) morphological closing](image-3.png)
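A minimal sketch of this pre-processing chain, assuming OpenCV (`cv2`) on a grayscale image; the file name and the 5x5 elliptical structuring element are illustrative assumptions, since the paper does not specify the element's size or shape.

```python
import cv2

# Load a cropped grayscale face image (file name is hypothetical).
face = cv2.imread("cropped_face.png", cv2.IMREAD_GRAYSCALE)
face = cv2.equalizeHist(face)  # histogram equalization step

# 5x5 elliptical structuring element B (an assumed choice).
B = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))

# Opening, eq. (1): erosion by B followed by dilation by B.
opened = cv2.morphologyEx(face, cv2.MORPH_OPEN, B)

# Closing, eq. (2): dilation by B followed by erosion by B, which
# fills the small gaps the opening leaves and smooths the image.
smoothed = cv2.morphologyEx(opened, cv2.MORPH_CLOSE, B)
```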
# III. DISCRETE WAVELET TRANSFORM

The discrete wavelet transform of a signal $x(n)$ is given by

$$\mathrm{DWT}\; x(n) = \begin{cases} d_{j,k} = \sum\limits_{n} x(n)\, h\!\left(2^{j} n - k\right) \\[4pt] a_{j,k} = \sum\limits_{n} x(n)\, g\!\left(2^{j} n - k\right) \end{cases} \tag{3}$$

The coefficients $d_{j,k}$ refer to the detail components in the signal $x(n)$ and correspond to the wavelet function, whereas $a_{j,k}$ refer to the approximation components in the signal. The functions $h(n)$ and $g(n)$ represent the coefficients of the high-pass and low-pass filters respectively, while the parameters $j$ and $k$ refer to the wavelet scale and translation factors.

By applying the 2-D DWT to an image we decompose it into four equal sub-bands, each a fourth of the original image, as shown in Figure 2(a): the LL, LH, HL and HH sub-bands corresponding to the approximation, horizontal, vertical and diagonal matrices respectively. LL contains the low-frequency information in both the horizontal and vertical directions; LH contains low-frequency components in the horizontal direction and high-frequency components in the vertical direction; HL contains high-frequency components in the horizontal direction and low-frequency components in the vertical direction; and HH contains high-frequency components in both directions. Taking the LL sub-band to represent the image compresses the original image to a quarter of its dimension, which reduces computational complexity and recognition time.

Further wavelet decomposition of the LL image generates a lower-dimensional multi-resolution facial image. If the decomposed components continue to be decomposed, a pyramid-like wavelet decomposition tree structure is formed that can be beneficial for further analysis. It is worth noting that the LL sub-band of an image carries the general and most important features of the face, which are necessary for recognition, while the high-frequency sub-bands carry the detailed information that arises when a person is smiling, sleeping, annoyed and so on, which is key to our facial expression recognition. From Figure 2(b) it is clear that the eyebrow, eye and mouth contours in the HL sub-band are very distinct. To verify the impact of the high-frequency components on expression recognition, we extract the low-frequency component LL and add the high-frequency components to it as follows (a code sketch follows Figure 2(a)):

Step 1: Perform a two-level wavelet decomposition of the training set with the Haar wavelet packet and organize the low-frequency component LL into a column vector, denoted by X;

Step 2: Change the high-frequency components HL, LH and HH into column vectors, then respectively add them to the column vector X to form the feature vector X'';

Step 3: The feature vectors obtained from Step 2 form the input to our neural network for classification.

Figure 2(a). Third-level DWT decomposition structure, where n denotes the decomposition scale and a, h, v, d are the approximation, horizontal, vertical and diagonal detail coefficients, respectively. The sub-image in the upper left corner is the approximation image that results from the final decomposition step, surrounded in a clockwise manner by the horizontal, diagonal and vertical detail coefficients generated during the same decomposition.
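The steps above can be sketched as follows in Python with PyWavelets. How `pywt`'s horizontal/vertical/diagonal detail arrays map onto the HL/LH/HH naming used here, and the reading of Step 2 as element-wise addition of each detail sub-band to X, are assumptions of this sketch.

```python
import numpy as np
import pywt

def expression_feature_vector(image: np.ndarray) -> np.ndarray:
    # Step 1: two-level Haar decomposition; flatten the level-2
    # low-frequency sub-band LL into a column vector X.
    cA2, (cH2, cV2, cD2), _ = pywt.wavedec2(image.astype(float),
                                            "haar", level=2)
    x = cA2.ravel()
    # Step 2: add each same-size high-frequency sub-band (HL, LH, HH)
    # element-wise to X to form the feature vector X''.
    for detail in (cH2, cV2, cD2):
        x = x + detail.ravel()
    # Step 3: X'' is fed to the neural network classifier.
    return x
```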
# IV. BACK PROPAGATION NEURAL NETWORK (BPNN)

The back propagation method enables the network to learn a predefined set of input-output example pairs using a two-phase propagate-adapt cycle. First, an input pattern is applied as a stimulus to the first layer of network units and propagated through each hidden layer until an output is generated. The actual network outputs are subtracted from the desired outputs to produce an error signal. This error signal is the basis for the back propagation step, whereby the errors are passed back through the network by computing the contribution of each hidden processing layer and deriving the corresponding adjustment needed to produce the correct output. This process repeats, layer by layer, until each node in the network receives an error signal that describes its relative contribution to the total error. Based on the error signal received, each unit then updates its connection weights, causing the network to converge toward a state that encodes all the training patterns.

The back propagation network used consists of an input layer, a hidden layer and an output layer. Before training, the weights are initialized to small random numbers to ensure that the network is not saturated by large weight values and to prevent other training pathologies, in addition to making sure that the network learns. The number of neurons in the hidden layer was varied between 6 and 15; in our analysis we found the system to perform well with 10 neurons. The number of neurons in the output layer equals 6, the number of classes. The stopping criteria were a sum of squared errors (SSE) of 1.0 and a maximum of 10000 epochs.

The basic procedure for training a back propagation network is given below [17].

# Algorithm:

i. Initialize the network weights and biases.

ii. Select a training pair from the training set and apply the input vector to the network input.

iii. Sum the weighted inputs and apply the activation function to compute the hidden-layer output signal:

$$X^{h}_{pj} = \sum_{i=1}^{N} w^{h}_{ji}\, x_{pi} + b^{h}_{j}, \qquad i_{pj} = f^{h}_{j}\!\left(X^{h}_{pj}\right) \tag{4}$$

where $w^{h}_{ji}$ is the weight connection from the $i$-th input unit, $b^{h}_{j}$ is the bias, and the superscript $h$ refers to quantities of the hidden layer.

iv. Calculate the output of the network:

$$y^{o}_{pk} = \sum_{j=1}^{L} w^{o}_{kj}\, i_{pj} + b^{o}_{k}, \qquad o_{pk} = f^{o}_{k}\!\left(y^{o}_{pk}\right) \tag{5}$$

where the superscript $o$ refers to quantities at the output layer.

v. Calculate the error terms for the output units:

$$\delta^{o}_{pk} = \left(y_{pk} - o_{pk}\right) f^{o\prime}_{k}\!\left(y^{o}_{pk}\right) \tag{6}$$

where $y_{pk}$ is the desired output value and $o_{pk}$ is the actual output, followed by the error terms for the hidden units:

$$\delta^{h}_{pj} = f^{h\prime}_{j}\!\left(X^{h}_{pj}\right) \sum_{k} \delta^{o}_{pk}\, w^{o}_{kj} \tag{7}$$

vi. Update the weights on the output layer:

$$w^{o}_{kj}(t+1) = w^{o}_{kj}(t) + \eta\, \delta^{o}_{pk}\, i_{pj} \tag{8}$$

vii. Update the weights on the hidden layer:

$$w^{h}_{ji}(t+1) = w^{h}_{ji}(t) + \eta\, \delta^{h}_{pj}\, x_{pi} \tag{9}$$

Repeat the above steps with all the training vectors until the error for every vector in the training set is reduced to an acceptable value.

![Figure 3. Illustration of the neural network training](image-5.png)
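For concreteness, the following NumPy sketch performs one propagate-adapt cycle implementing equations (4)-(9). The input size, logistic activation and learning rate $\eta$ are illustrative assumptions; the paper fixes only the 10 hidden and 6 output neurons.

```python
import numpy as np

rng = np.random.default_rng(0)
N, L, M = 64, 10, 6          # input / hidden / output sizes (input size assumed)
eta = 0.1                    # learning rate (assumed value)

# i. Initialize weights and biases to small random numbers.
Wh, bh = rng.normal(0.0, 0.1, (L, N)), np.zeros(L)
Wo, bo = rng.normal(0.0, 0.1, (M, L)), np.zeros(M)

f = lambda z: 1.0 / (1.0 + np.exp(-z))   # logistic activation
df = lambda z: f(z) * (1.0 - f(z))       # its derivative

def train_step(x: np.ndarray, y: np.ndarray) -> None:
    """One cycle of equations (4)-(9) for input pattern x, target y."""
    global Wh, bh, Wo, bo
    Xh = Wh @ x + bh                 # eq. (4): hidden net input
    i_h = f(Xh)                      #          hidden output i_pj
    yo = Wo @ i_h + bo               # eq. (5): output net input
    o = f(yo)                        #          network output o_pk
    d_o = (y - o) * df(yo)           # eq. (6): output error terms
    d_h = df(Xh) * (Wo.T @ d_o)      # eq. (7): hidden error terms
    Wo += eta * np.outer(d_o, i_h)   # eq. (8): output-layer update
    bo += eta * d_o
    Wh += eta * np.outer(d_h, x)     # eq. (9): hidden-layer update
    bh += eta * d_h
```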
# V. EXPERIMENT RESULTS AND ANALYSIS

To assess the validity and efficiency of our approach, experiments were conducted on the Japanese Female Facial Expression (JAFFE) database, which contains 213 images of 7 facial expressions posed by 10 Japanese female models. Each of the ten expressers posed 3 or 4 examples of each of the six basic facial expressions (happiness, sadness, surprise, anger, disgust and fear) and a neutral expression. Figure 4 shows examples of the original images in the JAFFE database. In real-world environments, rotation of the camera axis and head pose variations often exist; the JAFFE database includes images with minor rotation of the camera axis and variations in head pose, so the robustness of the proposed method is also evaluated. The images depicting the six facial expressions fear, disgust, happiness, sadness, anger and surprise were used.

![Figure 4. A sample of angry faces from the JAFFE database](image-6.png)

The feature vectors are extracted from the second-level Haar discrete wavelet decomposition of the corresponding image; beyond this level we noted that the images become unreasonably small and no valuable information can be extracted. The extracted image data are divided into training and testing data: in the training phase two feature vectors per class were used, and in the testing phase the remaining feature vector from each class was used. The images in the testing set were not included in the training set. We use six binary neural network classifiers: the data is divided into six blocks according to the six expression classes, and each classifier is trained for a particular expression using a one-against-all approach. The outputs of these binary classifiers give the probabilities of the extent to which the input image belongs or does not belong to the class for which the particular classifier has been trained. After training, we use the generated outputs as a feature map that indicates the presence or absence of many facial expression feature combinations at the input. We repeated the training procedure 30 times and averaged the results over the trials, obtaining an accuracy of 81%.

The results are illustrated using the confusion matrix in Table 1, where each row corresponds to one of the six facial expressions and the columns correspond to the classifier outputs (values in %).

Table 1. Confusion matrix of the recognition results (%)

|            | angry | disgusting | fear | happy | sad | surprise | unclassified |
|------------|-------|------------|------|-------|-----|----------|--------------|
| angry      | 100   |            |      |       |     |          |              |
| disgusting |       | 100        |      |       |     |          |              |
| fear       |       |            | 57   |       |     |          | 43           |
| happy      |       |            |      | 100   |     |          |              |
| sad        |       |            |      |       | 27  |          | 73           |
| surprise   |       |            |      |       |     | 100      |              |

# VI. CONCLUSION

In this work a high-accuracy recognition system based on machine learning with a reasonable number of samples was introduced. First the input image is preprocessed: morphological operators are used to remove noise and smooth the image, and the resulting binary data is used as input to the discrete wavelet transform. The second-level Haar wavelet decomposition is computed and the resulting feature vectors are used as input to the back propagation neural network classifier. Experiments for evaluation were carried out on the JAFFE database with the six facial expressions 'angry', 'disgusting', 'fear', 'happy', 'sad' and 'surprise', and the results show that the proposed method achieves 81% accuracy. The lower overall accuracy stems from the unclassified samples of the fear class (43%) and the sad class (73%). Despite this, the simplicity and robustness of the system are significant. For future work, we plan to look into facial expression recognition of subjects in real-time videos.

# VII. ACKNOWLEDGMENTS

We thank the JAFFE database for providing the face images for the experiments. This work was partially supported by the National Natural Science Foundation of China (50275150) and the National Research Foundation for the Doctoral Program of Higher Education of China (20040533035, 20070533131).
# REFERENCES

1. B. Fasel and J. Luettin, "Automatic facial expression analysis: a survey," Pattern Recognition, vol. 36, no. 1, 2003.
2. L. Chen, H. Tao, T. Huang, T. Miyasato and R. Nakatsu, "Recognising realistic emotions and affect in speech: state of the art and lessons learnt from the first challenge," Proc. IEEE Workshop on Multimedia Signal Processing, 1998.
3. L. C. De Silva and P. C. Ng, "Bimodal emotion recognition," Proc. 4th IEEE Int. Conf. on Automatic Face and Gesture Recognition, France, Mar. 2000.
4. P. Ekman and W. Friesen, Facial Action Coding System: A Technique for the Measurement of Facial Movement, Consulting Psychologists Press, Palo Alto, 1978.
5. S. Dongcheng and F. Jieqing, "The method of facial expression recognition based on DWT-PCA/LDA," 3rd International Congress on Image and Signal Processing, 2010.
6. Wang Zhiliang and Liu Fang, "Survey of facial expression recognition based on computer vision," Computer Engineering, vol. 32, no. 11, 2006.
7. Elham Bagherian and Rahmita Wirza O. K. Rahmat, "Facial feature extraction for face recognition: a review," International Symposium on Information Technology, vol. 2, 2008.
8. Xiaoyi Feng, "Facial expression recognition based on local binary patterns and coarse-to-fine classification," Proceedings of the Fourth International Conference on Computer and Information Technology (CIT'04), 2004.
9. P. H. Tsai and T. Jan, "Expression-invariant face recognition system using subspace model analysis," IEEE International Conference on Systems, Man and Cybernetics, 2005.
10. Zhang Nan and Zhang Youwei, "Inducement analysis in facial expression recognition," The 8th International Conference on Signal Processing, 2006.
11. F. Wallhoff, B. Schuller, M. Hawellek and G. Rigoll, "Efficient recognition of authentic dynamic facial expressions on the FEEDTUM database," IEEE International Conference on Multimedia and Expo (ICME'06), 2006.
12. Irene Kotsia, Ioan Buciu and Ioannis Pitas, "An analysis of facial expression recognition under partial facial image occlusion," Image and Vision Computing, vol. 26, 2008.
13. J. Whitehill, M. Bartlett and J. Movellan, "Automatic facial expression recognition for intelligent tutoring systems," Proceedings of the IEEE Computer Society Workshop on Computer Vision and Pattern Recognition, 2008.
14. Shen-Chuan Tai and Hung-Fu Huang, "Facial expression recognition in video sequences," Proceedings of the 6th International Symposium on Neural Networks: Advances in Neural Networks, Part III, 2009.
15. M. Murugappan, M. Rizon, R. Nagarajan and S. Yaacob, "FCM clustering of human emotions using wavelet based features from EEG," International Journal of Biomedical Soft Computing and Human Sciences (IJBSCHS), vol. 14, no. 2, 2009.
16. M. Satiyan and R. Nagarajan, "Recognition of facial expression using Haar-like feature extraction method," Proceedings of the 3rd IEEE International Conference on Intelligent and Advanced Systems (ICIAS), Kuala Lumpur, Malaysia, 2010.
17. James A. Freeman and David M. Skapura, Neural Networks: Algorithms, Applications and Programming Techniques, Addison-Wesley, 1991.