# I. Introduction

Sign language remains the primary means of communication for the deaf and hearing impaired. Hand gestures enable deaf people to communicate in their daily lives instead of speech. In our society, Arabic Sign Language (ArSL) is known only to deaf people and specialists, so the community around deaf people is narrow. To help people with normal hearing communicate effectively with the deaf and the hearing impaired, numerous systems have been developed for translating diverse sign languages from around the world. Several review papers discussing such systems can be found in [1]-[7].

Generally, ArSL recognition (ArSLR) proceeds in two main phases: detection and classification. In the first phase, each given image is pre-processed and enhanced, and the regions of interest (ROI) are segmented using a segmentation algorithm. The output of the segmentation process is then used to perform sign recognition. Indeed, the accuracy and speed of detection play an important role in obtaining an accurate and fast recognition process. In the recognition phase, a set of features (patterns) is first extracted from each segmented hand sign and then used to recognize the sign. These features serve as a reference for discriminating among the classes.

Recognizing and documenting ArSL have received attention only recently, and few attempts have investigated this problem; see for example [8]-[11]. ArSL recognition is therefore a major requirement for the future of ArSL: it facilitates communication between deaf and hearing people by translating the alphabet and number signs of Arabic sign language into text or speech. To achieve that goal, this paper proposes a new Arabic sign recognition system based on recent machine learning methods and the direct use of tiny images.

The rest of the paper is organized as follows. Section 2 presents the current approaches to Arabic alphabet sign language recognition (ArASLR). Section 3 describes the proposed model for ArASLR. Conclusions and future works are presented in Section 4.

# II. Current Approaches

Studies in Arabic sign language recognition, although not as advanced as those devoted to other scripts (e.g., Latin), have recently attracted interest [8]-[11]. Current research in ArSLR has been satisfactory only for alphabet recognition, with accuracies exceeding 98%. Isolated Arabic word recognition has been successful only with medium-size vocabularies (fewer than 300 signs). Continuous ArSLR, on the other hand, is still in its early stages and operates under very restrictive conditions.

Current approaches to sign language recognition fall into two major categories. The first is sensor-based approaches, which employ sensors attached to a glove; look-up table software is usually provided with the glove for hand gesture recognition. Recent sensor-based approaches can be found, for instance, in [11]-[14]. The second category, vision-based analysis, relies on video cameras to capture the movement of the hand, sometimes aided by having the signer wear a glove with painted areas indicating the positions of the fingers and the wrist; these measurements are then used in the recognition process. Image-based techniques face a number of challenges, including lighting conditions, image background, face and hand segmentation, and different types of noise.
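To make the generic detection-then-recognition pipeline concrete before surveying specific systems, the following minimal sketch illustrates the two phases. It is only an illustration under stated assumptions: the YCrCb skin thresholds, the function names, and the feature/classifier interfaces are ours, not those of any cited system.

```python
import cv2
import numpy as np

def detect_hand(image_bgr):
    """Phase 1: pre-process the frame and segment the region of interest."""
    ycrcb = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2YCrCb)
    # Rough skin-color mask; these bounds are illustrative and need tuning.
    mask = cv2.inRange(ycrcb, (0, 133, 77), (255, 173, 127))
    points = cv2.findNonZero(mask)
    if points is None:
        return None  # no skin-colored region found
    x, y, w, h = cv2.boundingRect(points)
    return image_bgr[y:y + h, x:x + w]

def recognize_sign(hand_patch, extract_features, classifier):
    """Phase 2: extract features from the segmented hand and classify them."""
    features = extract_features(hand_patch)
    return classifier.predict(features.reshape(1, -1))[0]
```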
Among image-based approaches, the authors of [15] introduced a method for automatic recognition of the Arabic sign language alphabet. Hu moments were used for feature extraction, followed by support vector machines (SVMs) for classification; a correct recognition rate of 87% was achieved. The authors of [16] developed a neuro-fuzzy system comprising five main steps: image acquisition, filtering, segmentation, and hand outline detection, followed by feature extraction. Bare hands were considered in the experiments, achieving a recognition accuracy of 93.6%. In [17], the authors proposed an adaptive neuro-fuzzy inference system for alphabet sign recognition. A colored glove was used to simplify the segmentation process, and geometric features were extracted from the hand region; the recognition rate was improved to 95.5%. In [18], the authors developed an image-based ArSL system that does not use visual markings. Images of bare hands are processed to extract a set of features that are invariant to translation, rotation, and scaling. A recognition accuracy of 97.5% was achieved on a database of 30 Arabic alphabet signs. In [19], the authors used recurrent neural networks for alphabet recognition. A database of 900 samples, covering 30 gestures performed by two signers, was used in their experiments. The Elman network achieved an accuracy of 89.7%, while a fully recurrent network improved the accuracy to 95.1%. The authors extended their work by considering the effect of different artificial neural network structures on the recognition accuracy; in particular, they extracted 30 features from colored gloves and achieved an overall recognition rate of 95% [20]. A recent paper reviewing the different systems and methods for the automatic recognition of Arabic sign language can be found in [7]; it highlights the main challenges characterizing Arabic sign language as well as potential future research directions.

Recent works on image-based recognition of the Arabic sign language alphabet can be found in [9], [10], [21]-[25]. In particular, Naoum et al. [9] propose an ArSLR system based on the k-nearest neighbor (KNN) algorithm. To achieve good recognition performance, they combine this algorithm with a glove-based analysis technique: the system starts by computing image histograms, and profiles extracted from these histograms are then used as input to a KNN classifier. Mohandes [10] proposes a more sophisticated recognition algorithm, the first attempt to recognize two-handed signs from the Unified Arabic Sign Language Dictionary; it uses the CyberGlove for data acquisition, PCA for feature extraction, and SVMs for recognition. The authors of [21] proposed an Arabic sign language alphabet recognition system that converts signs into voice. The technique is close to a real-life setup; however, recognition is not performed in real time. The system focuses on static and simple moving gestures. The inputs are color images of the gestures; the YCbCr space is used to extract skin blobs, and the Prewitt edge detector extracts the hand shape. Principal component analysis (PCA) converts the image area into feature vectors, and a k-nearest neighbor (KNN) classifier performs the classification. Furthermore, the authors of [22] and [23] proposed a pulse-coupled neural network (PCNN) ArSLR system able to compensate for lighting non-homogeneity and background brightness; it showed invariance under geometric transforms, bright backgrounds, and varying lighting conditions, achieving a recognition accuracy of 90%.
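Several of the surveyed systems (e.g., [21]) share a PCA-plus-KNN backbone. A minimal sketch of that classical pipeline is given below; the number of components and neighbors are illustrative assumptions, and the data loading is left to the reader.

```python
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

def build_pca_knn(n_components=50, n_neighbors=5):
    """PCA for feature extraction followed by a KNN classifier."""
    return make_pipeline(PCA(n_components=n_components),
                         KNeighborsClassifier(n_neighbors=n_neighbors))

# Usage (X: flattened hand images, y: sign labels), hypothetical data:
# model = build_pca_knn().fit(X_train, y_train)
# print(model.score(X_test, y_test))
```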
Moreover, the authors of [24] introduced an Arabic Alphabet and Numbers Sign Language Recognition (ArANSLR) system. The phases of their algorithm consist of skin detection, background exclusion, face and hands extraction, feature extraction, and classification using a Hidden Markov Model (HMM). The algorithm divides the rectangle surrounding the hand shape into zones, with 16 zones giving the best results. The HMM observation sequence is created by sorting the zone numbers in ascending order of the number of white pixels in each zone. Experimental results showed that the algorithm achieves a 100% recognition rate.

On the other hand, new systems for facilitating human-machine interaction have been introduced recently. In particular, the Microsoft Kinect and the Leap Motion Controller (LMC) have attracted special attention. The Kinect uses an infrared emitter and depth sensors in addition to a high-resolution video camera. The LMC uses two infrared cameras and three LEDs to capture information within its interaction range; however, it does not provide images of the detected objects. The LMC has recently been used for Arabic alphabet sign recognition with promising results [25].

Having presented the existing image-based approaches to ArASLR, we note that these approaches generally comprise two main phases: coding and classification. Most of the coding methods are based on hand-crafted feature extractors, which are empirical detectors. By contrast, a family of recent methods based on deep neural network architectures makes it possible to build feature extractors from theoretical considerations. ArSLR therefore requires projecting images onto an appropriate feature space that allows accurate and rapid classification. In contrast to the empirical methods mentioned above, new machine learning methods have recently emerged that are strongly related to the way natural systems code images [26]. These methods build on the observation that natural image statistics are not Gaussian, as they would be if images had a completely random structure [27]. The self-similar structure of natural images allowed evolution to build optimal codes. These codes are made of statistically independent features, and many different methods have been proposed to construct them from image datasets. Imposing locality and sparsity constraints on these features is very important, probably because simple algorithms based on such constraints can produce linear signatures similar to the receptive fields observed in natural systems. Recent years have seen growing interest in computer vision algorithms that rely on local sparse image representations, especially for image classification and object recognition [28]-[32]. Moreover, from a generative point of view, the effectiveness of local sparse coding, for instance in image reconstruction [33], is justified by the fact that a natural image can be reconstructed from the smallest possible number of features. It has been shown that Independent Component Analysis (ICA) produces localized features; it is also efficient for high-kurtosis distributions, which are representative of natural image statistics dominated by rare events such as contours. However, the method is linear and not recursive.
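Before turning to DBNs, the following sketch makes the ICA point concrete: it learns localized, edge-like filters from flattened image patches with scikit-learn's FastICA. The patch preparation and component count are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import FastICA

def learn_ica_features(patches, n_components=64):
    """patches: (n_patches, patch_dim) array of flattened, centered patches."""
    ica = FastICA(n_components=n_components, max_iter=500)
    ica.fit(patches)
    # Each row of components_ is typically a localized, edge-like filter.
    return ica.components_
```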
These two limitations are lifted by DBNs [34], which introduce nonlinearities into the coding scheme and exhibit multiple layers. Each layer is made of an RBM, a simplified version of the Boltzmann machine proposed by Smolensky [35] and Hinton [36]. Each RBM is able to build a generative statistical model of its inputs using a relatively fast learning algorithm, Contrastive Divergence (CD), first introduced by Hinton [36]. Another important characteristic of the codes used in natural systems, the sparsity of the representation [26], is also achieved in DBNs. Moreover, these approaches have been shown to remain robust for extracting local sparse efficient features from tiny images [37]. This model was successfully used in [32] to achieve semantic place recognition. Our hope is to demonstrate that DBNs coupled with tiny images can also be successfully used in the context of ArASLR.

# III. Proposed Model

The methodology of this research comprises four stages (see Figure 1), which can be summarized as follows: 1) data collection and image acquisition, 2) image pre-processing, 3) feature extraction, and 4) gesture recognition.

# a) Description of the Database

The alphabet used for Arabic sign language, displayed in Figure 2 (left) [38], will be used to investigate the performance of the proposed model. In this database, the signer performs each letter separately. Letters are mostly represented by static postures, and the vocabulary size is limited. Even though the Arabic alphabet consists of only 28 letters, Arabic sign language uses 39 signs; the 11 additional signs are basic signs combining two letters. For example, the two letters "Al" (ال) are quite common in Arabic (similar to the article "the" in English). Therefore, most of the literature on ArASLR uses these basic 39 signs.

# b) Image Pre-processing

The typical input dimension for a DBN is approximately 1000 units (e.g., 30x30 pixels). Dealing with smaller patches could make the model unable to extract interesting features, while using larger patches can be extremely time-consuming during feature learning; additionally, the multiplication of the connection weights acts negatively on the convergence of the CD algorithm. The question is therefore how to scale realistic images (e.g., 300x300 pixels) to a size appropriate for DBNs. Three solutions can be envisioned: selecting random patches from each image, as done in [39]; using convolutional architectures, as proposed in [40]; or reducing each image to a tiny image, as proposed in [37]. The first solution extracts local features, and characterizing an image with such features can only be done through bag-of-words (BoW) approaches, which we wanted to avoid; moreover, feature extraction from random patches disregards the spatial structure of each image [41], and in structured scenes, such as those used in semantic place recognition, this structure carries interesting information. The second solution shares the limitations of the first and additionally gives rise to extensive computations that are only tractable on Graphics Processing Unit architectures.
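As an illustration, here is a minimal sketch of the third option, the tiny-image reduction adopted below (42x24 pixels, i.e., a 1008-dimensional input vector). The use of OpenCV and the choice of interpolation mode are our assumptions, not part of the original method.

```python
import cv2
import numpy as np

def to_tiny_image(image_bgr, size=(42, 24)):
    """Reduce a full-size frame to a flattened 42x24 grayscale tiny image."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    tiny = cv2.resize(gray, size, interpolation=cv2.INTER_AREA)  # size = (w, h)
    return tiny.astype(np.float32).ravel()  # 42 * 24 = 1008 values
```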
Besides, tiny images have been successfully used in [37] for classifying and retrieving images from the 80-million-image database developed at MIT. Torralba et al. [37] showed that combining tiny images with a DBN approach makes it possible to code each image by a small binary vector whose entries index a feature alphabet that optimally defines the considered image. The binary vector acts as a bar code, while the alphabet of features is computed only once from a representative set of images. The power of this approach is well illustrated by the fact that the number of codes offered by even a relatively small binary vector largely exceeds the number of images to be coded, even in a huge database ($2^{256} \approx 10^{77}$). For all these reasons, we have chosen image reduction.

On the other hand, natural images are highly structured and contain significant statistical redundancies; in particular, their pixels are strongly correlated [42], [43]. Removing these correlations is known as whitening, and it has been shown to be a mandatory step for the use of clustering methods in object recognition [44]. Whitening is a linear process, so it does not remove the higher-order statistics present in the data. Consequently, as proposed in [37] and [32], after color conversion and image cropping, the image size is reduced to 42x24, as shown in Figure 1. The final set of tiny images is centered and whitened in order to eliminate order-2 statistics; consequently, the variance in equation (6) will be set to 1. Contrary to [37], the 42x24 = 1008 pixels of the whitened images are used directly as the input vector of the network for feature extraction.

# c) Features Extraction

Next comes the feature extraction stage, the most significant stage of the system, which is based on an unsupervised machine learning model, the DBN. DBNs are probabilistic generative models composed of multiple RBM layers of latent stochastic variables. The latent variables typically have binary values and correspond to hidden units or feature detectors. The input variables are zero-mean Gaussian units and are used to reconstruct the visible layer. As shown in Figure 3, the top two layers have undirected, symmetric connections between them; these connections form the weights, or features. The features are extracted by minimizing an energy function that reflects the quality of the image reconstruction. It has been shown that features extracted by DBNs are more promising for image classification than hand-engineered features [32], [45], [46]. We therefore hope that, thanks to the statistical independence of the features and their sparse nature, the data will become linearly separable in the feature space, greatly simplifying the way we learn to classify the signs.

An RBM couples a set of visible units $v$ to a set of hidden units $h$ [27]. For a standard RBM, a joint configuration of the binary visible units and the binary hidden units has an energy function given by:

$$E(v, h) = -\sum_i a_i v_i - \sum_j b_j h_j - \sum_{i,j} v_i h_j w_{ij} \qquad (1)$$

where $a_i$ and $b_j$ are the biases of the visible and hidden units and $w_{ij}$ the connection weights. The probabilities of the state of a unit in one layer, conditional on the state of the other layer, can therefore be easily computed. According to the Gibbs distribution:

$$p(v, h) = \frac{1}{Z} e^{-E(v, h)} \qquad (2)$$

where $Z$ is a normalizing constant (the partition function). Thus, after marginalization:

$$p(v) = \frac{1}{Z} \sum_h e^{-E(v, h)} \qquad (3)$$

it can be derived [47] that the conditional probabilities of a standard RBM are given as follows:

$$p(h_j = 1 \mid v) = \sigma\Big(b_j + \sum_i v_i w_{ij}\Big) \qquad (4)$$

$$p(v_i = 1 \mid h) = \sigma\Big(a_i + \sum_j h_j w_{ij}\Big) \qquad (5)$$

where $\sigma(x) = 1/(1 + e^{-x})$ is the logistic function.

# 1) Gaussian-Bernoulli Restricted Boltzmann Machines

Since binary units are not appropriate for multivalued inputs such as pixel levels, as suggested by Hinton [48], the visible units in the present work follow a zero-mean Gaussian activation scheme:

$$p(v_i \mid h) = \mathcal{N}\Big(a_i + \sigma_i \sum_j h_j w_{ij}, \; \sigma_i^2\Big) \qquad (6)$$

In this case, the energy function of the Gaussian-Bernoulli RBM is given by:

$$E(v, h) = \sum_i \frac{(v_i - a_i)^2}{2\sigma_i^2} - \sum_j b_j h_j - \sum_{i,j} \frac{v_i}{\sigma_i} h_j w_{ij} \qquad (7)$$
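For illustration, the following NumPy sketch implements the conditionals of equations (4) and (6): Bernoulli hidden units driven by the logistic function, and Gaussian visible units with unit variance (the whitened inputs are standardized, as described above). The function names and array conventions are ours.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_hidden(v, W, b):
    """Equation (4): p(h_j = 1 | v) = sigma(b_j + sum_i v_i w_ij)."""
    p = sigmoid(b + v @ W)
    return p, (np.random.rand(*p.shape) < p).astype(v.dtype)

def sample_visible_gaussian(h, W, a):
    """Equation (6) with unit variance: v_i ~ N(a_i + sum_j h_j w_ij, 1)."""
    mean = a + h @ W.T
    return mean, mean + np.random.randn(*mean.shape)
```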
# 2) Learning RBM Parameters

One way to learn the RBM parameters is to maximize the model log-likelihood in a gradient ascent procedure. The partial derivative of the log-likelihood for an energy-based model can be expressed as follows:

$$\frac{\partial \log p(v)}{\partial \theta} = \left\langle \frac{\partial (-E(v, h))}{\partial \theta} \right\rangle_{\text{data}} - \left\langle \frac{\partial (-E(v, h))}{\partial \theta} \right\rangle_{\text{model}} \qquad (8)$$

where $\langle \cdot \rangle_{\text{model}}$ denotes an average with respect to the model distribution and $\langle \cdot \rangle_{\text{data}}$ an average over the sample data. For an RBM, the derivatives with respect to the weights and hidden biases take the simple forms:

$$\frac{\partial \log p(v)}{\partial w_{ij}} = \langle v_i h_j \rangle_{\text{data}} - \langle v_i h_j \rangle_{\text{model}} \qquad (9)$$

$$\frac{\partial \log p(v)}{\partial b_j} = \langle h_j \rangle_{\text{data}} - \langle h_j \rangle_{\text{model}} \qquad (10)$$

Unfortunately, computing the likelihood requires the partition function $Z$, which is usually intractable. However, Hinton [36] proposed an alternative learning technique called Contrastive Divergence (CD). This algorithm is based on the consideration that minimizing the energy of the network is equivalent to minimizing the distance between the data and a statistical generative model of it: the statistics of the data are compared with the statistics of their representation generated by Gibbs sampling. Hinton [36] showed that usually only a few steps of Gibbs sampling (most of the time reduced to one) are sufficient to ensure convergence. For an RBM, the weights of the network can be updated using the following equation:

$$\Delta w_{ij} = \epsilon \left( \langle v_i h_j \rangle_0 - \langle v_i h_j \rangle_n \right) \qquad (11)$$

where $\epsilon$ is the learning rate, $v_0$ corresponds to the initial data distribution, $h_0$ is computed using equation (4), $v_n$ is sampled from the Gaussian distribution in equation (6) after $n$ full steps of Gibbs sampling, and $h_n$ is again computed from equation (4).

A DBN is a stack of RBMs trained in the greedy, layer-wise, bottom-up fashion introduced in [34]. The parameters of the first RBM layer are learned using contrastive divergence. These parameters are then frozen, and the conditional probabilities of the first hidden-layer units are used to generate the data on which the higher RBM layers are trained. The process is repeated layer after layer to obtain a sparse representation of the initial data, which serves as the final output.
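A hedged, self-contained sketch of one CD-1 update (equation (11)) for the Gaussian-Bernoulli RBM follows; the learning rate and mini-batch handling are illustrative choices.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, a, b, lr=0.02):
    """One CD-1 step; v0: (batch, n_visible) whitened data,
    W: (n_visible, n_hidden), a: visible biases, b: hidden biases."""
    # Positive phase: hidden probabilities given the data (eq. 4).
    p_h0 = sigmoid(b + v0 @ W)
    h0 = (np.random.rand(*p_h0.shape) < p_h0).astype(v0.dtype)
    # One Gibbs step: Gaussian reconstruction of the visibles (eq. 6) ...
    v1 = a + h0 @ W.T + np.random.randn(*v0.shape)
    # ... and the corresponding hidden probabilities (eq. 4 again).
    p_h1 = sigmoid(b + v1 @ W)
    # Equation (11): <v h>_0 - <v h>_1, averaged over the mini-batch.
    W += lr * (v0.T @ p_h0 - v1.T @ p_h1) / v0.shape[0]
    a += lr * (v0 - v1).mean(axis=0)
    b += lr * (p_h0 - p_h1).mean(axis=0)
    return W, a, b
```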
# d) Gesture Recognition

Assuming that the non-linear transform operated by the DBN improves the linear separability of the data, a simple regression method will be used to perform the classification. To express the final result as the probability that a given sign means one thing, we normalize the output with a softmax regression; following maximum likelihood principles, the largest probability gives the decision of the system. The classification will also be investigated using a more sophisticated classifier, an SVM, in place of softmax regression. Comparable results would confirm that the DBN computes a linearly separable signature of the initial data.

# IV. Experimental Results

For this task, we conducted an experiment using the pre-processed (tiny, normalized) dataset, randomly sampled from the Arabic Alphabet dataset, which contains 28 letters. A complete structure (1024-1024) of the first RBM layer was used in this case. Figure 4 shows the features extracted using the locally normalized data. These features remain sparse but cover a broader spectrum of spatial frequencies. An interesting observation is that they look closer to the ones obtained with convolutional networks [40], for which no whitening is applied to the initial dataset. The features shown in Figure 4 were extracted by training the first RBM layer on 6000 normalized image patches (32x32 pixels) sampled from the Arabic Alphabet database. One can see that the extracted features represent most of the 28 letter signs. Others are localized and correspond to small parts of the initial views, such as edges and corners that can be identified as hand elements (i.e., they are not specific to a given sign). These features can thus be used to code the initial data so as to achieve linear separability, which will greatly simplify the recognition process.

# V. Conclusions and Future Works

The aim of this paper is to propose the use of DBNs coupled with tiny images in a challenging image recognition task: view-based ArASLR. The expected results should demonstrate that an approach based on tiny images, followed by a projection onto an appropriate feature space, can achieve interesting classification results in an ArASLR task. Our hope is to obtain results comparable to, or even better than, those reported in [10], [24], which are based on more complex techniques. In the case of comparable results, this paper would offer a simpler alternative to the methods recently proposed in [10], [24], which rely on cue integration and the computation of a confidence criterion in an HMM or SVM classification approach. Our future work is to empirically evaluate the proposed model on Arabic sign language alphabet recognition. The first step is to code the initial dataset using the extracted features. Assuming that the non-linear transform operated by the DBN improves the linear separability of the data, a simple regression method will be used for classification; the classification will also be examined with a more sophisticated technique such as an SVM, in order to investigate whether the linear separability is indeed gained by the DBN. After evaluating the classification results, this research can be extended to the recognition of further groups of signs, such as Arabic numbers and basic Arabic words. The system could also be provided as a web service for conferences and meetings attended by deaf people. Finally, it could be used in intelligent classrooms and intelligent environments for real-time sign language translation.

Figure 1: Proposed model

Figure 2: The Arabic sign language alphabet [38] and its tiny-image version. One can see that, despite the size reduction, these small images remain fully recognizable.

Figure 3: Stacking Restricted Boltzmann Machines (RBMs) to achieve a Deep Belief Network. The figure also illustrates the layer-wise training of a DBN.

Figure 4: Learned over-complete natural image bases. Sample of the 1024 features learned by training the first RBM layer on normalized image patches (32x32) sampled randomly from the gesture dataset. For this experiment, the training protocol is similar to the one proposed in [40] (300 epochs, a mini-batch size of 200, a learning rate of 0.02, an initial momentum of 0.5, a final momentum of 0.9, a weight decay of 0.0002, a sparsity target of 0.02, and a sparsity cost of 0.02).

# References

[1] N. Pashaloudi and K. G. Margaritis, "Hidden Markov model for sign language recognition: A review," in Proc. 2nd Hellenic Conf. on AI (SETN-2002), Thessaloniki, Greece, Apr. 11-12, 2002, pp. 343-354.
[2] L. Dipietro, A. M. Sabatini, and P. Dario, "A survey of glove-based systems and their applications," IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 38, no. 4, 2008.
[3] M. Moni, "HMM based hand gesture recognition: A review on techniques and approaches," in 2nd IEEE International Conference on Computer Science and Information Technology (ICCSIT), 2009.
[4] S. Kausar and M. Y. Javed, "A survey on sign language recognition," in Frontiers of Information Technology (FIT), IEEE, 2011.
[5] S. M. Shohieb, H. K. Elminir, and A. M. Riad, "SignsWorld; deeping into the silence world and hearing its signs (state of the art)," arXiv:1203.4176, 2012.
[6] P. K. Vijay, N. N. Suhas, C. S. Chandrashekhar, and D. K. Dhananjay, "Recent developments in sign language recognition: a review," International Journal of Advanced Computer Engineering and Communication Technology, vol. 1, 2012.
[7] M. Mohandes, M. Deriche, and J. Liu, "Image-based and sensor-based approaches to Arabic sign language recognition," IEEE Transactions on Human-Machine Systems, vol. 44, no. 4, 2014.
[8] N. S. M. Salleh, J. Jais, L. Mazalan, R. Ismail, S. Yussof, A. Ahmad, A. Anuar, and D. Mohamad, "Sign language to voice recognition: hand detection techniques for vision-based approach," Current Developments in Technology-Assisted Education, vol. 422, 2006.
[9] R. Naoum, H. H. Owaied, and S. Joudeh, "Development of a new Arabic sign language recognition using k-nearest neighbor algorithm," Journal of Emerging Trends in Computing and Information Sciences, vol. 3, no. 8, 2012.
[10] M. A. Mohandes, "Recognition of two-handed Arabic signs using the CyberGlove," Arabian Journal for Science and Engineering, vol. 38, no. 3, 2013.
[11] M. Samir Elons and M. F. Tolba, "Pulse-coupled neural network feature generation model for Arabic sign language recognition," IET Image Processing, vol. 7, no. 9, 2013.
[12] M. Mohandes and M. Deriche, "Arabic sign language recognition by decisions fusion using Dempster-Shafer theory of evidence," in Computing, Communications and IT Applications Conference (ComComAp), IEEE, 2013.
[13] K. Assaleh, T. Shanableh, and M. Zourob, "Low complexity classification system for glove-based Arabic sign language recognition," in Neural Information Processing, Springer, 2012.
[14] H. Khaled, S. G. Sayed, E. S. M. Saad, and H. Ali, "Hand gesture recognition using modified 1$ and background subtraction algorithms," Mathematical Problems in Engineering, 2015.
[15] M. Mohandes, "Arabic sign language recognition," in International Conference of Imaging Science, Systems, and Technology, Las Vegas, Nevada, USA, vol. 1, 2001.
[16] O. Al-Jarrah and A. Halawani, "Recognition of gestures in Arabic sign language using neuro-fuzzy systems," Artificial Intelligence, vol. 133, no. 1, 2001.
[17] M. Al-Rousan and M. Hussain, "Automatic recognition of Arabic sign language finger spelling," International Journal of Computers and Their Applications, vol. 8, 2001.
[18] O. Al-Jarrah and F. A. Al-Omari, "Improving gesture recognition in the Arabic sign language using texture analysis," Applied Artificial Intelligence, vol. 21, no. 1, 2007.
[19] M. Maraqa and R. Abu-Zaiter, "Recognition of Arabic sign language (ArSL) using recurrent neural networks," in First International Conference on the Applications of Digital Information and Web Technologies, IEEE, 2008.
[20] M. Maraqa, F. Al-Zboun, M. Dhyabat, and R. A. Zitar, "Recognition of Arabic sign language (ArSL) using recurrent neural networks," 2012.
[21] E. E. Hemayed and A. S. Hassanien, "Edge-based recognizer for Arabic sign language alphabet (ArS2V: Arabic sign to voice)," in International Computer Engineering Conference (ICENCO), IEEE, 2010.
[22] S. Elons and M. M. Tolba, "Neutralizing lighting non-homogeneity and background size in PCNN image signature for Arabic sign language recognition," Neural Computing and Applications, vol. 22, no. 1, 2013.
[23] M. Samir Elons and M. F. Tolba, "Pulse-coupled neural network feature generation model for Arabic sign language recognition," IET Image Processing, vol. 7, no. 9, 2013.
[24] M. Z. Abdo, A. M. Hamdy, S. A. El-Rahman Salem, and E. M. Saad, "Arabic alphabet and numbers sign language recognition," International Journal of Advanced Computer Science and Applications (IJACSA), vol. 6, no. 3, 2015.
[25] M. Mohandes, S. Aliyu, and M. Deriche, "Arabic sign language recognition using the leap motion controller," in IEEE 23rd International Symposium on Industrial Electronics (ISIE), 2014.
[26] B. A. Olshausen and D. J. Field, "Sparse coding of sensory inputs," Current Opinion in Neurobiology, vol. 14, no. 4, 2004.
[27] D. J. Field, "What is the goal of sensory coding?" Neural Computation, vol. 6, no. 4, 1994.
[28] M. A. Ranzato, F. J. Huang, Y.-L. Boureau, and Y. LeCun, "Unsupervised learning of invariant feature hierarchies with applications to object recognition," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2007.
[29] J. Yang, K. Yu, Y. Gong, and T. Huang, "Linear spatial pyramid matching using sparse coding for classification," in CVPR, 2009.
[30] J. Wright, Y. Ma, J. Mairal, G. Sapiro, T. S. Huang, and S. Yan, "Sparse representation for computer vision and pattern recognition," Proceedings of the IEEE, vol. 98, no. 6, 2010.
[31] Y.-L. Boureau, F. Bach, Y. LeCun, and J. Ponce, "Learning mid-level features for recognition," in CVPR, IEEE, 2010.
[32] A. Hasasneh, E. Frenoux, and P. Tarroux, "Semantic place recognition based on deep belief networks and tiny images," in ICINCO, SciTePress, 2012.
[33] K. Labusch and T. Martinetz, "Learning sparse codes for image reconstruction," in ESANN, 2010.
[34] G. E. Hinton, S. Osindero, and Y.-W. Teh, "A fast learning algorithm for deep belief nets," Neural Computation, vol. 18, no. 7, 2006.
[35] P. Smolensky, "Information processing in dynamical systems: Foundations of harmony theory," in D. E. Rumelhart and J. L. McClelland (eds.), Parallel Distributed Processing: Foundations, vol. 1, Cambridge, MA: MIT Press, 1987.
[36] G. E. Hinton, "Training products of experts by minimizing contrastive divergence," Neural Computation, vol. 14, no. 8, 2002.
[37] A. Torralba, R. Fergus, and Y. Weiss, "Small codes and large image databases for recognition," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2008.
[38] The Arabic Dictionary of Gestures for the Deaf, 2007.
[39] M. Ranzato, A. Krizhevsky, and G. E. Hinton, "Factored 3-way restricted Boltzmann machines for modeling natural images," in International Conference on Artificial Intelligence and Statistics (AISTATS), 2010.
[40] H. Lee, R. Grosse, R. Ranganath, and A. Y. Ng, "Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations," in Proceedings of the 26th Annual International Conference on Machine Learning (ICML), 2009.
[41] M. Norouzi, M. Ranjbar, and G. Mori, "Stacks of convolutional restricted Boltzmann machines for shift-invariant feature learning," in CVPR, 2009.
[42] F. Attneave, "Some informational aspects of visual perception," Psychological Review, vol. 61, no. 3, p. 183, 1954.
[43] H. Barlow, "Redundancy reduction revisited," Network: Computation in Neural Systems, vol. 12, no. 3, 2001.
[44] A. Coates, H. Lee, and A. Y. Ng, "An analysis of single-layer networks in unsupervised feature learning," in International Conference on Artificial Intelligence and Statistics (AISTATS), 2011.
[45] G. E. Hinton, A. Krizhevsky, and S. D. Wang, "Transforming auto-encoders," in Artificial Neural Networks and Machine Learning (ICANN), Springer, 2011.
[46] G. E. Hinton, "Deep belief networks," Scholarpedia, vol. 4, no. 5, p. 5947, 2009.
[47] A. Krizhevsky, "Learning multiple layers of features from tiny images," Tech. Rep., 2009.
[48] G. Hinton, "A practical guide to training restricted Boltzmann machines," Momentum, vol. 9, no. 1, p. 926, 2010.