# I. Introduction

Sign language remains the primary means of communication for the deaf and hearing impaired. Hand gestures enable deaf people to communicate in their daily lives instead of speech. In our society, Arabic Sign Language (ArSL) is known only to deaf people and specialists, so the community around deaf people is narrow. To help people with normal hearing communicate effectively with the deaf and the hearing impaired, numerous systems have been developed for translating diverse sign languages from around the world. Several review papers discussing such systems can be found in [1]-[7].

Generally, ArSL recognition (ArSLR) proceeds in two main phases: detection and classification. In the first phase, each given image is pre-processed and enhanced, and the regions of interest (ROI) are segmented using a segmentation algorithm. The output of the segmentation process is then used to perform sign recognition. Indeed, the accuracy and speed of detection play an important role in obtaining an accurate and fast recognition process. In the recognition phase, a set of features (patterns) is first extracted from each segmented hand sign and then used to recognize the sign. These features serve as a reference for discriminating among the classes.

Recognizing and documenting ArSL have received attention only recently, and few attempts have investigated this problem; see for example [8]-[11]. ArSL recognition is therefore a major requirement for the future of ArSL: it facilitates communication between deaf and hearing people by translating the alphabet and number signs of Arabic sign language into text or speech. To achieve that goal, this paper proposes a new Arabic sign recognition system based on recent machine learning methods and the direct use of tiny images.

The rest of the paper is organized as follows. Section 2 presents the current approaches to Arabic alphabet sign language recognition (ArASLR). Section 3 describes the proposed model for ArASLR. Conclusions and future works are presented in Section 4.

# II. Current Approaches

Studies in Arabic sign language recognition, although not as advanced as those devoted to other scripts (e.g., Latin), have recently attracted interest [8]-[11]. Current research in ArSLR has been satisfactory only for alphabet recognition, with accuracies exceeding 98%. Isolated Arabic word recognition has been successful only with medium-size vocabularies (fewer than 300 signs). Continuous ArSLR, on the other hand, is still in its early stages and operates under very restrictive conditions.

Current approaches to sign language recognition fall into two major categories. The first is sensor-based approaches, which employ sensors attached to a glove; look-up table software is usually provided with the glove for hand gesture recognition. Recent sensor-based approaches can be found, for instance, in [11]-[14]. The second category, vision-based analysis, relies on video cameras to capture the movement of the hand, sometimes aided by having the signer wear a glove with painted areas indicating the positions of the fingers and the wrist; these measurements are then used in the recognition process. Image-based techniques face a number of challenges, including lighting conditions, image background, face and hand segmentation, and different types of noise.
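To make the generic detection-then-recognition pipeline concrete before surveying specific systems, the following minimal sketch illustrates the two phases. It is only an illustration under stated assumptions: the YCrCb skin thresholds, the function names, and the feature/classifier interfaces are ours, not those of any cited system.

```python
import cv2
import numpy as np

def detect_hand(image_bgr):
    """Phase 1: pre-process the frame and segment the region of interest."""
    ycrcb = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2YCrCb)
    # Rough skin-color mask; these bounds are illustrative and need tuning.
    mask = cv2.inRange(ycrcb, (0, 133, 77), (255, 173, 127))
    points = cv2.findNonZero(mask)
    if points is None:
        return None  # no skin-colored region found
    x, y, w, h = cv2.boundingRect(points)
    return image_bgr[y:y + h, x:x + w]

def recognize_sign(hand_patch, extract_features, classifier):
    """Phase 2: extract features from the segmented hand and classify them."""
    features = extract_features(hand_patch)
    return classifier.predict(features.reshape(1, -1))[0]
```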
Among image-based approaches, the authors of [15] introduced a method for automatic recognition of the Arabic sign language alphabet. Hu moments were used for feature extraction, followed by support vector machines (SVMs) for classification; a correct recognition rate of 87% was achieved. The authors of [16] developed a neuro-fuzzy system comprising five main steps: image acquisition, filtering, segmentation, and hand outline detection, followed by feature extraction. Bare hands were considered in the experiments, achieving a recognition accuracy of 93.6%. In [17], the authors proposed an adaptive neuro-fuzzy inference system for alphabet sign recognition. A colored glove was used to simplify the segmentation process, and geometric features were extracted from the hand region; the recognition rate was improved to 95.5%. In [18], the authors developed an image-based ArSL system that does not use visual markings. Images of bare hands are processed to extract a set of features that are invariant to translation, rotation, and scaling. A recognition accuracy of 97.5% was achieved on a database of 30 Arabic alphabet signs. In [19], the authors used recurrent neural networks for alphabet recognition. A database of 900 samples, covering 30 gestures performed by two signers, was used in their experiments. The Elman network achieved an accuracy of 89.7%, while a fully recurrent network improved the accuracy to 95.1%. The authors extended their work by considering the effect of different artificial neural network structures on the recognition accuracy; in particular, they extracted 30 features from colored gloves and achieved an overall recognition rate of 95% [20]. A recent paper reviewing the different systems and methods for the automatic recognition of Arabic sign language can be found in [7]; it highlights the main challenges characterizing Arabic sign language as well as potential future research directions.

Recent works on image-based recognition of the Arabic sign language alphabet can be found in [9], [10], [21]-[25]. In particular, Naoum et al. [9] propose an ArSLR system based on the k-nearest neighbor (KNN) algorithm. To achieve good recognition performance, they combine this algorithm with a glove-based analysis technique: the system starts by computing image histograms, and profiles extracted from these histograms are then used as input to a KNN classifier. Mohandes [10] proposes a more sophisticated recognition algorithm, the first attempt to recognize two-handed signs from the Unified Arabic Sign Language Dictionary; it uses the CyberGlove for data acquisition, PCA for feature extraction, and SVMs for recognition. The authors of [21] proposed an Arabic sign language alphabet recognition system that converts signs into voice. The technique is close to a real-life setup; however, recognition is not performed in real time. The system focuses on static and simple moving gestures. The inputs are color images of the gestures; the YCbCr space is used to extract skin blobs, and the Prewitt edge detector extracts the hand shape. Principal component analysis (PCA) converts the image area into feature vectors, and a k-nearest neighbor (KNN) classifier performs the classification. Furthermore, the authors of [22] and [23] proposed a pulse-coupled neural network (PCNN) ArSLR system able to compensate for lighting non-homogeneity and background brightness; it showed invariance under geometric transforms, bright backgrounds, and varying lighting conditions, achieving a recognition accuracy of 90%.
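Several of the surveyed systems (e.g., [21]) share a PCA-plus-KNN backbone. A minimal sketch of that classical pipeline is given below; the number of components and neighbors are illustrative assumptions, and the data loading is left to the reader.

```python
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

def build_pca_knn(n_components=50, n_neighbors=5):
    """PCA for feature extraction followed by a KNN classifier."""
    return make_pipeline(PCA(n_components=n_components),
                         KNeighborsClassifier(n_neighbors=n_neighbors))

# Usage (X: flattened hand images, y: sign labels), hypothetical data:
# model = build_pca_knn().fit(X_train, y_train)
# print(model.score(X_test, y_test))
```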
Moreover, the authors of [24] introduced an Arabic Alphabet and Numbers Sign Language Recognition (ArANSLR) system. The phases of their algorithm consist of skin detection, background exclusion, face and hands extraction, feature extraction, and classification using a Hidden Markov Model (HMM). The algorithm divides the rectangle surrounding the hand shape into zones, with 16 zones giving the best results. The HMM observation sequence is created by sorting the zone numbers in ascending order of the number of white pixels in each zone. Experimental results showed that the algorithm achieves a 100% recognition rate.

On the other hand, new systems for facilitating human-machine interaction have been introduced recently. In particular, the Microsoft Kinect and the Leap Motion Controller (LMC) have attracted special attention. The Kinect uses an infrared emitter and depth sensors in addition to a high-resolution video camera. The LMC uses two infrared cameras and three LEDs to capture information within its interaction range; however, it does not provide images of the detected objects. The LMC has recently been used for Arabic alphabet sign recognition with promising results [25].

Having presented the existing image-based approaches to ArASLR, we note that these approaches generally comprise two main phases: coding and classification. Most of the coding methods are based on hand-crafted feature extractors, which are empirical detectors. By contrast, a family of recent methods based on deep neural network architectures makes it possible to build feature extractors from theoretical considerations. ArSLR therefore requires projecting images onto an appropriate feature space that allows accurate and rapid classification. In contrast to the empirical methods mentioned above, new machine learning methods have recently emerged that are strongly related to the way natural systems code images [26]. These methods build on the observation that natural image statistics are not Gaussian, as they would be if images had a completely random structure [27]. The self-similar structure of natural images allowed evolution to build optimal codes. These codes are made of statistically independent features, and many different methods have been proposed to construct them from image datasets. Imposing locality and sparsity constraints on these features is very important, probably because simple algorithms based on such constraints can produce linear signatures similar to the receptive fields observed in natural systems. Recent years have seen growing interest in computer vision algorithms that rely on local sparse image representations, especially for image classification and object recognition [28]-[32]. Moreover, from a generative point of view, the effectiveness of local sparse coding, for instance in image reconstruction [33], is justified by the fact that a natural image can be reconstructed from the smallest possible number of features. It has been shown that Independent Component Analysis (ICA) produces localized features; it is also efficient for high-kurtosis distributions, which are representative of natural image statistics dominated by rare events such as contours. However, the method is linear and not recursive.
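Before turning to DBNs, the following sketch makes the ICA point concrete: it learns localized, edge-like filters from flattened image patches with scikit-learn's FastICA. The patch preparation and component count are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import FastICA

def learn_ica_features(patches, n_components=64):
    """patches: (n_patches, patch_dim) array of flattened, centered patches."""
    ica = FastICA(n_components=n_components, max_iter=500)
    ica.fit(patches)
    # Each row of components_ is typically a localized, edge-like filter.
    return ica.components_
```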
These two limitations are lifted by DBNs [34], which introduce nonlinearities into the coding scheme and exhibit multiple layers. Each layer is made of an RBM, a simplified version of the Boltzmann machine proposed by Smolensky [35] and Hinton [36]. Each RBM is able to build a generative statistical model of its inputs using a relatively fast learning algorithm, Contrastive Divergence (CD), first introduced by Hinton [36]. Another important characteristic of the codes used in natural systems, the sparsity of the representation [26], is also achieved in DBNs. Moreover, these approaches have been shown to remain robust for extracting local sparse efficient features from tiny images [37]. This model was successfully used in [32] to achieve semantic place recognition. Our hope is to demonstrate that DBNs coupled with tiny images can also be successfully used in the context of ArASLR.

# III. Proposed Model

The methodology of this research comprises four stages (see Figure 1), which can be summarized as follows: 1) data collection and image acquisition, 2) image pre-processing, 3) feature extraction, and 4) gesture recognition.

# a) Description of the Database

The alphabet used for Arabic sign language, displayed in Figure 2 (left) [38], will be used to investigate the performance of the proposed model. In this database, the signer performs each letter separately. Letters are mostly represented by static postures, and the vocabulary size is limited. Even though the Arabic alphabet consists of only 28 letters, Arabic sign language uses 39 signs; the 11 additional signs are basic signs combining two letters. For example, the two letters "Al" (ال) are quite common in Arabic (similar to the article "the" in English). Therefore, most of the literature on ArASLR uses these basic 39 signs.

# b) Image Pre-processing

The typical input dimension for a DBN is approximately 1000 units (e.g., 30x30 pixels). Dealing with smaller patches could make the model unable to extract interesting features, while using larger patches can be extremely time-consuming during feature learning; additionally, the multiplication of the connection weights acts negatively on the convergence of the CD algorithm. The question is therefore how to scale realistic images (e.g., 300x300 pixels) to a size appropriate for DBNs. Three solutions can be envisioned: selecting random patches from each image, as done in [39]; using convolutional architectures, as proposed in [40]; or reducing each image to a tiny image, as proposed in [37]. The first solution extracts local features, and characterizing an image with such features can only be done through bag-of-words (BoW) approaches, which we wanted to avoid; moreover, feature extraction from random patches disregards the spatial structure of each image [41], and in structured scenes, such as those used in semantic place recognition, this structure carries interesting information. The second solution shares the limitations of the first and additionally gives rise to extensive computations that are only tractable on Graphics Processing Unit architectures.
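As an illustration, here is a minimal sketch of the third option, the tiny-image reduction adopted below (42x24 pixels, i.e., a 1008-dimensional input vector). The use of OpenCV and the choice of interpolation mode are our assumptions, not part of the original method.

```python
import cv2
import numpy as np

def to_tiny_image(image_bgr, size=(42, 24)):
    """Reduce a full-size frame to a flattened 42x24 grayscale tiny image."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    tiny = cv2.resize(gray, size, interpolation=cv2.INTER_AREA)  # size = (w, h)
    return tiny.astype(np.float32).ravel()  # 42 * 24 = 1008 values
```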
Besides, tiny images have been successfully used in [37] for classifying and retrieving images from the 80-million-image database developed at MIT. Torralba et al. [37] showed that combining tiny images with a DBN approach makes it possible to code each image by a small binary vector whose entries index a feature alphabet that optimally defines the considered image. The binary vector acts as a bar code, while the alphabet of features is computed only once from a representative set of images. The power of this approach is well illustrated by the fact that the number of codes offered by even a relatively small binary vector largely exceeds the number of images to be coded, even in a huge database ($2^{256} \approx 10^{77}$). For all these reasons, we have chosen image reduction.

On the other hand, natural images are highly structured and contain significant statistical redundancies; in particular, their pixels are strongly correlated [42], [43]. Removing these correlations is known as whitening, and it has been shown to be a mandatory step for the use of clustering methods in object recognition [44]. Whitening is a linear process, so it does not remove the higher-order statistics present in the data. Consequently, as proposed in [37] and [32], after color conversion and image cropping, the image size is reduced to 42x24, as shown in Figure 1. The final set of tiny images is centered and whitened in order to eliminate order-2 statistics; consequently, the variance in equation (6) will be set to 1. Contrary to [37], the 42x24 = 1008 pixels of the whitened images are used directly as the input vector of the network for feature extraction.

# c) Features Extraction

Next comes the feature extraction stage, the most significant stage of the system, which is based on an unsupervised machine learning model, the DBN. DBNs are probabilistic generative models composed of multiple RBM layers of latent stochastic variables. The latent variables typically have binary values and correspond to hidden units or feature detectors. The input variables are zero-mean Gaussian units and are used to reconstruct the visible layer. As shown in Figure 3, the top two layers have undirected, symmetric connections between them; these connections form the weights, or features. The features are extracted by minimizing an energy function that reflects the quality of the image reconstruction. It has been shown that features extracted by DBNs are more promising for image classification than hand-engineered features [32], [45], [46]. We therefore hope that, thanks to the statistical independence of the features and their sparse nature, the data will become linearly separable in the feature space, greatly simplifying the way we learn to classify the signs.

An RBM couples a set of visible units $v$ to a set of hidden units $h$ [27]. For a standard RBM, a joint configuration of the binary visible units and the binary hidden units has an energy function given by:

$$E(v, h) = -\sum_i a_i v_i - \sum_j b_j h_j - \sum_{i,j} v_i h_j w_{ij} \qquad (1)$$

where $a_i$ and $b_j$ are the biases of the visible and hidden units and $w_{ij}$ the connection weights. The probabilities of the state of a unit in one layer, conditional on the state of the other layer, can therefore be easily computed. According to the Gibbs distribution:

$$p(v, h) = \frac{1}{Z} e^{-E(v, h)} \qquad (2)$$

where $Z$ is a normalizing constant (the partition function). Thus, after marginalization:

$$p(v) = \frac{1}{Z} \sum_h e^{-E(v, h)} \qquad (3)$$

it can be derived [47] that the conditional probabilities of a standard RBM are given as follows:

$$p(h_j = 1 \mid v) = \sigma\Big(b_j + \sum_i v_i w_{ij}\Big) \qquad (4)$$

$$p(v_i = 1 \mid h) = \sigma\Big(a_i + \sum_j h_j w_{ij}\Big) \qquad (5)$$

where $\sigma(x) = 1/(1 + e^{-x})$ is the logistic function.

# 1) Gaussian-Bernoulli Restricted Boltzmann Machines

Since binary units are not appropriate for multivalued inputs such as pixel levels, as suggested by Hinton [48], the visible units in the present work follow a zero-mean Gaussian activation scheme:

$$p(v_i \mid h) = \mathcal{N}\Big(a_i + \sigma_i \sum_j h_j w_{ij}, \; \sigma_i^2\Big) \qquad (6)$$

In this case, the energy function of the Gaussian-Bernoulli RBM is given by:

$$E(v, h) = \sum_i \frac{(v_i - a_i)^2}{2\sigma_i^2} - \sum_j b_j h_j - \sum_{i,j} \frac{v_i}{\sigma_i} h_j w_{ij} \qquad (7)$$
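For illustration, the following NumPy sketch implements the conditionals of equations (4) and (6): Bernoulli hidden units driven by the logistic function, and Gaussian visible units with unit variance (the whitened inputs are standardized, as described above). The function names and array conventions are ours.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_hidden(v, W, b):
    """Equation (4): p(h_j = 1 | v) = sigma(b_j + sum_i v_i w_ij)."""
    p = sigmoid(b + v @ W)
    return p, (np.random.rand(*p.shape) < p).astype(v.dtype)

def sample_visible_gaussian(h, W, a):
    """Equation (6) with unit variance: v_i ~ N(a_i + sum_j h_j w_ij, 1)."""
    mean = a + h @ W.T
    return mean, mean + np.random.randn(*mean.shape)
```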
# 2) Learning RBM Parameters

One way to learn the RBM parameters is to maximize the model log-likelihood in a gradient ascent procedure. The partial derivative of the log-likelihood for an energy-based model can be expressed as follows:

$$\frac{\partial \log p(v)}{\partial \theta} = \left\langle \frac{\partial (-E(v, h))}{\partial \theta} \right\rangle_{\text{data}} - \left\langle \frac{\partial (-E(v, h))}{\partial \theta} \right\rangle_{\text{model}} \qquad (8)$$

where $\langle \cdot \rangle_{\text{model}}$ denotes an average with respect to the model distribution and $\langle \cdot \rangle_{\text{data}}$ an average over the sample data. For an RBM, the derivatives with respect to the weights and hidden biases take the simple forms:

$$\frac{\partial \log p(v)}{\partial w_{ij}} = \langle v_i h_j \rangle_{\text{data}} - \langle v_i h_j \rangle_{\text{model}} \qquad (9)$$

$$\frac{\partial \log p(v)}{\partial b_j} = \langle h_j \rangle_{\text{data}} - \langle h_j \rangle_{\text{model}} \qquad (10)$$

Unfortunately, computing the likelihood requires the partition function $Z$, which is usually intractable. However, Hinton [36] proposed an alternative learning technique called Contrastive Divergence (CD). This algorithm is based on the consideration that minimizing the energy of the network is equivalent to minimizing the distance between the data and a statistical generative model of it: the statistics of the data are compared with the statistics of their representation generated by Gibbs sampling. Hinton [36] showed that usually only a few steps of Gibbs sampling (most of the time reduced to one) are sufficient to ensure convergence. For an RBM, the weights of the network can be updated using the following equation:

$$\Delta w_{ij} = \epsilon \left( \langle v_i h_j \rangle_0 - \langle v_i h_j \rangle_n \right) \qquad (11)$$

where $\epsilon$ is the learning rate, $v_0$ corresponds to the initial data distribution, $h_0$ is computed using equation (4), $v_n$ is sampled from the Gaussian distribution in equation (6) after $n$ full steps of Gibbs sampling, and $h_n$ is again computed from equation (4).

A DBN is a stack of RBMs trained in the greedy, layer-wise, bottom-up fashion introduced in [34]. The parameters of the first RBM layer are learned using contrastive divergence. These parameters are then frozen, and the conditional probabilities of the first hidden-layer units are used to generate the data on which the higher RBM layers are trained. The process is repeated layer after layer to obtain a sparse representation of the initial data, which serves as the final output.
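A hedged, self-contained sketch of one CD-1 update (equation (11)) for the Gaussian-Bernoulli RBM follows; the learning rate and mini-batch handling are illustrative choices.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, a, b, lr=0.02):
    """One CD-1 step; v0: (batch, n_visible) whitened data,
    W: (n_visible, n_hidden), a: visible biases, b: hidden biases."""
    # Positive phase: hidden probabilities given the data (eq. 4).
    p_h0 = sigmoid(b + v0 @ W)
    h0 = (np.random.rand(*p_h0.shape) < p_h0).astype(v0.dtype)
    # One Gibbs step: Gaussian reconstruction of the visibles (eq. 6) ...
    v1 = a + h0 @ W.T + np.random.randn(*v0.shape)
    # ... and the corresponding hidden probabilities (eq. 4 again).
    p_h1 = sigmoid(b + v1 @ W)
    # Equation (11): <v h>_0 - <v h>_1, averaged over the mini-batch.
    W += lr * (v0.T @ p_h0 - v1.T @ p_h1) / v0.shape[0]
    a += lr * (v0 - v1).mean(axis=0)
    b += lr * (p_h0 - p_h1).mean(axis=0)
    return W, a, b
```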
# d) Gesture Recognition

Assuming that the non-linear transform operated by the DBN improves the linear separability of the data, a simple regression method will be used to perform the classification. To express the final result as the probability that a given sign means one thing, we normalize the output with a softmax regression; following maximum likelihood principles, the largest probability gives the decision of the system. The classification will also be investigated using a more sophisticated classifier, an SVM, in place of softmax regression. Comparable results would confirm that the DBN computes a linearly separable signature of the initial data.

# IV. Experimental Results

For this task, we conducted an experiment using the pre-processed (tiny, normalized) dataset, randomly sampled from the Arabic Alphabet dataset, which contains 28 letters. A complete structure (1024-1024) of the first RBM layer was used in this case. Figure 4 shows the features extracted using the locally normalized data. These features remain sparse but cover a broader spectrum of spatial frequencies. An interesting observation is that they look closer to the ones obtained with convolutional networks [40], for which no whitening is applied to the initial dataset. The features shown in Figure 4 were extracted by training the first RBM layer on 6000 normalized image patches (32x32 pixels) sampled from the Arabic Alphabet database. One can see that the extracted features represent most of the 28 letter signs. Others are localized and correspond to small parts of the initial views, such as edges and corners that can be identified as hand elements (i.e., they are not specific to a given sign). These features can thus be used to code the initial data so as to achieve linear separability, which will greatly simplify the recognition process.

# V. Conclusions and Future Works

The aim of this paper is to propose the use of DBNs coupled with tiny images in a challenging image recognition task: view-based ArASLR. The expected results should demonstrate that an approach based on tiny images, followed by a projection onto an appropriate feature space, can achieve interesting classification results in an ArASLR task. Our hope is to obtain results comparable to, or even better than, those reported in [10], [24], which are based on more complex techniques. In the case of comparable results, this paper would offer a simpler alternative to the methods recently proposed in [10], [24], which rely on cue integration and the computation of a confidence criterion in an HMM or SVM classification approach. Our future work is to empirically evaluate the proposed model on Arabic sign language alphabet recognition. The first step is to code the initial dataset using the extracted features. Assuming that the non-linear transform operated by the DBN improves the linear separability of the data, a simple regression method will be used for classification; the classification will also be examined with a more sophisticated technique such as an SVM, in order to investigate whether the linear separability is indeed gained by the DBN. After evaluating the classification results, this research can be extended to the recognition of further groups of signs, such as Arabic numbers and basic Arabic words. The system could also be provided as a web service for conferences and meetings attended by deaf people. Finally, it could be used in intelligent classrooms and intelligent environments for real-time sign language translation.

Figure 1: Proposed model

Figure 2: The Arabic sign language alphabet [38] and its tiny-image version. One can see that, despite the size reduction, these small images remain fully recognizable.

Figure 3: Stacking Restricted Boltzmann Machines (RBMs) to achieve a Deep Belief Network. The figure also illustrates the layer-wise training of a DBN.

Figure 4: Learned over-complete natural image bases. Sample of the 1024 features learned by training the first RBM layer on normalized image patches (32x32) sampled randomly from the gesture dataset. For this experiment, the training protocol is similar to the one proposed in [40] (300 epochs, a mini-batch size of 200, a learning rate of 0.02, an initial momentum of 0.5, a final momentum of 0.9, a weight decay of 0.0002, a sparsity target of 0.02, and a sparsity cost of 0.02).

# References

[1] N. Pashaloudi and K. G. Margaritis, "Hidden Markov model for sign language recognition: A review," in Proc. 2nd Hellenic Conf. on AI (SETN-2002), Thessaloniki, Greece, Apr. 11-12, 2002, pp. 343-354.
[2] L. Dipietro, A. M. Sabatini, and P. Dario, "A survey of glove-based systems and their applications," IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 38, no. 4, 2008.
[3] M. Moni, "HMM based hand gesture recognition: A review on techniques and approaches," in 2nd IEEE International Conference on Computer Science and Information Technology (ICCSIT), 2009.
[4] S. Kausar and M. Y. Javed, "A survey on sign language recognition," in Frontiers of Information Technology (FIT), IEEE, 2011.
[5] S. M. Shohieb, H. K. Elminir, and A. M. Riad, "SignsWorld; deeping into the silence world and hearing its signs (state of the art)," arXiv:1203.4176, 2012.
[6] P. K. Vijay, N. N. Suhas, C. S. Chandrashekhar, and D. K. Dhananjay, "Recent developments in sign language recognition: a review," International Journal of Advanced Computer Engineering and Communication Technology, vol. 1, 2012.
[7] M. Mohandes, M. Deriche, and J. Liu, "Image-based and sensor-based approaches to Arabic sign language recognition," IEEE Transactions on Human-Machine Systems, vol. 44, no. 4, 2014.
[8] N. S. M. Salleh, J. Jais, L. Mazalan, R. Ismail, S. Yussof, A. Ahmad, A. Anuar, and D. Mohamad, "Sign language to voice recognition: hand detection techniques for vision-based approach," Current Developments in Technology-Assisted Education, vol. 422, 2006.
[9] R. Naoum, H. H. Owaied, and S. Joudeh, "Development of a new Arabic sign language recognition using k-nearest neighbor algorithm," Journal of Emerging Trends in Computing and Information Sciences, vol. 3, no. 8, 2012.
[10] M. A. Mohandes, "Recognition of two-handed Arabic signs using the CyberGlove," Arabian Journal for Science and Engineering, vol. 38, no. 3, 2013.
[11] M. Samir Elons and M. F. Tolba, "Pulse-coupled neural network feature generation model for Arabic sign language recognition," IET Image Processing, vol. 7, no. 9, 2013.
[12] M. Mohandes and M. Deriche, "Arabic sign language recognition by decisions fusion using Dempster-Shafer theory of evidence," in Computing, Communications and IT Applications Conference (ComComAp), IEEE, 2013.
[13] K. Assaleh, T. Shanableh, and M. Zourob, "Low complexity classification system for glove-based Arabic sign language recognition," in Neural Information Processing, Springer, 2012.
[14] H. Khaled, S. G. Sayed, E. S. M. Saad, and H. Ali, "Hand gesture recognition using modified 1$ and background subtraction algorithms," Mathematical Problems in Engineering, 2015.
[15] M. Mohandes, "Arabic sign language recognition," in International Conference of Imaging Science, Systems, and Technology, Las Vegas, Nevada, USA, vol. 1, 2001.
[16] O. Al-Jarrah and A. Halawani, "Recognition of gestures in Arabic sign language using neuro-fuzzy systems," Artificial Intelligence, vol. 133, no. 1, 2001.
[17] M. Al-Rousan and M. Hussain, "Automatic recognition of Arabic sign language finger spelling," International Journal of Computers and Their Applications, vol. 8, 2001.
[18] O. Al-Jarrah and F. A. Al-Omari, "Improving gesture recognition in the Arabic sign language using texture analysis," Applied Artificial Intelligence, vol. 21, no. 1, 2007.
[19] M. Maraqa and R. Abu-Zaiter, "Recognition of Arabic sign language (ArSL) using recurrent neural networks," in First International Conference on the Applications of Digital Information and Web Technologies, IEEE, 2008.
[20] M. Maraqa, F. Al-Zboun, M. Dhyabat, and R. A. Zitar, "Recognition of Arabic sign language (ArSL) using recurrent neural networks," 2012.
[21] E. E. Hemayed and A. S. Hassanien, "Edge-based recognizer for Arabic sign language alphabet (ArS2V: Arabic sign to voice)," in International Computer Engineering Conference (ICENCO), IEEE, 2010.
[22] S. Elons and M. M. Tolba, "Neutralizing lighting non-homogeneity and background size in PCNN image signature for Arabic sign language recognition," Neural Computing and Applications, vol. 22, no. 1, 2013.
[23] M. Samir Elons and M. F. Tolba, "Pulse-coupled neural network feature generation model for Arabic sign language recognition," IET Image Processing, vol. 7, no. 9, 2013.
[24] M. Z. Abdo, A. M. Hamdy, S. A. El-Rahman Salem, and E. M. Saad, "Arabic alphabet and numbers sign language recognition," International Journal of Advanced Computer Science and Applications (IJACSA), vol. 6, no. 3, 2015.
[25] M. Mohandes, S. Aliyu, and M. Deriche, "Arabic sign language recognition using the leap motion controller," in IEEE 23rd International Symposium on Industrial Electronics (ISIE), 2014.
[26] B. A. Olshausen and D. J. Field, "Sparse coding of sensory inputs," Current Opinion in Neurobiology, vol. 14, no. 4, 2004.
[27] D. J. Field, "What is the goal of sensory coding?" Neural Computation, vol. 6, no. 4, 1994.
[28] M. A. Ranzato, F. J. Huang, Y.-L. Boureau, and Y. LeCun, "Unsupervised learning of invariant feature hierarchies with applications to object recognition," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2007.
[29] J. Yang, K. Yu, Y. Gong, and T. Huang, "Linear spatial pyramid matching using sparse coding for classification," in CVPR, 2009.
[30] J. Wright, Y. Ma, J. Mairal, G. Sapiro, T. S. Huang, and S. Yan, "Sparse representation for computer vision and pattern recognition," Proceedings of the IEEE, vol. 98, no. 6, 2010.
[31] Y.-L. Boureau, F. Bach, Y. LeCun, and J. Ponce, "Learning mid-level features for recognition," in CVPR, IEEE, 2010.
[32] A. Hasasneh, E. Frenoux, and P. Tarroux, "Semantic place recognition based on deep belief networks and tiny images," in ICINCO, SciTePress, 2012.
[33] K. Labusch and T. Martinetz, "Learning sparse codes for image reconstruction," in ESANN, 2010.
[34] G. E. Hinton, S. Osindero, and Y.-W. Teh, "A fast learning algorithm for deep belief nets," Neural Computation, vol. 18, no. 7, 2006.
[35] P. Smolensky, "Information processing in dynamical systems: Foundations of harmony theory," in D. E. Rumelhart and J. L. McClelland (eds.), Parallel Distributed Processing: Foundations, vol. 1, Cambridge, MA: MIT Press, 1987.
[36] G. E. Hinton, "Training products of experts by minimizing contrastive divergence," Neural Computation, vol. 14, no. 8, 2002.
[37] A. Torralba, R. Fergus, and Y. Weiss, "Small codes and large image databases for recognition," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2008.
[38] The Arabic Dictionary of Gestures for the Deaf, 2007.
[39] M. Ranzato, A. Krizhevsky, and G. E. Hinton, "Factored 3-way restricted Boltzmann machines for modeling natural images," in International Conference on Artificial Intelligence and Statistics (AISTATS), 2010.
[40] H. Lee, R. Grosse, R. Ranganath, and A. Y. Ng, "Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations," in Proceedings of the 26th Annual International Conference on Machine Learning (ICML), 2009.
[41] M. Norouzi, M. Ranjbar, and G. Mori, "Stacks of convolutional restricted Boltzmann machines for shift-invariant feature learning," in CVPR, 2009.
[42] F. Attneave, "Some informational aspects of visual perception," Psychological Review, vol. 61, no. 3, p. 183, 1954.
[43] H. Barlow, "Redundancy reduction revisited," Network: Computation in Neural Systems, vol. 12, no. 3, 2001.
[44] A. Coates, H. Lee, and A. Y. Ng, "An analysis of single-layer networks in unsupervised feature learning," in International Conference on Artificial Intelligence and Statistics (AISTATS), 2011.
[45] G. E. Hinton, A. Krizhevsky, and S. D. Wang, "Transforming auto-encoders," in Artificial Neural Networks and Machine Learning (ICANN), Springer, 2011.
[46] G. E. Hinton, "Deep belief networks," Scholarpedia, vol. 4, no. 5, p. 5947, 2009.
[47] A. Krizhevsky, "Learning multiple layers of features from tiny images," Tech. Rep., 2009.
[48] G. Hinton, "A practical guide to training restricted Boltzmann machines," Momentum, vol. 9, no. 1, p. 926, 2010.