Automatic food identification and calorie estimation have become an important issue in the last few years because of the negative impact of obesity on our health.
Obesity may cause cardiovascular diseases, type 2 diabetes mellitus, obstructive sleep apnea, cancer, osteoarthritis, asthma, and other conditions [1]. Researchers report that junk foods and processed foods are responsible for the increase in childhood obesity [2]. Eating extra calories can harm the healthy production and functioning of the synapses in our brain. Fried chicken, pizza, burgers, and similar items are favorite fast foods for both children and adults. People often buy these high-calorie foods to control their appetite, especially when they are busy and unable to eat their meals on time. People today are more conscious of their health and try to maintain a healthy diet. Owing to the availability of smartphones, computer-aided object recognition techniques have become more popular for dietary assessment. Although identifying food and estimating its calories is a very challenging task, many effective steps have already been taken in this regard. We propose an easy but more effective calorie measurement technique that helps people determine how much junk food and snacks they can consume and decide whether a food is harmful to their health. We use both the PFID dataset and our own data set, and apply a deep neural network with an SVM classifier. Deep neural networks have a multilayer structure that can easily extract complicated features from input images, and the supervised SVM classifier can efficiently perform non-linear classification [3]. Our experimental results show better performance for the CNN, with a higher accuracy rate.
Obesity is becoming a major problem in today's life. The preeminent cause of obesity is consuming more calories than we burn, which can seriously undermine quality of life. Researchers say that accurately assessing dietary intake is an important factor in reducing this risk. To meet this need, researchers have taken several approaches to measuring the calories in food. In 2009, an extensive food image and video dataset named the Pittsburgh Fast-Food Image Dataset (PFID) was built, containing 4,545 still images of 101 different food items such as "chicken nuggets" and "cheese pizza" [1]. The researchers applied a Support Vector Machine (SVM) classifier to this dataset and achieved a classification accuracy of 11% with the color histogram method and 24% with the bag-of-SIFT (Scale-Invariant Feature Transform) features method [2]. Accuracies of 78.77% on the UEC-FOOD100 dataset and 67.57% on the UEC-FOOD256 dataset have also been reported [5]. Kagaya et al. applied a CNN to their own dataset for the detection and recognition of food items. The CNN provided higher accuracy than traditional SVM-based methods, with an accuracy of 73.70% for recognition and 93.80% for detection [6]. In 2016, Hassannejad et al. proposed a deep convolutional neural network (DCNN) with a depth of 54 layers and evaluated it on the ETH Food-101, UEC FOOD 100, and UEC FOOD 256 datasets, achieving 88.28%, 81.45%, and 76.17% top-1 accuracy and 96.88%, 97.27%, and 92.58% top-5 accuracy, respectively, for dietary assessment [7]. Christodoulidis et al. applied a 6-layer deep convolutional neural network to their own dataset containing 573 food items and obtained a classification accuracy of 84.9% [8]. In 2016, Singla et al. proposed a method for distinguishing food from non-food items and recognizing the food category using a GoogLeNet model based on a deep convolutional neural network. According to their experimental results, they achieved an accuracy of 99.2% in food/non-food classification and 83.6% in food item recognition [9]. Liu et al. [10] proposed a new CNN-based food image recognition algorithm, applied it to the UEC-256 and Food-101 data sets, and achieved 87.2% and 94.8% accuracy, respectively. In [11], a five-layer CNN with bag-of-features (BoF) and a support vector machine was applied to a dataset containing 5,822 images of ten categories and achieved an overall accuracy of 56%. The researcher then applied data expansion techniques to increase the number of training images, which raised the accuracy to 90%.
Due to the complexity of food images, many previously proposed methods for food recognition achieved low classification accuracy. In our proposed system, we used two training data sets: one is the publicly available PFID data set, and the other was created manually by us from images captured with a smartphone or camera. We use a Support Vector Machine (SVM) classifier with a trained CNN to extract features from and classify fast food images of ten different classes, achieving an accuracy of 99.5%.
III.
Two benchmark datasets, the Pittsburgh Fast-Food Image Dataset (PFID) and the Food-101 dataset, are used in this paper to evaluate the accuracy of food recognition. The PFID collection, proposed by Chen et al., is used to measure the accuracy of recognition algorithms based on standard computer vision approaches; it consists of 4,545 still images divided into 101 categories. Each food in this dataset is represented by three instances. For each food category, both images and videos are captured in restaurant conditions and in a controlled lab setting. Each instance of each food has four still images taken in the restaurant environment and six still images taken in the laboratory setting. Food-101 is a challenging data set of 101 food categories with 101,000 real-world images in total. It includes very diverse but also visually and semantically similar food classes, where each class consists of 1,000 images, of which 250 are manually reviewed test images and 750 are training images.
IV.
At the very beginning of our experimental method, several preprocessing steps are needed to prepare the images for further work. The contamination of a digital image by salt-and-pepper noise is largely caused by errors in image acquisition, so noise reduction is essential for the accuracy of further processing. In salt-and-pepper noise, a certain percentage of individual pixels in the digital image are randomly set to one of two extreme intensities. To remove this kind of noise effectively, we use a non-linear median filter, which can remove salt-and-pepper noise without significantly reducing the sharpness of the image.
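The denoising step can be illustrated with a minimal sketch. It is not the code used in this work: it is written in plain Python, and the 3 x 3 window, the 5% noise level, and the flat synthetic grey image are all assumptions chosen only to make the effect of a median filter visible.

```python
import random
from statistics import median_low


def add_salt_and_pepper(image, fraction, rng):
    """Return a copy of a grayscale image with random pixels forced to 0 or 255."""
    noisy = [row[:] for row in image]
    height, width = len(image), len(image[0])
    for _ in range(int(fraction * height * width)):
        r, c = rng.randrange(height), rng.randrange(width)
        noisy[r][c] = rng.choice((0, 255))
    return noisy


def median_filter(image, size=3):
    """Replace each pixel by the median of its size x size neighbourhood.

    Border pixels use only the neighbours that lie inside the image;
    median_low keeps the result an actual pixel value for even-sized windows.
    """
    half = size // 2
    height, width = len(image), len(image[0])
    filtered = [[0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            window = [
                image[rr][cc]
                for rr in range(max(0, r - half), min(height, r + half + 1))
                for cc in range(max(0, c - half), min(width, c + half + 1))
            ]
            filtered[r][c] = median_low(window)
    return filtered


# A flat grey image (value 128) stands in for a food photograph.
rng = random.Random(0)
clean = [[128] * 32 for _ in range(32)]
noisy = add_salt_and_pepper(clean, fraction=0.05, rng=rng)
denoised = median_filter(noisy, size=3)

corrupted_before = sum(p != 128 for row in noisy for p in row)
corrupted_after = sum(p != 128 for row in denoised for p in row)
print("corrupted pixels before/after filtering:", corrupted_before, corrupted_after)
```

Because an isolated extreme value never becomes the median of its neighbourhood, the filter removes sparse impulses while leaving uniform regions and most edges unchanged.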
We split the entire dataset into two subsets, namely the training set and the validation (testing) set. 30% of the images were randomly selected for the training dataset and the remaining 70% for the test dataset. Our data set consists of ten different types of fast food, such as chicken wings, chocolate cake, ice cream, French fries, pizza, and hamburger. To perform this experiment, we use 1,000 images for each food category. The training set contains 750 images and the testing set contains 250 images for each food category. We trained our classifier engine using a pretrained CNN as a feature extractor. Some sample images from the training dataset are given below. We adapted this pretrained CNN by setting the initial learning rate lower than the default and the maximum number of epochs to 20 to prevent it from overfitting our data. The performance of this fine-tuned network on our data is shown in the accompanying figure.
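A per-class random split of this kind can be sketched as follows. This is only an illustration, assuming the 750/250 division per category stated above; the file names are hypothetical and only six of the ten class names are listed because the text names only those.

```python
import random


def split_per_class(items_by_class, train_fraction, seed=0):
    """Randomly split each class's items into training and test lists."""
    rng = random.Random(seed)
    train, test = {}, {}
    for label, items in items_by_class.items():
        shuffled = items[:]
        rng.shuffle(shuffled)
        cut = int(round(train_fraction * len(shuffled)))
        train[label] = shuffled[:cut]
        test[label] = shuffled[cut:]
    return train, test


# Hypothetical file names; the class names are the ones mentioned in the text.
classes = ["chicken_wings", "chocolate_cake", "ice_cream",
           "french_fries", "pizza", "hamburger"]
dataset = {name: [f"{name}_{i:04d}.jpg" for i in range(1000)] for name in classes}

train, test = split_per_class(dataset, train_fraction=0.75)
for name in classes:
    print(name, len(train[name]), len(test[name]))
```

Splitting inside each class, rather than over the pooled images, keeps the class proportions identical in the training and test sets.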
The deeper network layers then further process the unrefined features extracted by the first layer and create a richer image feature representation. These higher-level features are more suitable for a recognition task than those of the first layer [15]. The easiest way to extract deeper-layer features is to use the activations method in MATLAB.
In this step, the extracted CNN features are used to train a multiclass SVM classifier. SVMs were originally designed for binary classification, separating two classes (k = 2) with a maximum-margin criterion [16]. Real-life problems, however, sometimes require classification into more than two categories. Such problems can be solved by constructing multiclass SVMs, where we build a two-class classifier over a feature vector $\Phi(\vec{x}, y)$ derived from the pair consisting of the input features and the class of the data. At test time, the classifier chooses the class
$$y = \arg\max_{y'} \vec{w}^{T} \Phi(\vec{x}, y') \qquad (1)$$
The margin during training is the gap between this value for the correct class and for the nearest other class, and so the quadratic program formulation requires that
$$\forall i \;\; \forall y \neq y_i : \quad \vec{w}^{T} \Phi(\vec{x}_i, y_i) - \vec{w}^{T} \Phi(\vec{x}_i, y) \geq 1 - \xi_i \qquad (2)$$
This general method can be extended to give a multiclass formulation of various kinds of linear classifiers [17]. In this work, a fast stochastic gradient descent solver is used for training by setting the fitcecoc function's 'Learners' parameter to 'Linear', because this algorithm is especially suitable when the training data set is huge. This helps speed up training when working with high-dimensional CNN feature vectors [18]. When training deep learning models, the objective function is considered as a sum of a finite number of functions:
$$f(x) = \frac{1}{n} \sum_{i=1}^{n} f_i(x) \qquad (3)$$
where $f_i(x)$ is the loss function for the training data instance indexed by $i$. It is important to highlight that the per-iteration computational cost of gradient descent scales linearly with the training data set size $n$. Hence, when $n$ is huge, the per-iteration computational cost of gradient descent is very high [19].
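To make the role of stochastic gradient descent concrete, the following sketch trains a multiclass linear SVM with one training example per update, so the cost of each step does not grow with $n$. It is not the fitcecoc-based implementation used in this work. It is plain Python, it assumes a one-vs-rest decomposition with hinge loss (a simplification of the error-correcting output code scheme), and the hyperparameters and the synthetic 2-D clusters standing in for CNN feature vectors are illustrative assumptions.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm_sgd(features, labels, num_classes,
                         epochs=20, lr=0.05, reg=0.01, seed=0):
    """One-vs-rest linear SVM trained by SGD on the regularized hinge loss."""
    rng = random.Random(seed)
    dim = len(features[0])
    weights = [[0.0] * dim for _ in range(num_classes)]
    biases = [0.0] * num_classes
    order = list(range(len(features)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:  # a single example per update
            x = features[i]
            for k in range(num_classes):
                target = 1.0 if labels[i] == k else -1.0
                violated = target * (dot(weights[k], x) + biases[k]) < 1.0
                # Subgradient of reg/2 * ||w||^2 + max(0, 1 - margin).
                for j in range(dim):
                    grad = reg * weights[k][j]
                    if violated:
                        grad -= target * x[j]
                    weights[k][j] -= lr * grad
                if violated:
                    biases[k] += lr * target
    return weights, biases


def predict(weights, biases, x):
    """Choose the class whose linear score is largest (the argmax rule)."""
    scores = [dot(w, x) + b for w, b in zip(weights, biases)]
    return max(range(len(scores)), key=scores.__getitem__)


def make_clusters(per_class, rng):
    """Three well-separated 2-D Gaussian clusters, one per class."""
    centers = [(3.0, 0.0), (-3.0, 3.0), (-3.0, -3.0)]
    features, labels = [], []
    for label, (cx, cy) in enumerate(centers):
        for _ in range(per_class):
            features.append([rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)])
            labels.append(label)
    return features, labels


rng = random.Random(0)
train_x, train_y = make_clusters(40, rng)
test_x, test_y = make_clusters(20, rng)
weights, biases = train_linear_svm_sgd(train_x, train_y, num_classes=3)
correct = sum(predict(weights, biases, x) == y for x, y in zip(test_x, test_y))
accuracy = correct / len(test_y)
print(f"held-out accuracy: {accuracy:.2f}")
```

A full gradient step would loop over all $n$ examples before changing the weights once; here the weights change after every example, which is what makes the approach attractive for large, high-dimensional feature sets.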
To evaluate the trained classifier, we first extract the CNN features from the images of our test set. These test features are then passed to the classifier to calculate its accuracy.
V.
Our proposed system builds a classifier on the features extracted by the CNN to identify the object. The success rate of recognition and classification is represented using a confusion matrix. A confusion matrix, also called an error matrix, is a contingency table that contains information about the actual and predicted classifications made by a classification system. The entries in the matrix are the True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN) rates for each dataset. The accuracy (AC) is the proportion of the total number of predictions that were correct. It is given by the equation:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (4)$$
The confusion matrices show that the same algorithm yields different but very close accuracies on the two datasets. We obtained 99.13% accuracy on the Food-101 dataset and around 95.79% accuracy on the PFID dataset, which is higher than the accuracy obtained with bag of SIFT or bag of SURF (94%).
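As a small worked example of how accuracy is read off a confusion matrix, the sketch below builds the matrix from actual and predicted labels and evaluates Equation (4) for one class treated as positive. The ten labels are made up for illustration and are not results from this work.

```python
def confusion_matrix(actual, predicted, num_classes):
    """Rows index the actual class, columns the predicted class."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix


def overall_accuracy(matrix):
    """Fraction of all predictions that fall on the diagonal."""
    correct = sum(matrix[k][k] for k in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def binary_counts(matrix, k):
    """TP, TN, FP, FN when class k is positive and all other classes are negative."""
    total = sum(sum(row) for row in matrix)
    tp = matrix[k][k]
    fn = sum(matrix[k]) - tp
    fp = sum(row[k] for row in matrix) - tp
    tn = total - tp - fn - fp
    return tp, tn, fp, fn


# Made-up labels for three classes (0, 1, 2).
actual = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
predicted = [0, 0, 1, 1, 1, 1, 2, 2, 0, 2]

matrix = confusion_matrix(actual, predicted, num_classes=3)
tp, tn, fp, fn = binary_counts(matrix, k=1)
class_accuracy = (tp + tn) / (tp + tn + fp + fn)  # Equation (4)

print(matrix)
print("overall accuracy:", overall_accuracy(matrix))
print("accuracy with class 1 as positive:", class_accuracy)
```

For a multiclass problem, the overall accuracy is the diagonal sum divided by the total count, while Equation (4) applies to each class in turn once the remaining classes are merged into a single negative class.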
In this paper, we proposed a method to classify and identify high-calorie snacks (such as burgers and pizza) from a test image in order to measure the amount of calories consumed. In our experiment, applying a CNN to the PFID dataset gives an accuracy of 94%, which is better than BoF. The false positive rate is also not high. People today are very conscious of their health. So, along with patients, health-conscious people who are strongly affected by food calories can benefit from this approach. In the future, we will try to improve the accuracy by building a robust system that identifies all kinds of snacks more accurately.
ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, 2012.
Food/Non-Food Image Classification and Food Categorization using Pre-Trained GoogLeNet Model. Proceedings of MADiMa'16, Amsterdam, The Netherlands.
Support vector networks. Machine Learning, 1995, 20(3).
Deep Food: Deep Learning-Based Food Image Recognition for Computer-Aided Dietary Assessment. Proceedings of ICOST 2016, Wuhan, China.
Object Recognition from Local Scale-Invariant Features. Proceedings of ICCV'99, Corfu, Greece.
Food recognition for dietary assessment using deep convolutional neural networks. Proceedings of the 2nd International Workshop on Multimedia Assisted Dietary Management, Cham.
Food Detection and Recognition using Convolutional Neural Network. Proceedings of MM'14, Orlando, FL, USA.
The effect of obesity on health outcomes. Molecular and Cellular Endocrinology, 2010, 316(2).
Food image recognition using deep convolutional network with pre-training and fine-tuning. 2015 IEEE International Conference on Multimedia & Expo Workshops, June 2015. IEEE.
Pittsburgh Fast Food Image Dataset. Proceedings of ICIP 2009, Cairo, Egypt, 7-
Automatic Chinese food identification and quantity estimation. SIGGRAPH Asia 2012 Technical Briefs, November 2012. ACM. p. 29.
Dietary assessment on a mobile phone using image processing and pattern recognition techniques: Algorithm design and system prototyping. Nutrients, 2015, 7(8).