# I. Introduction

Android is one of the most widely used smartphone operating systems in the world (Srikanth, 2012). Android is open source and has a large user community and extensive documentation; as a result, any programmer can develop applications and publish them to official or unofficial markets. Over seven hundred thousand applications have been published via the official Android market, the Google Play Store (Zack, 2012). Malware is a challenging issue for the Android user community: the platform's open-source nature and its very large adoption and market penetration make it a target for most malware developers. Android was predicted to become the most used smartphone platform by 2014 (You, Daeyeol, Hyung-Woo, Jae & Jeong, 2014), and this prediction has become a reality. The ubiquity of Android brings with it security risks in the form of malware attacks targeted at the platform. It is therefore necessary to make the platform safe for users by providing defence mechanisms, especially against malware. According to Burguera, Zurutuza & Nadjm-Tehrani (2011), Aswathy (2013) and Lovi & Divya (2014), there are basically three approaches to mobile malware detection: static analysis, dynamic analysis and manifest file analysis. Static analysis uses patterns of strings, called signatures, to detect the presence of malware; dynamic analysis uses the behaviour patterns of applications during execution; the third approach analyses the Android Manifest file. This paper presents a model for mining application behaviours to detect malware on the Android platform using dynamic analysis. The malware detector helps protect the system by detecting malicious behaviour (Aswathy, 2013).
The malware detector provides this protection through the malware detection approaches described above. Detection methods for attacks on mobile devices (Burguera, Zurutuza & Nadjm-Tehrani, 2011; Wei, Mao, Jeng, Lee, Wang & Wu, 2012; Wu, Mao, Wei, Lee & Wu, 2012; Ham, Choi, Lee, Lim & Kim, 2012) have been proposed to reduce the damage caused by the distribution of malicious applications. However, a mechanism that distinguishes normal from malicious applications on Android mobile devices more accurately must still be developed, and the procedure for obtaining the features must be well defined. This paper develops a model, using a customized approach, for extracting Android application behaviours from the events of normal and malicious applications. The research employs anomaly-based detection in a host-based manner to monitor activity that occurs on the target host system. The system is capable of monitoring features of the Android system such as calls received, calls initiated, system calls invoked by running applications, Short Messaging Services (SMSs) received, SMSs sent and the screen status of the target device.

Anomaly-based detection systems use a prior training phase to establish a normality model of system activity. In this method, the detection system is first trained on the normal behaviour of the application or target system to be monitored. Using this normality model, anomalous activity can then be detected by looking for behaviour that deviates from the defined normal behaviour. Although this technique is more complex, it has the advantage of being able to detect new and unknown malware attacks. Anomaly-based detection requires feature vectors to train the classifier before subsequent classification can be carried out, and these feature vectors are obtained from features or data collected from the system. The objective of this work is to extract Android application data from an unrooted Android device and use it to describe system behaviour effectively.

In order to apply any machine learning algorithm or classifier, it is fundamentally important to first collect relevant features. This is especially important in the dynamic analysis approach to anomaly-based malware detection, in which the behaviour patterns of applications are analysed during execution. Which behaviour features Android grants access to depends on the type of device, that is, whether it is rooted or not. Android is based on the Linux kernel at the bottom layer, and all layers above the kernel run in unprivileged mode. Thus, if a behaviour feature vector is built from the Android Application Programming Interface (API) on an unrooted device, only the system information that Android exposes can be used. In this paper, a Device Monitoring system for an unrooted device is developed and used to collect Android application data.

The application data is used to build feature vectors that describe Android application behaviour for anomaly-based malware detection. The application collects essential information from Android applications, such as the installed applications and services running on the device before or after the Monitoring application was started, the date/time stamp, calls initiated from the device, calls received by the device, SMSs sent, SMSs received, and the status of the device at the time the event took place. This information is logged in comma-separated value (.csv) format and stored on the SD card of the device. The .csv file is then converted to Attribute-Relation File Format (.arff), the format accepted by the WEKA machine learning tool, and the resulting .arff file of feature vectors is used as input to the classifier in the Android malware detection system.
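The train-then-detect idea behind anomaly-based detection can be illustrated with a deliberately simple frequency model. This is a minimal sketch, not the classifier used in this work (the paper feeds .arff feature vectors to a WEKA classifier); the class name `NormalityModel`, the encoding of each observation as a 0/1 string over OutCall, InCall, OutSMS, InSMS and Screen, and the `minFrequency` threshold are all hypothetical choices made for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal frequency-based normality model (an illustrative stand-in, not the
// WEKA classifier used in the paper). A pattern is the hypothetical 0/1 string
// "OutCall,InCall,OutSMS,InSMS,Screen", e.g. "0,0,0,0,1".
final class NormalityModel {
    private final Map<String, Integer> counts = new HashMap<>();
    private int total = 0;

    // Training phase: record the patterns seen while the device behaves normally.
    void train(List<String> normalPatterns) {
        for (String pattern : normalPatterns) {
            counts.merge(pattern, 1, Integer::sum);
            total++;
        }
    }

    // Relative frequency of a pattern in the normal training data (0.0 if unseen).
    double frequency(String pattern) {
        if (total == 0) {
            return 0.0;
        }
        return counts.getOrDefault(pattern, 0) / (double) total;
    }

    // Detection phase: patterns that were rare or absent in training are anomalous.
    boolean isAnomalous(String pattern, double minFrequency) {
        return frequency(pattern) < minFrequency;
    }
}
```

Because the model is trained only on normal data, a pattern it has never seen is flagged without any malware signature, which is the property the text attributes to anomaly-based detection.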
The rest of this paper is structured as follows: Section I gives a brief introduction; Section II reviews related literature; Section III discusses the experimental procedures and setup; Section IV discusses the results; Section V describes the hardware and software used for the experiments; and Section VI gives the summary and conclusion of the work.

# II. Related Works

Currently available Android malware detection systems employ a static approach, scanning files for the byte sequences of known malware applications. Anomaly-based detection is still at a developmental stage and research is ongoing. As a result, current approaches are not able to detect unknown attacks. Unknown malware attacks, also referred to as 'zero-day attacks', are attacks carried out by unknown malware whose signatures have not yet been obtained and analysed. Several approaches with different metrics for defining Android application behaviours have been developed; they are discussed below. You Joung et al. (2014) and You Joung & Hyung-Woo (2014) presented an approach for determining malicious attacks on Android using system call event pattern analysis. In their work, the system calls invoked by executing applications of different categories, together with their frequencies of occurrence, are used as the metrics for defining application behaviour. Their analysis was carried out on a Linux system rather than on a mobile device. Abela et al. (2013) developed AMDA, an automated malware detection system for the Android platform. The core modules of the system are the Feature Extraction Module and the Behaviour Analysis Module. The Feature Extraction Module generates an activity log from running applications retrieved from the system's application repository; the activity log contains the system calls from application activity, which are the features the module retrieves. Mohammed et al.
(2014), in the Automatic Feature Extraction part of their work, proposed and implemented an approach that detects malicious applications statically through a set of well-defined APIs. Similarly, Tchakounté & Dayang (2013) used a static approach to analyse the system calls of malware on the Android platform. Lin et al. (2013) proposed SCSdroid, which uses thread-grained system call sequences because these sequences can be regarded as the actual behaviour of the application; their approach goes a step beyond plain application system calls in order to cater for repackaged malware applications. Luoxu & Qinghua (2013) presented a static approach in their Runtime-based Behaviour Dynamic Analysis System for Android Malware Detection. They used Loadable Kernel Module hooking to hook the Android system and collect data consisting of the IMSI, SIM, IMEI, TEL, call log, SMS, MAIL and so on, which was analysed using semantic analysis and regular expressions. Yousra, Wenliang & Heng (2013) used APIs as the features describing Android behaviour for malware detection. To select the features that best distinguish malware from benign applications, they used API-level information within the bytecode, since it conveys substantial semantics about an app's behaviour; more specifically, they focused on critical API calls, their package-level information and their parameters. Dini, Martinelli, Saracino & Sgandurra (2012) employed two layers of application behaviour features in order to describe Android malware behaviour properly: system calls from the kernel layer and other features from the application layer. By considering both operating-system-layer and application-layer behaviours, this approach tends to describe the system better than a monolithic, single-layer view.
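The frequency-of-occurrence metric used by You Joung et al. (2014) can be pictured with a small sketch. It assumes that a trace is already available as a list of system call names (which, as noted later, is not obtainable on an unrooted device) and that a fixed vocabulary of calls of interest has been chosen. The class name `SyscallFrequency` and the normalisation by trace length are illustrative assumptions, not details taken from the cited work.

```java
import java.util.List;

// Illustrative sketch (hypothetical class): turn a trace of system call names
// into a frequency feature vector over a fixed vocabulary of calls.
final class SyscallFrequency {
    static double[] featureVector(List<String> trace, List<String> vocabulary) {
        double[] counts = new double[vocabulary.size()];
        for (String call : trace) {
            int idx = vocabulary.indexOf(call);
            if (idx >= 0) {
                counts[idx]++; // count only calls that are in the vocabulary
            }
        }
        // Divide by the full trace length so traces of different lengths are comparable.
        if (!trace.isEmpty()) {
            for (int i = 0; i < counts.length; i++) {
                counts[i] /= trace.size();
            }
        }
        return counts;
    }
}
```

For the trace read, write, read, ioctl and the vocabulary read, write, open, this yields the vector (0.5, 0.25, 0.0); each position of the vector then becomes one attribute of an application's behaviour profile.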
It is observed from all the reviewed literature that system call pattern analysis plays a critical role in describing the behaviour patterns of Android applications. System calls could therefore best be used, either singly or in addition to other features, to describe application behaviour, not just on Android but on any mobile platform.

# III. Experimental Procedures and Setup

In this section, the various activities carried out and the different modules implemented to intercept application behaviour features for use in the malware detection process are discussed. Before that, the big picture of the entire malware detection system is shown schematically in Figure 1.0. Each acquired application is executed in an instrumented Android emulator via an Android Virtual Device (AVD). An Android 2.3.3 software development kit (SDK) emulator is used to run the Android applications because it is the only medium for automating the generation of application system activity logs without using an actual mobile device. In practice, this differs little from using human input to trigger the behavioural activity of an application. However, the log data contains activities that are irrelevant for the detection of malicious activity. To deal with this noise in the log data, the system uses a self-developed parser that is customized to the features to be collected.

# b) The Data Collection Processes

In order to collect the Android application data, the various monitors described are implemented as Android Java programs in the Device Monitoring application. This application is just one module of the complete detection system, called HOSBAD. It serves as the feature mining model, running on the Android device and collecting features while the user interacts with applications on the device.
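The customized parser mentioned above, which discards log activity that is irrelevant to malware detection, can be thought of as a whitelist filter. This is a minimal sketch under stated assumptions: the raw log line format "TYPE|details" and the class name `LogFilter` are hypothetical, since the actual log format used by HOSBAD is not given in the paper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Illustrative sketch of a customizable noise filter. The raw log format
// "TYPE|details" is a hypothetical assumption, not the format used by HOSBAD.
final class LogFilter {
    private final Set<String> wantedTypes;

    LogFilter(Set<String> wantedTypes) {
        this.wantedTypes = wantedTypes;
    }

    // Keep only well-formed lines whose event type is one of the configured features.
    List<String> filter(List<String> rawLines) {
        List<String> kept = new ArrayList<>();
        for (String line : rawLines) {
            int sep = line.indexOf('|');
            if (sep <= 0) {
                continue; // no type prefix: treat the line as noise
            }
            String type = line.substring(0, sep);
            if (wantedTypes.contains(type)) {
                kept.add(line);
            }
        }
        return kept;
    }
}
```

Configuring the filter with a set of event types, rather than hard-coding them, mirrors the paper's point that the parser is customized as to which features are collected.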
The feature mining model monitors Android application activities using a broadcast receiver and records ongoing activity on the device. In order to apply any machine learning algorithm or classifier, it is fundamentally important to first collect relevant features. The features that Android grants access to depend on the type of device, that is, whether the device has been rooted or not. Android is based on the Linux kernel at the bottom layer, and all layers above the kernel run in unprivileged mode; all applications and system libraries run inside a virtual application sandbox. As a result of this architecture, applications are prohibited from accessing other applications' data unless explicitly granted permission by other applications (the so-called rooting applications). Thus, if a feature vector is built from features of the Android API on an unrooted device, only the system information made available by Android can be used. On the other hand, a rooted device allows one to install system tools that can gather features from the underlying host and network behaviour, but doing so exposes the device to serious security vulnerabilities, as the entire device file system is opened up to attack. In this work, an unrooted device is used to collect Android application data. To do this, a feature mining model, a self-developed application module that forms part of the detection system, is used. This application collects essential information from Android applications, such as the installed applications and services running on the device before or after the Monitoring application was started, the date/time stamp, calls initiated from the device (OutCall), calls received by the device (InCall), sent SMSs (OutSMS), received SMSs (InSMS), and the status of the device (Screen) at the time the event took place.
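Each monitored event ends up as one row of the .csv log, with the attributes in the order used by the sample report in Section IV (Time, AppName, OutCall, InCall, OutSMS, InSMS, Screen, Class). The sketch below shows that row layout; the class `EventRecord` and its methods are hypothetical and stand in for whatever the Device Monitoring application actually uses internally.

```java
// Illustrative sketch (hypothetical class): one monitored event rendered as a
// CSV row in the attribute order of the sample report.
final class EventRecord {
    static final String HEADER = "Time,AppName,OutCall,InCall,OutSMS,InSMS,Screen,Class";

    private final String time;     // "before" or a "dd.MM.yyyy HH:mm:ss" stamp
    private final String appName;  // "?" when the originating app is unknown
    private final boolean outCall;
    private final boolean inCall;
    private final boolean outSms;
    private final boolean inSms;
    private final boolean screenOn;

    EventRecord(String time, String appName, boolean outCall, boolean inCall,
                boolean outSms, boolean inSms, boolean screenOn) {
        this.time = time;
        this.appName = appName;
        this.outCall = outCall;
        this.inCall = inCall;
        this.outSms = outSms;
        this.inSms = inSms;
        this.screenOn = screenOn;
    }

    // Boolean attributes are logged as 1 (present) or 0 (absent).
    private static String bit(boolean value) {
        return value ? "1" : "0";
    }

    // Class is written as "?" because classification has not been carried out yet.
    String toCsvRow() {
        return String.join(",", time, appName, bit(outCall), bit(inCall),
                bit(outSms), bit(inSms), bit(screenOn), "?");
    }
}
```

For example, `new EventRecord("before", "YouTube", false, false, false, false, true).toCsvRow()` produces the row "before,YouTube,0,0,0,0,1,?", which matches the first data row of the sample report. App names containing commas are not escaped in this sketch.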
This information is written to a log file in comma-separated value (.csv) format and stored on the SD card of the device. The self-developed code module that serves as the feature mining model creates a folder in which all monitored/recorded application logs are stored as csv files. Each csv file is then parsed by another self-developed parser to produce a feature vector file in .arff format, the format accepted by WEKA, and this .arff file is used as input to the classifier in the Android malware detection system.

The data extraction application performs the following major tasks as it runs in either the foreground or the background. These tasks are represented in Figure 1.3 (Features Extraction Processes).

i. First, the collector module, in conjunction with the monitors, collects as much information as possible from the Android applications installed on the device. This information includes the date/time stamp, the applications and services running on the device, outgoing calls, incoming calls, outgoing SMSs, incoming SMSs and the device screen status, and is collectively referred to as the features, or behaviours, of the applications. For each .apk file, user interaction is either performed on the device or simulated by the emulator, which interacts randomly with the application interface. Because of the very large number of applications available in the Android market, it is not possible to monitor and record all of them; doing so would require many years of data collection. For this reason, only a few applications were selected.

ii. Secondly, the log stream is input to the parser in the Device Monitoring application and parsed by filtering and formatting the log data into a readable comma-separated value (csv) format.

iii. Finally, the csv file is parsed by another parser to generate the .arff file used by the classifier.

# c) Implementation Details

Although the code for the Device Monitoring application, which is the data extraction model, cannot be given here, a skeletal description of the modules representing the respective monitors is presented. The broadcast receiver class for calls and incoming SMSs records call and SMS events into the app preferences. There is no suitable receiver for outgoing SMSs, so a special observer class is used in the service class. When this receiver was started from the service, it did not work on a real device, so it is registered in the manifest and the preferences are used. The public class ReceiverCallSms, which implements the call and SMS monitoring, is declared as `public class ReceiverCallSms extends BroadcastReceiver`. Within this class, the handling of calls (outgoing and incoming) and of incoming SMSs is implemented in a single method with a nested if...else statement. The inner broadcast receiver for monitoring the screen state is implemented by the class ScreenReceiver, which implements the onReceive method using a special observer "intent". Service monitoring is implemented by a class, Service Monitoring, with a method that records the services running on the device and the features to be extracted. The Binder function starts the monitoring process when the start button is clicked and stops it when the stop button is clicked. All monitored events and activities are written to a file in comma-separated value format. The method checks for the presence of an SD card and creates a folder there in which the file is stored, or sets up a Gmail account to which the file is sent without user interference. The file is named using the device date/time stamp, with a .csv extension.
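The final parsing step, converting the csv log into the .arff file consumed by WEKA, might look like the sketch below. It is an assumption-laden illustration rather than the paper's parser: the class name `CsvToArff`, the relation name, the choice of `string` attributes for Time and AppName, nominal {0,1} attributes for the Boolean columns, and a nominal Class of {normal,malicious} are all hypothetical. In ARFF an unquoted `?` denotes a missing value, which fits the unknown AppName and the not-yet-assigned Class.

```java
import java.util.List;

// Illustrative CSV-to-ARFF converter for rows in the sample report's layout.
// Attribute types and the relation name are assumptions, not the paper's schema.
final class CsvToArff {
    private static final String[] BOOL_ATTRS = {"OutCall", "InCall", "OutSMS", "InSMS", "Screen"};

    // Quote a string value for ARFF; "?" is written unquoted, i.e. as a missing value.
    private static String str(String v) {
        if (v.equals("?")) {
            return "?";
        }
        return "'" + v.replace("\\", "\\\\").replace("'", "\\'") + "'";
    }

    static String convert(String relation, List<String> csvRows) {
        StringBuilder sb = new StringBuilder();
        sb.append("@relation ").append(relation).append("\n\n");
        sb.append("@attribute Time string\n");
        sb.append("@attribute AppName string\n");
        for (String a : BOOL_ATTRS) {
            sb.append("@attribute ").append(a).append(" {0,1}\n");
        }
        sb.append("@attribute Class {normal,malicious}\n\n@data\n");
        for (String row : csvRows) {
            String[] f = row.split(",", -1);
            // Skip the header row and non-data lines such as "Monitoring Started" or the tally.
            if (f.length != 8 || f[0].equals("Time")) {
                continue;
            }
            sb.append(str(f[0])).append(',').append(str(f[1]));
            for (int i = 2; i <= 6; i++) {
                sb.append(',').append(f[i]); // 0/1 values are copied as-is
            }
            sb.append(',').append(f[7]).append('\n'); // Class: "?" until classified
        }
        return sb.toString();
    }
}
```

Values containing spaces, such as "Opera Mini beta" or a date/time stamp, must be quoted in ARFF, which is why the two string attributes always pass through `str`.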
The settings menu provides the means to create the folder in which reports are stored on the SD card, and to specify a Gmail account and mail subject if the report is to be sent to a remote recipient, or possibly a server, for analysis.

# d) Feature Vectors

Analysing the activities of the system gives an accurate representation of the behaviour of the applications. The aim of intercepting these activities is to create an output file containing the events generated by the Android applications. This file provides useful information such as opened and accessed applications, running applications, running services, timestamps, received SMSs, sent SMSs, calls received, calls initiated and the device status at the time the activity occurred. This information, generated by the Device Monitoring application, is used to represent the behaviour of applications.

# IV. Discussion of Results

A sample report obtained from a single run of the feature extraction model, implemented as the Device Monitoring application, is given and discussed here. The report opens with the line "07.10.2015 20:33:55,Monitoring Started", followed by the header row and the data rows, an excerpt of which is shown below.

| Time | AppName | OutCall | InCall | OutSMS | InSMS | Screen | Class |
|---|---|---|---|---|---|---|---|
| before | YouTube | 0 | 0 | 0 | 0 | 1 | ? |
| before | Launcher | 0 | 0 | 0 | 0 | 1 | ? |
| before | Torch | 0 | 0 | 0 | 0 | 1 | ? |
| before | Opera Mini beta | 0 | 0 | 0 | 0 | 1 | ? |
| before | Contacts | 0 | 0 | 0 | 0 | 1 | ? |
| before | Phone | 0 | 0 | 0 | 0 | 1 | ? |

The report closes with the tally line "07.10.2015 21:17:00,Monitoring Stopped Tally:,,out calls: 1,in calls: 1,out sms: 0,in sms: 1".

The report shows the date and time at which the Monitoring Device application was started. Immediately after that line come the fields, or attributes, of the collected information in CSV form, followed by the attribute values in the order of the specified attributes: Date/Time (the Time column), AppName, OutCall, InCall, OutSMS, InSMS, Screen and, finally, Class.
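Because every data row of such a report has exactly eight comma-separated fields, the report can be read back and summarised mechanically, for example to reproduce the closing tally. The sketch below assumes that summing the 0/1 call and SMS columns over the data rows is an acceptable way to recount events; that is an assumption about how a tally could be derived, not the Device Monitoring application's documented logic, and the class name `ReportTally` is hypothetical.

```java
import java.util.List;

// Illustrative sketch (hypothetical class): read the data rows of a Device
// Monitoring report and tally the call/SMS events by summing the 0/1 columns.
final class ReportTally {
    int outCalls;
    int inCalls;
    int outSms;
    int inSms;
    int startedBefore; // rows whose Time field is "before"
    int unknownApp;    // rows whose AppName is "?"

    static ReportTally of(List<String> lines) {
        ReportTally t = new ReportTally();
        for (String line : lines) {
            String[] f = line.split(",", -1);
            // Data rows have exactly 8 fields; skip the header, start and tally lines.
            if (f.length != 8 || f[0].equals("Time")) {
                continue;
            }
            if (f[0].equals("before")) {
                t.startedBefore++;
            }
            if (f[1].equals("?")) {
                t.unknownApp++;
            }
            t.outCalls += Integer.parseInt(f[2]);
            t.inCalls += Integer.parseInt(f[3]);
            t.outSms += Integer.parseInt(f[4]);
            t.inSms += Integer.parseInt(f[5]);
        }
        return t;
    }
}
```

The length check is what lets the same loop ignore the "Monitoring Started" line (two fields) and the final tally line (seven fields) without special-casing their text.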
For applications and services that were running before the Monitoring Device application was started, the Date/Time stamp is given as "before"; for applications and services started afterwards, the actual date/time stamp is recorded. It is very difficult to know which application performed a given activity, because certain tasks are deprecated at the application layer. Therefore, any activity whose originating application cannot be determined is given '?' as the value of the AppName attribute. The OutCall, InCall, OutSMS, InSMS and Screen attributes take Boolean values: 0 represents the absence of the event and 1 its presence. For the Screen attribute, which represents the device status (idle or active), 1 means that the screen is 'ON' (active) and 0 means that it is 'OFF' (idle). Finally, the last attribute, Class, is not extracted from the applications or services by the Device Monitoring application; it is appended to the log file to hold the class assigned by the classifier after classification. Since classification has not yet been carried out on these data, the classes of the instances are undetermined, so they all have the value '?', meaning unknown class (normal or malicious). When the Device Monitoring application is stopped, this event is registered together with its date/time stamp, and the report ends with a summary of all the events in the form of a count, or tally.

# V. Hardware and Software

The experiments were run on a laptop with an Intel Core i3-370M processor, 3 GB of available memory and a 500 GB hard disk drive (HDD). The machine runs the Windows 7 operating system, and the Android Studio 1.2.2 integrated development environment (IDE) was used together with the Android Software Development Kit (SDK).

# VI. Summary and Conclusion

In this paper, we describe the development of a feature extraction model that is used to extract Android application behaviour for anomaly-based malware detection. The type of information that can be extracted depends on whether the device has been rooted or not. Our focus is on unrooted Android devices; the information extracted and used to describe Android application behaviour comprises the date/time stamp of the running applications and services (Time), the application or service name (AppName), outbound calls (OutCall), inbound calls (InCall), outbound SMSs (OutSMS), inbound SMSs (InSMS) and the device status (Screen). The device status indicates whether or not the user is actively interacting with the device: when the screen is active (value 1), the user is actively interacting with the device, and when the screen is idle or hibernating (value 0), there is no active user interaction. Activities such as sending an SMS and initiating a call require active user interaction. If these attributes have the value 1 while the screen is idle (value 0), this implies that an application is carrying out suspicious or malicious behaviour on the device. Although other features could be added, these were used as a test base to realise the concept of an anomaly detection system. As stated earlier, the type of information that can be intercepted depends on whether the device is rooted or not. Rooting a device is a breach of security and therefore opens the device up to attacks. Since the aim is to improve the security of mobile devices and applications on the Android platform, an unrooted device is used. To access more information that could describe application behaviour for anomaly detection, it is recommended that Google allow, in some controlled way, access to certain information, such as system calls and network traffic, that is currently inaccessible on unrooted Android systems.
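The idle-screen rule stated above can be expressed directly over a report row. This minimal sketch assumes the eight-field row layout of the sample report; the class name `IdleScreenRule` is hypothetical, and the check is a hand-written rule for illustration, not a trained classifier.

```java
// Illustrative sketch of the rule stated in the conclusion: an outgoing call
// or outgoing SMS recorded while the screen is idle (Screen = 0) is suspicious.
// Field order: Time,AppName,OutCall,InCall,OutSMS,InSMS,Screen,Class.
final class IdleScreenRule {
    static boolean isSuspicious(String csvRow) {
        String[] f = csvRow.split(",", -1);
        if (f.length != 8) {
            return false; // not a data row
        }
        boolean outCall = f[2].equals("1");
        boolean outSms = f[4].equals("1");
        boolean screenOn = f[6].equals("1");
        // Actions that normally require user interaction, performed with the screen off.
        return (outCall || outSms) && !screenOn;
    }
}
```

Incoming events are deliberately not flagged: a row with InSMS = 1 and Screen = 0 is normal under this rule, because receiving an SMS does not require the user to interact with the device.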
![Extracting Android Applications Data for Anomaly-based Malware Detection. © 2015 Global Journals Inc. (US). Global Journal of Computer Science and Technology, Volume XV, Issue V, Version I, 2015](image-2.png)

![Figure 1.0: Architecture of the Android Malware Detection System (HOSBAD)](image-3.png)

# a) Application Acquisition Process

The Application Acquisition process involves downloading applications from Android markets and storing them in the application repository folder. Applications, which could be normal or malicious, are downloaded from both the official Android market and unofficial Android markets. Figure 1.1 shows the application acquisition process.

![Figure 1.1: Schematic of Application Acquisition Process](image-4.png)

![Figure 1.4 shows a screenshot of the feature mining model application for the malware detection system.](image-5.png)

![Figure 1.4: Feature Mining Model Application](image-6.png)

![Structure of the ReceiverCallSms class (public class ReceiverCallSms extends BroadcastReceiver) and first line of the sample report (07.10.2015 20:33:55,Monitoring Started)](image-7.png)
Final rows of the sample report:

| Time | AppName | OutCall | InCall | OutSMS | InSMS | Screen | Class |
|---|---|---|---|---|---|---|---|
| 07.10.2015 21:16:12 | Launcher | 0 | 0 | 0 | 0 | 1 | ? |
| 07.10.2015 21:16:28 | SendSMS | 0 | 0 | 0 | 0 | 1 | ? |
| 07.10.2015 21:16:53 | Launcher | 0 | 0 | 0 | 0 | 1 | ? |
| 07.10.2015 21:16:59 | Device Monitoring | 0 | 0 | 0 | 0 | 1 | ? |

# References

* Aafer, Y., Du, W., & Yin, H. (2013). DroidAPIMiner: Mining API-Level Features for Robust Malware Detection in Android.
* Abela, K. J., Angeles, D. K., Delas Alas, J. R., Tolentino, R. J., & Gomez, M. A. (2013). An Automated Malware Detection System for Android using Behavior-based Analysis (AMDA). International Journal of Cyber-Security and Digital Forensics (IJCSDF), 2(2). The Society of Digital Information and Wireless Communications.
* Burguera, I., Zurutuza, U., & Nadjm-Tehrani, S. (2011). Crowdroid: behavior-based malware detection system for Android. In Proceedings of the 1st ACM Workshop on Security and Privacy in Smartphones and Mobile Devices.
* Dinesh, A. (2013). An Analysis of Mobile Malware and Detection Techniques.
* Dini, G., Martinelli, F., Saracino, A., & Sgandurra, A. (2012). MADAM: A Multi-level Anomaly Detector for Android Malware. Computer Network Security, vol. 7531.
* Dua, L., & Bansal, D. (2014). Taxonomy: Mobile Malware Threats and Detection Techniques. In Dhinaharan Nagamalai et al. (Eds.), 2014.
* Ham, Y. J., Choi, W. B., Lee, H. W., Lim, J. D., & Kim, J. N. (2012). Vulnerability monitoring mechanism in Android based smartphone with correlation analysis on event-driven activities. 2nd International …, 2012.
* Ham, Y. J., & Lee, H.-W. (2014). Detection of Malicious Android Mobile Applications Based on Aggregated System Call Events. International Journal of Computer and Communication Engineering, 3(2), March 2014.
* Ham, Y. J., Moon, D., Lee, H.-W., Lim, J. D., & Kim, J. N. (2014). Android Mobile Application System Call Event Pattern Analysis for Determination of Malicious Attack. International Journal of Security and Its Applications, 8(1). doi:10.14257/ijsia.2014.8.1.22
* Lin, Y.-D., Lai, Y.-C., Chen, C.-H., & Tsai, H.-C. (2013). Identifying Android Malicious Repackaged Applications by Thread-grained System Call Sequences. Computers & Security (Elsevier). doi:10.1016/j.cose.2013.08.010
* Min, L., & Cao, Q. (2013). Runtime-based Behaviour Dynamic Analysis System for Android Malware Detection.
* Qadir, M. Z., Jilani, A. N., & Sheikh, H. U. (2014). Automatic Feature Extraction, Categorization and Detection of Malicious Code in Android Applications. International Journal of Information & Network Security (IJINS), 3(1), February 2014.
* Srikanth, R. (2012). Mobile Malware Evolution, Detection and Defense. Unpublished term survey paper, Institute for Computing, Information and Cognitive Systems, University of British Columbia, Vancouver, Canada.
* Tchakounté, F., & Dayang, P. (2013). System Calls Analysis of Malwares on Android. International Journal of Science and Technology, 2(9).
* Wei, T. E., Mao, C. H., Jeng, A. B., Lee, H. M., Wang, H. T., & Wu, D. J. (2012). Android Malware Detection via a Latent Network Behaviour Analysis. IEEE 11th International Conference on Trust, Security and Privacy in Computing and Communications.
* Zack, I. (2012, 30 October). Google Play Matches Apple's iOS With 700,000 Apps. Businessweek.
* … through Manifest and API Calls Tracing. 7th Asia Joint Conference on Information Security, 2012.