# I. Introduction

Data cleaning is a step in knowledge discovery in databases. Data cleaning, also known as data cleansing, is the phase in which noisy, anomalous and irrelevant data are removed from the collected data. Missing data are values in the data set that are lost, not observed or not available due to natural or non-natural reasons. Data with missing values complicate both the data analysis and the application of a solution to new data. Three main problems arise when dealing with incomplete data. First, there is a loss of information and, as a consequence, a loss of efficiency. Second, there are several complications related to data handling, computation and analysis, due to irregularities in the data structure and the impossibility of using standard software. Third, and most important, there may be bias due to systematic differences between observed and unobserved data.

Dealing with missing data is a major task in data cleaning. Noor et al. [1] introduced three types of mean imputation techniques for missing data. Rubin [7] explored inference with missing data and multiple imputation for non-response in surveys. Allison [8] investigated the estimation of linear models with incomplete data. Smyth [9] and Zhang [10] considered data preparation a fundamental stage of data analysis. Therefore, this research focuses on anomalous and missing data values. We propose a novel method to replace the missing values.

# II. Missing Data Methods

There are several methods for treating missing data. Missing data treatment methods can be divided into three categories, as proposed in [7].

# a) Ignoring and discarding data

There are two main ways to discard data with missing values. The first is known as listwise deletion. It consists of discarding all instances with missing data. The second is known as pairwise deletion.
It consists of discarding instances or attributes; before deleting any attribute, it is necessary to evaluate its relevance to the analysis.

# b) Parameter estimation

In this treatment method, maximum likelihood procedures that use variants of the Expectation-Maximization algorithm handle parameter estimation in the presence of missing data.

# c) Imputation

Imputation is a class of procedures that aim to fill in the missing values with estimated ones. The objective is to employ known relationships that can be identified in the valid values of the data set to assist in estimating the missing values [9].

Mean Above Below Method [1]: This method replaces each missing value with the mean of the value above and the value below the missing value.

Mean Above Method [1]: This method replaces each missing value with the mean of all available data above the missing value.

Mean Method [1]: This method replaces each missing value with the mean of all available data.

As shown in Figure 3, the missing value case is identified by the subscript of the attribute and denoted by the variable $x_i$. After locating the missing value, we record the three upper values ($x_{i-1}$, $x_{i-2}$, $x_{i-3}$) and the three lower values ($x_{i+1}$, $x_{i+2}$, $x_{i+3}$) relative to the subscript of the missing value. The anomalous value in this subset is then detected with the percentage change formula. After computing the percentage change of the subset, we find the outlier range; the value of the outlier range is chosen to suit the values of the array. If an anomalous value is detected in the data set, that value is removed.

The proposed methodology works in two stages. The first stage localizes the missing data and removes anomalous data. In the second stage we substitute the estimated value in place of each missing value using the proposed method. This calculation gives an effective result and decreases the bias of the result.
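As a rough illustration of the first stage, the sketch below collects the three values above and the three values below a missing position and flags values by percentage change. The text does not give the exact percentage change formula or the rule for the outlier range, so several choices here are assumptions rather than the authors' procedure: each value is compared with the last accepted value, the first value of the window is treated as clean, and `limit` stands in for the outlier range (the value 5 is borrowed from the "greater than 5" mentioned for the gasoline dataset and read as a percentage). The function names and the sample numbers are hypothetical.

```python
def neighbour_window(series, i, k=3):
    """Collect up to k observed values above and k below position i.

    Missing entries are represented by None (an assumption of this sketch).
    """
    upper = [v for v in series[max(0, i - k):i] if v is not None]
    lower = [v for v in series[i + 1:i + 1 + k] if v is not None]
    return upper + lower


def flag_outliers(window, limit):
    """Return indices in window whose percentage change exceeds limit.

    Assumption: the change is measured against the last accepted value,
    and the first value of the window is treated as clean.
    """
    flagged = []
    if not window:
        return flagged
    reference = window[0]
    for j in range(1, len(window)):
        value = window[j]
        if reference == 0:
            raise ValueError("percentage change is undefined for a zero reference")
        change = 100.0 * (value - reference) / reference
        if abs(change) > limit:
            flagged.append(j)   # keep the old reference and skip this value
        else:
            reference = value   # accept the value as the new reference
    return flagged


# Hypothetical series with one missing entry (index 3) and one spike (300.0).
series = [100.0, 102.0, 300.0, None, 104.0, 106.0, 108.0]
window = neighbour_window(series, 3)
outlier_positions = flag_outliers(window, limit=5.0)
print(window, outlier_positions)
```

Comparing against the last accepted value, rather than the immediate predecessor, keeps the value that follows a spike from being flagged as well. That is a design choice of this sketch, not something stated in the text.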
The workflow of the proposed work is shown in Figure 2. If there are missing values in the raw data set, a small subset (array) is created from the part of the input data sheet in which the missing value occurs. We then check this subset for anomalous values; if an anomalous value is present, it is replaced with the newly calculated value. The last step is the estimation of the missing values using Euclidean distance. For a single attribute, the Euclidean distance between the centroid and a value of the array is

$$\sqrt{(X_a - Y_a)^2} \tag{2}$$

Here, $X_a$ is the centroid of the array and $Y_a$ is a particular value of the array. Finally, we compute the average of the Euclidean distances and add it to the centroid; this sum is the estimated value of the missing value. The value of $X_{est}$ (the estimated value) is computed separately for every missing value in the complete data sheet.

# IV. Results and Discussion

Our experiments were carried out on time series datasets taken from Earthpolicy. Figure 4 compares all methods with respect to the mean. The attribute is U.S. Motor Gasoline Consumption, in million barrels, for the years 1950-2014. The mean U.S. motor gasoline consumption is 2714 million barrels. The variables contain both observed and missing values: by design, 20% of the values are made missing at random for all variables, and in this dataset the outlier range is taken as values greater than 5. The mean calculated from the incomplete data set is 2379, which is lower than the actual mean. The proposed methodology was applied to the data set to fill in the missing values, and the resulting mean is 2714. The mean obtained after replacing the missing values with the proposed method is therefore close to the actual mean.
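As a concrete illustration of the estimation step (centroid plus the average Euclidean distance), the following minimal Python sketch applies the rule to one outlier-free window. The function name `estimate_missing` and the sample window are hypothetical; the numbers are not taken from the datasets used in the experiments.

```python
import math


def estimate_missing(neighbours):
    """Estimate a missing value from its outlier-free neighbours.

    centroid = mean of the neighbours;
    distance to each value = sqrt((X_a - Y_a)^2), the 1-D Euclidean distance;
    estimate = centroid + average distance.
    """
    if not neighbours:
        raise ValueError("need at least one neighbouring value")
    centroid = sum(neighbours) / len(neighbours)
    distances = [math.sqrt((centroid - y) ** 2) for y in neighbours]
    return centroid + sum(distances) / len(distances)


# Hypothetical window: three values above and three below the missing entry.
window_values = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
x_est = estimate_missing(window_values)
print(x_est)  # centroid 15.0 + average distance 3.0 = 18.0
```

Because the average distance is never negative, an estimate built this way is always at or above the centroid of its window.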
The results of the proposed method are compared with the mean above below (MAB), mean above (MA) and mean imputation (MI) techniques, the mean comparison methods proposed by Noor et al. [1]. The analysis shows that the values substituted by the proposed method are closer to the original values than those of the other methods. Figure 5 and Figure 6 show the comparison of all methods with respect to the standard deviation and the coefficient of variation. The proposed method performed significantly better than all other methods.

The data sheets were imported into SPSS, and the necessary tests for data validation and significance were applied. In SPSS the results were checked using the ANOVA test for the data sheet; the significance value is 99.1%, which shows that the result is efficient and more compatible with the original data.

# V. Conclusion

This work focuses on imputing missing values with the proposed methodology for a numerical attribute in a time series data sheet. The method is suitable for handling missing data in the presence of anomalous data. In this work, the proposed method is more reliable than the other mean imputation techniques for data analysis in the data mining field.

![Figure 1: Types of mean imputation method](image-2.png "Figure 1")

Mean Imputation Method: This technique replaces the missing data for a given feature with the mean of all known values of that attribute in the class to which the instance with the missing attribute belongs. The mean of each attribute that contains missing values is calculated and substituted in place of the missing values, so every missing value of an attribute receives the same calculated mean.

![Figure 2: Design flow of proposed methodology](image-3.png "Figure 2")

# III. Proposed Method for Inference of Missing Attributes Value in Data Mining

![Figure 3: Calculation of percentage change for outlier detection](image-4.png "Figure 3")

# b) Calculation for missing values

Missing values are estimated in the last phase, once the data are free of outliers. To fill the missing values in this array, we first calculate the centroid of the subset, which is the mean of the subset. Next, the Euclidean distance is calculated between the centroid of the data and each value of the array.

![](image-5.png "")

The datasets come from the Earthpolicy.org site. In the proposed work we used different data sheets, such as Hydroelectric Generation in India 1965-2013, Average Global Temperature 1880-2014, U.S. Motor Gasoline Consumption 1950-2014, World Wood Production 1961-2011 and a few more. Here we evaluate U.S. Motor Gasoline Consumption 1950-2014, which contains 50 instances and two attributes.

[Figure: U.S. Motor Gasoline Consumption, 1950-2014; axis labels: value, consumption, cv, methods]

Global Journal of Computer Science and Technology, Volume XVI, Issue V, Version I, Year 2016. © 2016 Global Journals Inc. (US)

* [1] M. N. Noor, A. S. Yahaya, N. A. Ramli, A. M. M. Bakri. Mean imputation techniques for filling the missing observations in air pollution dataset. Key Engineering Materials, 2014.
* [2] H. Y. Cho, J. H. Oh, K. O. Kim, J. S. Shim. Outlier detection and missing data filling methods for coastal water temperature data. Journal of Coastal Research, 65, 2013.
* [3] J. R. Porter, R. E. Cossman, W. L. James. Imputing large group averages for missing data, using rural-urban continuum codes for density driven industry sectors. Journal of Population Research, 26(3), 2009.
* [4] J. W. Graham. Missing data analysis: Making it work in the real world. Annual Review of Psychology, 60, 2009.
* [5] V. Barnett, T. Lewis. Outliers in Statistical Data. John Wiley and Sons, New York, 1994.
* [6] C. Genolini, H. Jacqmin-Gadda. Copy mean: a new method to impute intermittent missing values in longitudinal studies. Open Journal of Statistics, 3, 2013.
* [7] D. B. Rubin. Inference and missing data. Biometrika, 63(3), 1976.
* [8] P. D. Allison. Estimation of linear models with incomplete data. Sociological Methodology, 1987.
* [9] P. Smyth. Data mining at the interface of computer science and statistics. In Data Mining for Scientific and Engineering Applications, 35-61. Springer US, 2001.
* [10] S. Zhang, C. Zhang, Q. Yang. Data preparation for data mining. Applied Artificial Intelligence, 17(5-6), 2003.
* [11] L. Chen, M. Toma-Drane, R. F. Valois, J. W. Drane. Multiple imputation for missing ordinal data. Journal of Modern Applied Statistical Methods, 4(1), 26, 2005.
* [12] X. Zhu, S. Zhang, Z. Jin, Z. Zhang, Z. Xu. Missing value estimation for mixed-attribute data sets. IEEE Transactions on Knowledge and Data, 2011.