# Introduction

Human health is a central part of life. The meaning of health has evolved over time. From the biomedical perspective, early definitions of health focused on the body's ability to function; health was viewed as a normal functioning state that could be interrupted from time to time by illness. In the current era, people are very busy with their work, and as a result many suffer from non-communicable chronic diseases [7]. According to the Pharmaceutical Journal of Sri Lanka 2016 [7], the reported figures are hypertension 48.5%, diabetes mellitus 45.3%, and ischemic heart disease 29.4%.

The proposed application is based on non-communicable chronic diseases. The first part is mainly based on heart diseases, because heart disease is a wide area of medicine. Heart diseases fall into many categories, including blood vessel diseases such as coronary artery disease; heart rhythm problems (arrhythmias); and heart defects present at birth (congenital heart defects), among others [8]. Cardiovascular disease usually refers to conditions that involve narrowed or blocked blood vessels that can lead to a heart attack, chest pain (angina), or a stroke. Other heart conditions, such as those that affect the muscle, valves, or heart rate, are also considered forms of heart disease.

The other part is based on cholesterol, diabetes, and blood pressure. Cholesterol is a chemical compound that the body needs as a building block for cell membranes and for hormones such as estrogen and testosterone. The liver produces about 80% of the body's cholesterol, and the rest comes from food sources such as meat, chicken, eggs, fish, and dairy products. Plant-based foods do not contain cholesterol [9]. Cholesterol is divided into three types: high-density lipoprotein (HDL), low-density lipoprotein (LDL), and very low-density lipoprotein (VLDL). The levels are given as VLDL < 40 mg/dL, LDL < 160 mg/dL, HDL >= 45 mg/dL, and triglycerides < 150 mg/dL.


# II. Method

# a) Data sets

# i. Heart disease prediction model

The data set used in this research has also been used by various other researchers. We obtained it from the website kaggle.com [10]. It was used here to design a machine-learning-based system for heart disease diagnosis. The heart disease dataset has a sample size of 4240 patients and 16 features.

# ii. Prediction for future cholesterol level model

We created the data set used to predict the cholesterol level ourselves. It is used to predict the cholesterol level for the next six months using time series analysis. The dataset covers 20 months of readings for one patient.

# iii. Model for diet plan

We created a dataset for the similar-medicine part using SPC guidelines, because no dummy data set was available for this part. We therefore met a doctor and created a sample dataset with which to build the model.

For the part that predicts the diabetic and cholesterol levels, we obtained dummy data from the https://www.kaggle.com/ website.


# iv. Give an idea of the prescription

We collected cholesterol prescriptions as an image data set and created a standard dataset from them.


# b) Data processing

The heart disease data set has a feature called education, which is not related to heart disease, so we dropped it. During cleaning, we removed null values; some null values were instead filled with the mean value of the feature, which increases efficiency. For the diabetic and cholesterol level prediction, we built the model in a Jupyter notebook. We first imported the necessary libraries and loaded the dataset, and then dropped the unnecessary column from the data set.
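The cleaning steps described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the notebook code used in the study: the sample rows, the column names other than `education`, and the helper `clean_records` are hypothetical.

```python
from statistics import mean

def clean_records(records, drop_column="education"):
    """Drop one column and fill missing numeric values with the column mean.

    Assumes every row has the same keys and that None marks a null value.
    """
    # Remove the column that is not related to heart disease.
    trimmed = [{k: v for k, v in row.items() if k != drop_column} for row in records]

    # Compute the mean of each remaining column over its non-missing values.
    columns = list(trimmed[0].keys()) if trimmed else []
    means = {}
    for col in columns:
        present = [row[col] for row in trimmed if row[col] is not None]
        means[col] = mean(present) if present else None

    # Replace each missing value with the mean of its column.
    return [
        {col: (row[col] if row[col] is not None else means[col]) for col in columns}
        for row in trimmed
    ]

# Hypothetical sample rows (not taken from the study's dataset).
sample = [
    {"age": 50, "education": 2, "glucose": 80.0},
    {"age": 60, "education": 1, "glucose": None},
    {"age": 40, "education": 3, "glucose": 100.0},
]
cleaned = clean_records(sample)
```

In this sketch the missing `glucose` value is replaced by the mean of the two observed values, and the `education` column no longer appears in any row.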


# c) Methodology of the Proposed System

# i. Heart disease prediction model

The proposed system is being developed to classify whether or not a person should consult a cardiologist. Logistic regression, a popular machine learning classifier, is used for this classification; it is one of the best classifiers for producing a binary output. The methodology of the proposed system is structured into four stages: (1) preprocessing of the dataset, (2) feature selection, (3) machine learning classifiers, and (4) classifier performance evaluation. Figure 1 shows the framework of the proposed system. To classify the two classes 0 and 1, a hypothesis $h_\theta(x)$ is designed and the classifier output is thresholded at 0.5. If $h_\theta(x) \geq 0.5$, the model predicts y = 1, which means that the person has heart disease; if $h_\theta(x) < 0.5$, it predicts y = 0, which means that the person is healthy.

Hence, the logistic regression prediction is made under this condition.
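The thresholding rule can be illustrated with a short sketch. It assumes the standard logistic (sigmoid) form of the hypothesis; the function names, weights, and feature values are hypothetical and are not fitted to the study's data.

```python
import math

def hypothesis(weights, bias, features):
    """Logistic hypothesis h(x) = 1 / (1 + exp(-(w . x + b)))."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, features, threshold=0.5):
    """Return 1 (heart disease) if h(x) >= threshold, else 0 (healthy)."""
    return 1 if hypothesis(weights, bias, features) >= threshold else 0

# Hypothetical weights, for illustration only.
weights, bias = [0.8, -0.4], -0.1

at_risk = predict(weights, bias, [2.0, 1.0])   # z = 1.1, so h(x) is above 0.5
healthy = predict(weights, bias, [0.0, 0.0])   # z = -0.1, so h(x) is below 0.5
```

Because the sigmoid equals exactly 0.5 at z = 0, thresholding the hypothesis at 0.5 is the same as checking the sign of the linear score z.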

The data is split in an 80-20 ratio: 80% of the data is used for training and 20% for testing. A confusion matrix is used to report the false positives, false negatives, true positives, and true negatives.
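A minimal sketch of the 80-20 split and the confusion matrix counts follows. It uses only the Python standard library rather than whichever library the study imported; the helper names, the seed, and the example labels are assumptions made for illustration.

```python
import random

def train_test_split(rows, test_ratio=0.2, seed=0):
    """Shuffle rows and split them into (train, test) by the given ratio."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]

def confusion_matrix(actual, predicted):
    """Count TP, TN, FP, and FN for binary labels 0 and 1."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            counts["TP"] += 1
        elif a == 0 and p == 0:
            counts["TN"] += 1
        elif a == 0 and p == 1:
            counts["FP"] += 1
        else:  # actual is 1 and predicted is 0
            counts["FN"] += 1
    return counts

def accuracy(counts):
    """Share of correct predictions among all predictions."""
    total = sum(counts.values())
    return (counts["TP"] + counts["TN"]) / total if total else 0.0

# Hypothetical data: ten row indices and five labelled predictions.
train, test = train_test_split(range(10))
cm = confusion_matrix([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
```

With these example labels the matrix holds two true positives, one true negative, one false positive, and one false negative, so `accuracy(cm)` is 0.6.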


# ii. Prediction for future cholesterol level model

The proposed system is being developed to predict a patient's cholesterol level about six months into the future and to store the patient's past cholesterol records. Time series analysis is a popular prediction technique. It works with one variable at a time and involves an independent variable and a dependent variable. Time series prediction is a form of data mining that predicts future behavior by analyzing historical data.

The objective of time series prediction is to estimate the value of $x$ at a future time, $x[t+s] = f(x[t], x[t-1], \ldots, x[t-N])$, where $s > 0$ is called the horizon of prediction. Figure 2 shows the prediction of a time series using the autoregressive integrated moving average (ARIMA) model [11].
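To make the role of the function f and the horizon s concrete, the sketch below uses the simplest possible choice of f, the average of the most recent values. This is not the ARIMA model shown in Figure 2; it is a deliberately simpler stand-in, and the function name, window size, and readings are hypothetical.

```python
def moving_average_forecast(series, window, horizon):
    """Forecast `horizon` future points as the mean of the last `window` values.

    Each forecast is appended to the history, so a prediction at step t+s
    depends on x[t], x[t-1], ..., x[t-N] with N = window - 1.
    """
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    history = list(series)
    forecasts = []
    for _ in range(horizon):
        next_value = sum(history[-window:]) / window
        forecasts.append(next_value)
        history.append(next_value)
    return forecasts

# Hypothetical monthly cholesterol readings in mg/dL (not the study's data).
readings = [200.0, 210.0, 190.0, 200.0]
six_months = moving_average_forecast(readings, window=2, horizon=6)
```

Here `horizon=6` corresponds to a six-month prediction. An ARIMA model would replace the plain average with fitted autoregressive and moving average terms.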


# Figure 2

A simple moving average is calculated by taking the subset of the data specified in the moving average model, adding the data points together, and then taking the average over that subset. It can help identify the direction of trends in the data and identify levels of resistance in business or trading data [12]. Forecasting is one of the most relevant tasks when working with time series data. It is possible to forecast with a simple moving average, but another moving average model, the autoregressive integrated moving average (ARIMA), is popular for fairly accurate and quick forecasting of time series. The ARIMA model is a linear function used to predict future data points based on past data. It uses the past data points to determine future points, in the same way that a linear regression model uses an independent variable to predict the dependent variable. Because ARIMA relies on past data, a longer series is preferable for obtaining more accurate results [13].

# iii. Model for diet plan

The decision tree data model has two main types, known as the classification tree and the regression tree. It is a non-parametric supervised learning method [1]. The model predicts the value of the target variable in the data set by learning simple decision rules. In a classification tree, the outcome is of the yes/no type, and the decision variables are categorical or discrete; this is also known as binary recursive partitioning. A regression tree, however, takes continuous values or real numbers [2]. There are many different algorithms, but here we mainly used the ID3 (Iterative Dichotomiser 3) algorithm [3], invented by Ross Quinlan. In simple terms, it is a greedy search through the space of possible branches with no backtracking. The tree is built top-down from a root node, and the data is partitioned into subsets that contain similar values; such subsets are called homogeneous [4]. The ID3 algorithm uses entropy (Figure 3) to measure the homogeneity of the data set.


# Figure 3

Entropy is calculated from a frequency table, and the decision tree uses two types of calculation. The first is for one attribute (Figure 4), and the second is for two attributes (Figure 5). Each type has its own formula.
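The two entropy calculations can be sketched as follows. The formulas in Figures 3 to 5 are not reproduced in the text, so this sketch assumes the standard ID3 forms: $E(S) = \sum_i -p_i \log_2 p_i$ for one attribute, and $E(T, X) = \sum_c P(c)\,E(c)$ for two attributes. The frequency table and the attribute values are hypothetical.

```python
import math

def entropy(counts):
    """Entropy (in bits) for one attribute, from its class frequency counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def conditional_entropy(table):
    """Entropy for two attributes: each row's entropy weighted by its share of rows."""
    grand_total = sum(sum(row) for row in table.values())
    return sum((sum(row) / grand_total) * entropy(row) for row in table.values())

def information_gain(target_counts, table):
    """Reduction in entropy obtained by splitting on the second attribute."""
    return entropy(target_counts) - conditional_entropy(table)

# Hypothetical frequency table: attribute value -> [count of "yes", count of "no"].
target = [9, 5]
by_attribute = {"low": [3, 2], "medium": [4, 0], "high": [2, 3]}
gain = information_gain(target, by_attribute)
```

ID3 evaluates this gain for every candidate attribute and greedily splits on the one with the largest value, which is the attribute that yields the most homogeneous subsets.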


# iv. Give an idea of the prescription

The first step is the recognition of handwritten medical forms, for which we used the Lexicon Driven Word Recognizer algorithm [5]. In word model-based recognition, all lexicon entries are treated as separate words and each is matched against the input word image that contains the handwriting to be recognized. The best-matching lexicon entry is returned as the top choice. To develop this model, we created a word recognition methodology (Figure 6). In character model-based recognition, segments are matched against individual characters without using any contextual information. In addition, we used Latent Semantic Analysis [6] to compute the relationship between the context of words and terms and a semantic category.

For the cholesterol prediction section, we used time series analysis to predict future cholesterol levels. The input cholesterol levels are converted to a log scale, and the model plots a graph that shows the prediction line for six months. The graph also shows the confidence level of the prediction.

For the cholesterol level and diabetic level model, the accuracy score is 90%. We also predicted the results for the data set. As we expected, the predicted data was the same as the actual data (Figure 7).


# Figure 10

# d) Give an idea of the prescription

For the handwriting recognition of the prescription, we use an example to show the result.

We selected a random word, 'word', and matched a sample image against the lexicon entry 'word' (Figure 8).


# Figure 11

The first part shows the segmentation points of the image; there are 9 points. The second part shows the confidence of the matched word. The final part shows the matching paths and their confidences. The result is as expected.

We will perform more experiments to improve the performance of these predictive classifiers for heart disease prediction by using other feature selection algorithms and optimization techniques. Those who follow up on the heart disease prediction can use a different data set and other classification algorithms. Researchers could also implement a hybrid model that combines several algorithms; we think this would open a new era for the heart disease prediction model.

In the cholesterol level prediction section, we implemented a model that predicts the cholesterol level about six months into the future. We trained and tested the model on the given dataset and obtained a prediction line and a confidence area.

Researchers can extend this model to other diseases and can try to develop the system using other algorithms and techniques. If someone is trying to follow the cholesterol level prediction, they can try to get a 


# Conclusion

In this research, we implemented models to predict heart disease and to predict cholesterol and diabetic levels for the best meal plan, using machine learning algorithms. We trained and tested the models on the given data sets. For heart disease prediction, the accuracy score is 87% so far; we used logistic regression for this classification. For the prediction of cholesterol and diabetic levels for the best meal plan, the accuracy score is 90%; we used a decision tree for this classification.

Researchers can increase the accuracy of the model. There are a number of algorithms that could produce a much smoother line by handling stationarity in another way. If researchers implement this with more algorithms, that would also be a new contribution to the cholesterol prediction model.

Reading prescriptions through image processing was a big challenge for us. However, the Lexicon Driven Word Recognizer algorithm simplified the work of the model. The use of variable duration in the word recognition process improved performance.
![Figure 1: Logistic regression. Logistic regression is a classification algorithm [27-29]. In a binary classification problem, it predicts the value of the variable y when y ∈ {0, 1}, where 0 is the negative class and 1 is the positive class. It is also used in multiclass classification to predict the value of y when y ∈ {0, 1, 2, 3}. In order to classify two classes 0 and 1, a hypothesis will be designed and threshold classifier](image-2.png "Figure 1: Logistic regression")

![Figure 4: Calculation for one attribute](image-3.png "Figure 4")

![Figure 5: Calculation for two attributes](image-4.png "Figure 5")

In the similar-medicine part, we created a histogram to get an overview of the dataset and then checked the accuracy on the dataset. In the cholesterol and diabetic part, we created ranges for the diabetic level and the cholesterol level, and then created a pie chart to get an idea of the data set. We checked the dataset for null values; where there were null values, we took the mean value of the data column and used it to replace them. Next, we created histograms for the age column and the reading levels, and checked the accuracy on the dataset. Finally, we created the meal plan for the reading ranges mentioned above.

![Figure 6](image-5.png "")

![Figure 7](image-6.png "Figure 7")

We used grid search to increase the accuracy after training and testing on the data. The accuracy score increased by 0.1%.

![Figure 8. b) Prediction for future cholesterol level model](image-7.png "Figure 8")

![Figure 9. c) Model for diet plan](image-8.png "Figure 9")
![](image-9.png "")
			( ) H © 2021 Global Journals Year 2021
		
		
## Acknowledgement

This work was supported by the Sri Lanka Institute of Information Technology.

			
* J. Brownlee, "Parametric and Nonparametric Machine Learning Algorithms," Machine Learning Mastery, 2016. Available at: https://machinelearningmastery.com/parametric-and-nonparametric-machine-learning-algorithms/

* A. Chakure, "Decision Tree Classification," 2019.

* Saedsayad, "Decision Tree."

* "A Lexicon Driven Approach to Handwritten Word Recognition for Real-Time Applications," Semanticscholar.org, 1997.

* "Handbook of Latent Semantic Analysis," Google Books, 2011.

* N. Wijekoon and L. Shanika, "Adverse drug reactions and associated factors in a cohort of Sri Lankan patient with non-communicable chronic diseases," ResearchGate.net, 2016.

* Mayo Clinic staff, "Heart disease - Symptoms and causes," Mayo Clinic, 2018.

* R. Charles Patrick Davis and Benjamin Wedro, "What Is Cholesterol? HDL and LDL Ranges and Diet," MedicineNet, 2016.

* "Framingham Heart Study Dataset," Kaggle.com, July 2020.

* A. M. Silva and P. H. W. Leong, "Grammar-Based Feature Generation for Time-Series Prediction," Springer, Berlin, Germany, 2015.