# I. # Introduction

In the era of machine learning, performance (in terms of both accuracy and computing time) is very important. The growing number of tuning parameters associated with machine learning models is tedious and time-consuming to set with standard optimization techniques, and researchers working with ML models often spend long hours searching for the best combination of hyperparameters [1]. If we think of w, x, y, z as the parameters of a model, each taking real values in a range from, say, 0.0001 to 5.00, then hyperparameter tuning is the task of finding the combination of values that makes the objective function optimal. One of the major difficulties in working on a machine learning problem is tuning these hyperparameters: they are design parameters that directly affect the training outcome. Going from a non-tuned ML model to a tuned one can be like going from predicting nothing correctly to predicting everything accurately [2]. There are two types of parameters in ML models: hyperparameters and model parameters. Hyperparameters are set by the user before training starts, whereas model parameters are learned during training. The quality of a predictive model depends largely on the configuration of its hyperparameters, but it is often difficult to know how these hyperparameters interact with one another to affect the final results of the model [14]. To compare the accuracy of two models fairly, both models should have their hyperparameters tuned: it would be unfair to compare a Decision Tree model with its best parameters against an ANN model whose hyperparameters have not been optimized yet.

# II. # Literature Review

Because of its importance, hyperparameter tuning has become a topic of growing interest in the ML community. Hyperparameter tuning algorithms are either model-free or model-based. Model-free algorithms do not use knowledge about the solution space gathered during the optimization; this category includes manual search [4], random search [2, 6, 7], and grid search [5]. In manual search, values of the hyperparameters are chosen based on previous experience: the user proposes some sets of hyperparameters based on judgment or prior experience, trains the algorithm with them, observes the performance, and keeps retraining until a satisfactory accuracy is achieved, finally selecting the set of hyperparameters that gives the maximum accuracy. However, this technique depends heavily on judgment and prior expertise, and its reliability depends on the correctness of that previous knowledge [3]. Some of the main hyperparameters used by Random Forest classifiers are criterion, max_depth, n_estimators, and min_samples_split. In random search, the model is trained and tested on random combinations of the hyperparameters. This method is well suited to identifying new combinations of parameter values or discovering new hyperparameters; although it may take more time to process, it often leads to better performance. Bergstra and Bengio (2012) mentioned that, over the same domain, random search is able to find models that are as good as or even better in a reduced computation time; a minimal sketch of such a random search over the Random Forest hyperparameters listed above is shown below.
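The sketch uses scikit-learn's RandomizedSearchCV on a synthetic classification task; the dataset, the value ranges, and the number of sampled combinations are illustrative assumptions only, not the configuration used in this work.

```python
# Minimal random-search sketch over common Random Forest hyperparameters.
# The dataset and parameter ranges below are illustrative, not from this study.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

param_distributions = {
    "n_estimators": [50, 100, 200, 400],
    "max_depth": [None, 5, 10, 20],
    "min_samples_split": [2, 5, 10],
    "criterion": ["gini", "entropy"],
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=20,          # number of random combinations to try
    cv=5,               # 5-fold cross-validation for each combination
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```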
Granted the same computational budget, random search was shown to find better models by effectively searching a larger, less promising configuration space [16]. Random search, which was developed from grid search, sets up a space of hyper-parameter values and selects random combinations from it to train the algorithm [2]. In grid search, the user sets a grid of hyperparameter values and trains the model on every possible combination. Amirabadi et al. (2020) propose two novel suboptimal grid search techniques, demonstrate their efficiency on four separate datasets, and compare them with other recently published work. The main drawback of the grid search method is its high complexity, so it is commonly used only when a small number of hyperparameters needs to be tuned; in other words, grid search works well when the promising combinations are already roughly known. Similar applications of grid search have been reported by Zhang et al. (2014) [17], Ghawi et al. (2019) [18], and Beyramysoltan et al. (2013) [19]. Zhang et al. (2019) [20] reported several drawbacks of the existing hyperparameter tuning methods; in particular, they described grid search as an ad-hoc process, since it traverses all possible combinations and the entire procedure requires a lot of time. Andradóttir (2014) [13] shows that random search (RS) removes some of the limitations of the grid search technique to an extent. RS can reduce the overall time consumption, but its main disadvantage is that it cannot be guaranteed to converge to the global optimum: a combination of randomly selected hyperparameters can never guarantee a steady and widely acceptable result. That is why, apart from manual tuning methods, automated tuning methods have become more and more popular in recent times; Snoek et al. (2015) [10]. Bayesian optimization is one of the most widely used automated hyperparameter tuning methods and aims to find the global optimum in fewer steps. However, the results of Bayesian optimization are sensitive to the parameters of the surrogate model, and its accuracy depends greatly on the quality of the learning model; Amirabadi et al. (2020) [3]. To minimize the error function over hyperparameter values, Bayesian optimization adopts probabilistic surrogate models such as Gaussian processes: by carefully balancing exploration and exploitation, a surrogate model of the hyperparameter space is established; Eggensperger et al. (2013) [8]. However, probabilistic surrogates need accurate estimates of sufficient statistics of the error-function distribution, so a sizable number of hyperparameter evaluations is required, and the method does not work well when a very large number of hyperparameters must be processed at once.

# III. # Methodology

# a) Dataset description

Denier: Denier is a weight measurement that usually refers to the thickness of threads. It is the weight, in grams, of 9 kilometers of a single fiber. If 9 km of fiber weighs 1 gram, the fiber has a denier of 1, or 1D; fibers weighing less than 1 gram per 9 km are called microfibers [22]. Microfibers have become a new development trend in the synthetic polymer industry. The higher the denier, the thicker and stronger the fiber; conversely, a lower denier means that the fiber or fabric will be softer and more transparent. Fine denier fibers are becoming a new standard and are very useful for the development of new textiles with excellent performance [21].
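As a small illustration (not part of the original study), the denier definition above translates directly into code: a measured fiber mass is simply rescaled to the grams per 9,000 m of length.

```python
def denier(mass_g: float, length_m: float) -> float:
    """Denier = weight in grams of 9,000 m (9 km) of fiber."""
    return mass_g * 9000.0 / length_m

# 9 km weighing 1 g -> 1 denier; 4,500 m weighing 0.4 g -> 0.8 denier (a microfiber, since < 1D)
print(denier(1.0, 9000.0), denier(0.4, 4500.0))
```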
Breaking Elongation (%): Elongation at break is one of the main quality parameters of any synthetic fiber [24]. It is the percentage by which the fiber elongates at the point of break. Fiber elongation partly reflects the extent to which a filament stretches under a given loading condition. Fibers with high elongation at break are easily stretched under a predetermined load, and fibers showing this characteristic are considered flexible. The elongation behavior of a single fiber can be complex because of the multiplicity of structural factors affecting it. Moreover, a cotton fiber has a natural crimp, which is important for fibers to hold together during subsequent production processes [23]. If $L_0$ is the initial length of the fiber and $\Delta L_{break}$ is the increase in length at the point of break, the breaking elongation percentage is

$$\text{Breaking elongation} = \frac{\Delta L_{break}}{L_0} \times 100\%$$

Breaking elongation for cotton fiber varies from about 5% to 10%, which is significantly lower than that of wool fibers (25%-45%) and much lower than that of polyester fibers (typically over 50%).

Breaking force (cN) and Tenacity (cN/tex): Breaking tenacity is the maximum load that a single fiber can withstand before breaking. For polypropylene and PET staple fibers, sample filaments of 10 mm length are drawn until failure. Breaking tenacity is measured in grams per denier. Very small forces are encountered when evaluating fiber properties, so an instrument with gram-level accuracy is required [25]. The tenacity of virgin PP fibers is about 5-8 g/den, with an elongation at break of about 100%; the tenacity of recycled PET is about 3.5-5.7 g/den, and its elongation at break usually exceeds 100%.

Draw Ratio: The draw ratio is the ratio of the diameter of the initial blank form to the diameter of the drawn part. The limiting draw ratio (capstan speed / nip reel speed) for the extruder section is between 1.6 and 2.2 [26], whereas for the stretching section it is between 3 and 4.

# b) Hyper-parameter Optimization (HPO)

The purpose of hyperparameter optimization is to find the global optimum $x^*$ of an objective function $f(x)$ that can be evaluated for any arbitrary $x \in X$:

$$x^* = \arg\min_{x \in X} f(x),$$

where $X$ is a hyperparameter space that can contain categorical, discrete, and continuous variables [27]. When designing different machine learning models, effective hyperparameter optimization techniques simplify the process of identifying the best hyperparameters for the models. HPO has four major components: first, an estimator (a regressor or a classifier) with one or more objective functions; second, a search space; third, an optimization method to find the best combinations; and fourth, an evaluation function to compare the effectiveness of different hyperparameter configurations [28]. Some of the common hyperparameter optimization techniques are discussed below.

Grid Search: Grid search is a process that exhaustively searches a manually specified subset of the hyperparameter space of the target algorithm [30]. A traditional approach to finding the optimum is to run experiments or processes over a grid of conditions; for example, with three factors, a 15 × 15 × 15 grid would mean performing 3,375 experiments under different conditions [32]. A minimal sketch of such an exhaustive search is given below.
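The following sketch uses scikit-learn's GridSearchCV over an illustrative Random Forest regressor; the synthetic dataset and the grid values are assumptions chosen for demonstration, not the settings used in this study.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Stand-in regression data; the study itself uses the synthetic-polymer measurements.
X, y = make_regression(n_samples=300, n_features=8, noise=0.1, random_state=0)

# A 3 x 3 x 3 grid -> 27 combinations, each evaluated with 5-fold cross-validation.
param_grid = {
    "n_estimators": [50, 100, 200],
    "max_depth": [5, 10, None],
    "min_samples_split": [2, 5, 10],
}

search = GridSearchCV(RandomForestRegressor(random_state=0), param_grid,
                      cv=5, scoring="neg_mean_squared_error")
search.fit(X, y)
print(search.best_params_)
```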
Grid search is more practical when [31]: (1) the total number of parameters in the model is small, say M < 10 (the grid is M-dimensional, so the number of test solutions is proportional to $L^M$, where L is the number of test solutions along each dimension of the grid); (2) the solution is known to be within a specific range of values, which can be used to define the limits of the grid; (3) the forward problem d = g(m) can be computed quickly enough that the time required to compute $L^M$ of them is not prohibitive; and (4) the error function E(m) varies smoothly on the scale of the grid spacing, Δm, so that the minimum is not missed because the grid spacing is too coarse.

There are several problems with the grid search method. The first is that the number of experiments can be prohibitive if there are several factors. The second is that there can be significant experimental error: if the experiments are repeated under identical conditions, different responses can be obtained, so choosing the best point on the grid can be misleading, especially if the optimum is fairly flat. The third is that the initial grid may be too coarse for the number of experiments to be feasible, and it may miss features close to the optimum or find a false (local) optimum [32].

Random Search: Random search [33] is a basic improvement on grid search. It performs a randomized search over hyper-parameters drawn from specified distributions over possible parameter values, and the search continues until the predetermined budget is exhausted or the desired accuracy is reached. These methods are among the simplest stochastic optimization approaches and are very useful for certain problems, such as small search spaces and fast-running simulations. RS draws a value for each hyperparameter from a prior probability distribution, and both GS and RS then estimate the cost measure on the generated hyperparameter sets. Although RS is simple, it has proven to be more effective than grid search in many cases [33]. Random search provides better results because of several benefits. First, the evaluation budget can be set independently of the distribution of the search space, so random search can work better, especially when some hyper-parameters are not uniformly distributed [34]. Second, because each evaluation is independent, it is easy to parallelize and to allocate resources; unlike GS, RS samples a number of parameter combinations from a defined distribution, which improves efficiency by reducing the chance of spending a lot of time in a small, poorly performing region. In addition, this method can find the global optimum, or values close to it, given a sufficient budget. Third, although random search is not guaranteed to reach the optimum, a longer search increases the likelihood of finding the best hyperparameter set.

Bayesian Optimization: Bayesian optimization (BO) is a commonly used algorithm for HPO problems. Unlike GS and RS, BO determines the next evaluation points based on previous results. To choose the next hyperparameter configuration, BO relies on two key components: a surrogate model and an acquisition function. The surrogate model aims to fit all the points of the objective function observed so far, while the acquisition function scores the usefulness of candidate points, balancing exploration and exploitation. By balancing this trade-off, the BO model identifies the most promising regions while avoiding missing good configurations in unexplored areas [35]. The basic BO method works as follows: (i) build a reduced-order probabilistic model (ROM) of the objective function; (ii) find the most promising hyperparameter values according to the ROM; (iii) apply those values to the true objective function; (iv) update the ROM with the new results; and (v) repeat the above steps until the maximum number of iterations is reached. BO is more efficient than GS and RS because it can detect promising combinations of hyperparameters by analyzing previously tested values, and running the surrogate model is usually much cheaper than running the objective function as a whole. However, because Bayesian optimization builds on previously tested values, it is difficult to parallelize, since the procedure is inherently sequential; on the other hand, it is generally able to find near-optimal hyperparameter combinations within a few iterations [36]. Common surrogate models for BO include the Gaussian process (GP) [37], random forest (RF) [38], and the tree-structured Parzen estimator (TPE) [39].
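As a hedged illustration of the basic BO loop just described, the following self-contained sketch uses a Gaussian-process surrogate from scikit-learn with a lower-confidence-bound acquisition on a one-dimensional toy objective. The objective, kernel, candidate grid, and iteration budget are all assumptions for demonstration only; they are not the surrogates, models, or data used in this study.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def objective(x):
    # Toy stand-in for "validation error as a function of one hyperparameter";
    # in this paper the real objective is a regressor's error on the polymer data.
    return np.sin(3 * x) + 0.3 * x ** 2

bounds = (-2.0, 2.0)
rng = np.random.default_rng(0)

# (i) Start with a few random evaluations of the true objective.
X_obs = rng.uniform(bounds[0], bounds[1], size=(3, 1))
y_obs = objective(X_obs).ravel()

for _ in range(15):                                   # (v) repeat until the budget is spent
    # (i)/(iv) Fit (or refit) the GP surrogate to everything observed so far.
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    gp.fit(X_obs, y_obs)
    # (ii) Choose the next point by minimizing a lower-confidence-bound
    # acquisition over a dense candidate set (exploration vs. exploitation).
    cand = np.linspace(bounds[0], bounds[1], 500).reshape(-1, 1)
    mu, sigma = gp.predict(cand, return_std=True)
    x_next = cand[np.argmin(mu - 1.96 * sigma)]
    # (iii) Evaluate the expensive true objective at the chosen point.
    X_obs = np.vstack([X_obs, [x_next]])
    y_obs = np.append(y_obs, objective(x_next))

best = np.argmin(y_obs)
print("best x:", X_obs[best, 0], "best f(x):", y_obs[best])
```

In a real HPO setting, the candidate grid would be replaced by the hyperparameter search space and the toy objective by a cross-validated training run.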
Accordingly, there are three main BO algorithms, named after their surrogate models: BO-GP, BO-RF, and BO-TPE. The GP is an attractive reduced-order model for BO that can be used to quantify prediction uncertainty. It is a non-parametric model, and the number of its parameters depends only on the input points. With the right kernel function, the GP can take advantage of structure in the data. However, the GP also has disadvantages: it is conceptually difficult to understand in the context of BO theory, and its poor scalability to high dimensions or large numbers of data points is another important issue [36].

# IV. # Applying HPO in ML Models

In order to put the theory into practice, several experiments have been performed on an industrial synthetic-polymer model. This section describes experiments with four different HPO techniques on three general and representative ML algorithms (listed in Table 1). The first part of the section discusses the experimental setup and the main HPO process; the second part compares and analyzes the results of applying the different HPO methods. The use of random search is recommended in the early stages of HPO to narrow the search space quickly, before guided algorithms are used to obtain better results. The main drawback [28] of RS and GS is that each evaluation in an iteration does not depend on previous evaluations; thus, they waste time evaluating underperforming areas of the search space.

Table 2: Performance evaluation of applying HPO methods to the regressors on the synthetic polymer dataset
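As a rough sketch of how such a comparison can be scripted, the snippet below tunes the three regressors of Table 1 with grid search and random search and reports cross-validated R² for each; the synthetic data, the value grids, and any scores it prints are illustrative assumptions, not the results reported in Table 2.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR

# Stand-in data; the study itself uses the synthetic-polymer measurements.
X, y = make_regression(n_samples=400, n_features=6, noise=0.2, random_state=0)

models = {
    "RF":  (RandomForestRegressor(random_state=0),
            {"n_estimators": [100, 200], "max_depth": [5, 10, None],
             "min_samples_split": [2, 5], "min_samples_leaf": [1, 2]}),
    "SVM": (SVR(), {"C": [0.1, 1, 10, 100], "kernel": ["rbf", "linear"],
                    "epsilon": [0.01, 0.1, 1.0]}),
    "KNN": (KNeighborsRegressor(), {"n_neighbors": list(range(1, 21))}),
}

for name, (est, grid) in models.items():
    gs = GridSearchCV(est, grid, cv=5, scoring="r2").fit(X, y)
    rs = RandomizedSearchCV(est, grid, n_iter=10, cv=5, scoring="r2",
                            random_state=0).fit(X, y)
    print(f"{name}: grid R^2={gs.best_score_:.3f}  random R^2={rs.best_score_:.3f}")
```

A Bayesian-optimization pass could be added to the same loop with a library such as scikit-optimize or Optuna if a surrogate-based comparison is wanted.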
# Discussion & Conclusion

Machine learning has become the primary strategy for dealing with data problems and is widely used in many applications. To apply ML models to practical problems, their hyperparameters must be tuned to the specific dataset. However, as the amount of data generated in real applications grows, manual tuning of hyperparameters becomes extremely computationally expensive, and it is essential to optimize the hyperparameters by an automatic process. In this work, we applied hyperparameter optimization techniques to ML models to find the best set of hyperparameters. Our dataset was small, and on this small dataset the randomly selected subsets proved very representative of the data, as they were able to optimize all types of hyperparameters effectively. Our future work will be to test the approach on a much larger dataset and evaluate the results.

![Figure 1: (a) Manual tuning, (b) random tuning, (c) grid tuning approach (from left to right)](image-2.png)

![Figure 2: Exploration-based (left) and exploitation-based (right) Bayesian optimization; the shaded area indicates uncertainty (Yang and Shami, 2020)](image-4.png)

Table 1: ML models and the hyper-parameters tuned

| ML Model | Hyper-parameters |
| --- | --- |
| RF Regressor | n_estimators, max_depth, min_samples_split, min_samples_leaf, criterion, max_features |
| SVM Regressor | C, kernel, epsilon |
| KNN Regressor | n_neighbors |

## Conflict of Interest: The authors whose names are listed in this work certify that they have no affiliations with or involvement in any organization or entity with any financial or non-financial interest in the subject matter or materials discussed in this manuscript.

# References

* H. Cho, Y. Kim, E. Lee, D. Choi, Y. Lee, W. Rhee, "Basic Enhancement Strategies When Using Bayesian Optimization for Hyperparameter Tuning of Deep Neural Networks," IEEE Access 8, 2020. doi:10.1109/access.2020.2981072
* J. S. Bergstra, R. Bardenet, Y. Bengio, B. Kégl, "Algorithms for Hyperparameter Optimization," Advances in Neural Information Processing Systems, 2011.
* F. Hutter, J. Lücke, L. Schmidt-Thieme, "Beyond Manual Tuning of Hyperparameters," DISKI 29(4), 2015.
* F. Friedrichs, C. Igel, "Evolutionary Tuning of Multiple SVM Parameters," Neurocomputing 64, 2005.
* R. G. Mantovani, A. L. Rossi, J. Vanschoren, B. Bischl, A. C. de Carvalho, "Effectiveness of Random Search in SVM Hyper-Parameter Tuning," 2015 International Joint Conference on Neural Networks (IJCNN), 2015.
* L. Li, A. Talwalkar, "Random Search and Reproducibility for Neural Architecture Search," arXiv preprint arXiv:1902.07638, 2019.
* K. Eggensperger, M. Feurer, F. Hutter, J. Bergstra, J. Snoek, H. Hoos, K. Leyton-Brown, "Towards an Empirical Foundation for Assessing Bayesian Optimization of Hyperparameters," NIPS Workshop on Bayesian Optimization in Theory and Practice, 2013.
* H. Larochelle, D. Erhan, A. Courville, J. Bergstra, Y. Bengio, "An Empirical Evaluation of Deep Architectures on Problems with Many Factors of Variation," Proceedings of the 24th International Conference on Machine Learning, ACM, 2007.
* J. Snoek, O. Rippel, K. Swersky, R. Kiros, N. Satish, N. Sundaram, "Scalable Bayesian Optimization Using Deep Neural Networks," International Conference on Machine Learning, 2015.
* D. R. Jones, M. Schonlau, W. J. Welch, "Efficient Global Optimization of Expensive Black-Box Functions," Journal of Global Optimization 13, 1998.
* R. Calandra, J. Peters, C. E. Rasmussen, M. P. Deisenroth, "Manifold Gaussian Processes for Regression," Proceedings of the 2016 International Joint Conference on Neural Networks, Vancouver, BC, Canada, July 2016.
* S. Andradóttir, "A Review of Random Search Methods," Handbook of Simulation Optimization, Springer, 2015.
* L. Li, K. Jamieson, G. DeSalvo, A. Rostamizadeh, A. Talwalkar, "Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization," Journal of Machine Learning Research 18, 2018.
* A. Klein, S. Falkner, J. T. Springenberg, F. Hutter, "Learning Curve Prediction with Bayesian Neural Networks," International Conference on Learning Representations (ICLR), 2017.
* J. S. Bergstra, Y. Bengio, "Random Search for Hyper-Parameter Optimization," Journal of Machine Learning Research 13, 2012.
* H. Zhang, L. Chen, Y. Qu, G. Zhao, Z. Guo, "Support Vector Regression Based on Grid-Search Method for Short-Term Wind Power Forecasting," Journal of Applied Mathematics, 2014. doi:10.1155/2014/835791
* R. Ghawi, J. Pfeffer, "Efficient Hyperparameter Tuning with Grid Search for Text Categorization Using kNN Approach with BM25 Similarity," Open Computer Science 9(1), 2019. doi:10.1515/comp-2019-0011
* S. Beyramysoltan, R. Rajkó, H. Abdollahi, "Investigation of the Equality Constraint Effect on the Reduction of the Rotational Ambiguity in Three-Component System Using a Novel Grid Search Method," Analytica Chimica Acta 791, 2013. doi:10.1016/j.aca.2013.06.043
* X. Zhang, X. Chen, L. Yao, C. Ge, M. Dong, "Deep Neural Network Hyperparameter Optimization with Orthogonal Array Tuning," Neural Information Processing, Computer and Information Science, 2019. doi:10.1007/978-3-030-36808-1_31
* C. Zhang, Y. Liu, S. Liu, "Crystalline Behaviors and Phase Transition During the Manufacture of Fine Denier PA6 Fibers," Sci. China Ser. B-Chem 52, 1835, 2009. doi:10.1007/s11426-009-0242-5
* Joe, "What Is Denier Rating? Why Does It Matter To You?," Digi Travelist, May 5, 2020.
* Y. Elmogahzy, "Tensile Properties of Cotton Fibers," in Handbook of Properties of Textile and Technical Fibres, 2018. doi:10.1016/B978-0-08-101272-7.00007-9
* K. Blair, "Materials and Design for Sports Apparel," in Materials in Sports Equipment, 2007. doi:10.1533/9781845693664.1.60
* K. Swift, J. Booker, "Forming Processes," in Manufacturing Process Selection Handbook, 2013. doi:10.1016/b978-0-08-099360-7.00004-5
* H. Cho, Y. Kim, E. Lee, D. Choi, Y. Lee, W. Rhee, "Basic Enhancement Strategies When Using Bayesian Optimization for Hyperparameter Tuning of Deep Neural Networks," IEEE Access 8, 2020. doi:10.1109/access.2020.2981072
* L. Yang, A. Shami, "On Hyperparameter Optimization of Machine Learning Algorithms: Theory," Neurocomputing 415, 2020. doi:10.1016/j.neucom.2020.07.061
* M. Amirabadi, M. Kahaei, S. Nezamalhosseini, "Novel Suboptimal Approaches for Hyperparameter Tuning of Deep Neural Network [under the shelf of optical communication]," Physical Communication 41, 101057, 2020. doi:10.1016/j.phycom.2020.101057
* G. K. Tyagi, "Yarn Structure and Properties from Different Spinning Techniques," in Advances in Yarn Spinning Technology, 2010. doi:10.1533/9780857090218.1.119
* S. Chan, P. Treleaven, "Continuous Model Selection for Large-Scale Recommender Systems," in Big Data Analytics (Handbook of Statistics), 2015. doi:10.1016/b978-0-444-63492-4.00005-8
* W. Menke, "Nonlinear Inverse Problems," in Geophysical Data Analysis: Discrete Inverse Theory, 2012. doi:10.1016/b978-0-12-397160-9.00009-6
* R. Brereton, "Steepest Ascent, Steepest Descent, and Gradient Methods," in Comprehensive Chemometrics, 2009. doi:10.1016/b978-044452701-1.00037-5
* J. Bergstra, Y. Bengio, "Random Search for Hyper-Parameter Optimization," Journal of Machine Learning Research 13, 2012. ISSN 1532-4435
* T. Yu, H. Zhu, "Hyper-Parameter Optimization: A Review of Algorithms and Applications," 2020.
* E. Hazan, A. Klivans, Y. Yuan, "Hyperparameter Optimization: A Spectral Approach," arXiv preprint arXiv:1706.00764, 2017.
* F. Hutter, L. Kotthoff, J. Vanschoren, "Automated Machine Learning: Methods, Systems," Springer International Publishing, 2019.
* M. Seeger, "Gaussian Processes for Machine Learning," International Journal of Neural Systems 14(2), 2004. doi:10.1142/s0129065704001899
* F. Hutter, H. H. Hoos, K. Leyton-Brown, "Sequential Model-Based Optimization for General Algorithm Configuration," in C. A. C. Coello (ed.), Learning and Intelligent Optimization, Lecture Notes in Computer Science 6683, Springer, Berlin, Heidelberg, 2011. doi:10.1007/978-3-642-25566-3_40
* J. Bergstra, R. Bardenet, Y. Bengio, B. Kégl, "Algorithms for Hyper-Parameter Optimization," Advances in Neural Information Processing Systems (NIPS) 24, 2011.