# Introduction
In the era of machine learning, performance (measured by accuracy and computing time) is critical. The growing number of tuning parameters associated with machine learning models is tedious and time-consuming to set with standard optimization techniques. Researchers working with ML models often spend long hours finding the right combination of hyperparameters [1]. If we think of w, x, y, z as the hyperparameters of a model, each taking values in a range from, say, 0.0001 to 5.00, then hyperparameter tuning is the task of finding the combination of values that makes the objective function optimal.
One of the major difficulties in working with machine learning problems is tuning hyperparameters. These are design parameters that directly affect the training outcome. Converting a non-tuned machine learning model into a tuned one can be like going from predicting nothing correctly to predicting everything accurately [2]. There are two types of parameters in ML models: hyperparameters and model parameters. Hyperparameters are set by the user before training begins, whereas model parameters are learned during training.
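To make the distinction concrete, here is a minimal pure-Python sketch; the names `learning_rate` and `n_epochs` are illustrative, not taken from any specific library. The hyperparameters are fixed before training, while the model parameter `w` is learned from the data:

```python
# A minimal sketch distinguishing hyperparameters from model parameters.
# learning_rate and n_epochs are illustrative names.

def fit_slope(xs, ys, learning_rate=0.01, n_epochs=500):
    """Fit y = w * x by gradient descent.

    learning_rate and n_epochs are hyperparameters: chosen before training.
    w is a model parameter: learned from the data during training.
    """
    w = 0.0
    for _ in range(n_epochs):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= learning_rate * grad
    return w

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # generated by y = 2x
w = fit_slope(xs, ys)      # converges toward 2.0
```

A poor choice of the hyperparameters (e.g. a learning rate that is too large) would prevent `w` from converging at all, which is exactly why tuning them matters.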
The quality of a predictive model largely depends on the configuration of its hyperparameters, but it is often difficult to know how these hyperparameters interact with each other to affect the final results of the model [14]. To assess accuracy and compare two models fairly, both models' hyperparameters should be tuned. It would be unfair to compare a decision tree model with the best parameters against an ANN model whose hyperparameters have not been optimized.
# II.
# Literature Review
Hyperparameter tuning, owing to its importance, has become an active topic in the ML community. Hyperparameter tuning algorithms are either model-free or model-based.
Model-free algorithms do not use knowledge about the solution space extracted during the optimization; this category includes manual search [4], random search [2, 6, 7], and grid search [5]. In manual search, the user chooses sets of hyperparameter values based on judgment or previous experience, trains the algorithm with them, observes the performance, and keeps retraining until achieving satisfactory accuracy, finally selecting the set of hyperparameters that gives the best result. This technique is heavily dependent on the user's judgment and prior expertise, and its reliability depends on the correctness of that previous knowledge [3]. For example, some of the main hyperparameters of a Random Forest classifier are criterion, max_depth, n_estimators, and min_samples_split.
In random search, we train and test the model on random combinations of the hyperparameters. This method is well suited to identifying new combinations of parameter values or discovering new hyperparameters. Although it may take more time, it often leads to better performance. Bergstra and Bengio (2012) noted that, over the same domain, random search is able to find models that are as good as or better in reduced computation time. Granted the same computational budget, random search can find better models by effectively searching a larger, less promising configuration space [16]. Random search builds on grid search: it sets up a grid of hyperparameter values but selects random combinations to train the algorithm [2].
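The random search loop described above can be sketched as follows; `mock_cv_score` and the parameter ranges are hypothetical stand-ins for a real cross-validated model:

```python
import random

# A sketch of random search: sample hyperparameter combinations from given
# ranges and keep the best-scoring one. mock_cv_score is an invented
# objective standing in for a real cross-validation score.

def mock_cv_score(params):
    # Invented objective peaking at max_depth=8, n_estimators=200.
    return (-(params["max_depth"] - 8) ** 2
            - ((params["n_estimators"] - 200) / 50.0) ** 2)

def random_search(n_trials, seed=0):
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        # Draw each hyperparameter independently from its range/choices.
        params = {
            "max_depth": rng.randint(2, 20),
            "n_estimators": rng.choice([50, 100, 150, 200, 250]),
        }
        score = mock_cv_score(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

Note that the budget `n_trials` is chosen independently of the size of the search space, which is one of random search's main practical advantages.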
In grid search, the user specifies a grid of hyperparameter values and trains the model on each possible combination. Amirabadi et al. (2020) propose two novel suboptimal grid search techniques, evaluated on four separate datasets to show the efficiency of their hyperparameter tuning model, and compare them with other recently published work. The main drawback of the grid search method is its high complexity: it is commonly used only when there are few hyperparameters to be tuned. In other words, grid search works well when the promising ranges of values are already known. Similar grid search applications have been reported by Zhang et al. (2014) [17], Ghawi et al. (2019) [18], and Beyramysoltan et al. (2013) [19].
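A minimal illustration of exhaustive grid search; the grid and the `mock_score` objective are invented for the example:

```python
import itertools

# An illustrative exhaustive grid search; mock_score is an invented
# stand-in for a validation score.

def mock_score(max_depth, min_samples_split):
    return -abs(max_depth - 6) - abs(min_samples_split - 4)

def grid_search(grid, score_fn):
    names = list(grid)
    best_params, best_score = None, float("-inf")
    # Evaluate every possible combination of the grid values.
    for combo in itertools.product(*grid.values()):
        params = dict(zip(names, combo))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

grid = {
    "max_depth": [2, 4, 6, 8],
    "min_samples_split": [2, 4, 8],
}
best, _ = grid_search(grid, mock_score)  # evaluates all 4 * 3 = 12 combinations
```

The cost is the product of the grid sizes, which is why the method is practical only for a small number of hyperparameters.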
Zhang et al. (2019) [20] reported several drawbacks of existing hyperparameter tuning methods. They described grid search as an ad-hoc process: it traverses all possible combinations, so the entire procedure requires a great deal of time. Andradóttir (2014) [13] shows that random search (RS) removes some of the limitations of the grid search technique. RS can reduce the overall time consumption, but its main disadvantage is that convergence to the global optimum is not guaranteed.
A randomly selected combination of hyperparameters can never guarantee a steady, widely acceptable result. For this reason, alongside manual tuning methods, automated tuning methods have become increasingly popular in recent times; Snoek et al. (2015) [10]. Bayesian optimization is one of the most widely used automated hyperparameter tuning methods for finding the global optimum in fewer steps. However, Bayesian optimization's results are sensitive to the parameters of the surrogate model, and its accuracy depends greatly on the quality of the learned model; Amirabadi et al. (2020) [3].
To minimize the error function over hyperparameter values, Bayesian optimization adopts probabilistic surrogate models such as Gaussian processes. Through careful exploration and exploitation, a surrogate model of the hyperparameter space is built up; Eggensperger et al. (2013) [8]. However, probabilistic surrogates need accurate estimates of the sufficient statistics of the error-function distribution, so a sizable number of hyperparameter evaluations is required, and the method does not work well when a myriad of hyperparameters must be processed at once.
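For illustration only, the surrogate-plus-evaluation loop can be sketched with a deliberately crude distance-based "surrogate" standing in for a Gaussian process, and a lower-confidence-bound rule standing in for the acquisition function; this shows the control flow, not a faithful BO implementation:

```python
# A deliberately crude sketch of surrogate-based sequential optimization.
# The nearest-neighbour surrogate and distance-based uncertainty are
# invented stand-ins for a Gaussian process.

def objective(x):
    # Stand-in for an expensive training run; true minimum at x = 3.
    return (x - 3.0) ** 2

def surrogate(x, observed):
    # Predict with the nearest observed point; use the distance to it as a
    # crude proxy for predictive uncertainty.
    nearest_x, nearest_y = min(observed, key=lambda p: abs(p[0] - x))
    return nearest_y, abs(nearest_x - x)

def bo_sketch(n_iters=20, kappa=1.0):
    candidates = [i * 0.1 for i in range(101)]           # search space [0, 10]
    observed = [(x, objective(x)) for x in (0.0, 10.0)]  # initial design
    for _ in range(n_iters):
        def acquisition(x):
            # Prefer low predicted values; reward unexplored regions.
            pred, unc = surrogate(x, observed)
            return pred - kappa * unc
        x_next = min(candidates, key=acquisition)
        observed.append((x_next, objective(x_next)))     # true evaluation
    return min(observed, key=lambda p: p[1])             # best point found
```

Each iteration queries the cheap surrogate many times but the expensive objective only once, which is the economy that makes Bayesian optimization attractive.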
# III.
# Methodology

# a) Dataset description
Denier: Denier is a weight measurement that usually refers to the thickness of a thread. It is the weight (in grams) of 9 kilometers of a single fiber. If a 9 km fiber weighs 1 gram, the fiber has a denier of 1, or 1D. A fiber weighing less than 1 gram per 9 km is called a microfiber [22]. Microfibers have become a new development trend in the synthetic polymer industry. The higher the denier, the thicker and stronger the fiber; conversely, a lower denier means the fiber or fabric will be softer and more transparent. Fine denier fibers are becoming a new standard and are very useful for the development of new textiles with excellent performance [21].
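The definition above (grams per 9 km) translates directly into code; the helper name `denier` is hypothetical:

```python
# Denier from the definition above: the weight in grams of 9 km of fiber.

def denier(mass_g, length_km):
    """Scale the measured mass up (or down) to a 9 km reference length."""
    return mass_g * 9.0 / length_km

d = denier(1.0, 9.0)                  # a 9 km fiber weighing 1 g is 1D
is_microfiber = denier(0.5, 9.0) < 1.0  # under 1 g per 9 km: microfiber
```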
Breaking Elongation (%): Elongation at break is one of the main quality parameters of any synthetic fiber [24]. It is the percentage elongation of the fiber at the moment of break. Fiber elongation partly reflects the extent to which a filament stretches under a given loading condition. Fibers with high elongation at break stretch easily under a predetermined load; fibers showing this characteristic are considered flexible. The elongation behavior of any single fiber can be complex because of the multiplicity of structural factors affecting it. Moreover, cotton fiber comes with a natural crimp, which is important for fibers to stick together during subsequent production processes [23]. If L0 is the initial length of the fiber and ΔL the change in length at break, the equation for the percentage breaking elongation is:
Breaking Elongation (%) = (ΔL / L0) × 100%
Breaking elongation for cotton fiber varies from about 5% to 10%, which is significantly lower than that of wool fibers (25%-45%) and much lower than that of polyester fibers (typically over 50%).
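The formula above can be written as a small helper; `l0` is the initial length and `l_break` the length at break (the function name is illustrative):

```python
# The breaking-elongation formula from the text.

def breaking_elongation_pct(l0, l_break):
    """Percentage elongation at break relative to the initial length l0."""
    return (l_break - l0) / l0 * 100.0
```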
Breaking force (cN) and Tenacity (cN/tex): Breaking tenacity is the maximum load that a single fiber can withstand before breaking. For polypropylene and PET staple fibers, 10 mm sample filaments are drawn until failure. Breaking tenacity is measured in grams per denier. Very small forces are encountered when evaluating fiber properties, so an instrument with gram-level accuracy is required [25]. The tenacity of virgin PP fibers is about 5-8 g/den, with elongation at break around 100%; the tenacity of recycled PET is about 3.5-5.7 g/den, with elongation at break usually exceeding 100%.

Draw Ratio: The draw ratio is the ratio of the diameter of the initial blank form to the diameter of the drawn part. The limiting draw ratio (capstan speed / nip reel speed) for the extruder section is between 1.6 and 2.2 [26], whereas for the stretching section it is between 3 and 4.
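Assuming the standard textile relations 1 tex = 9 denier and 1 gf = 0.980665 cN (so 1 g/den is about 8.826 cN/tex), tenacity can be converted between the two units, and the draw ratio follows the capstan/nip-reel definition given above; both helper names are illustrative:

```python
# Unit helpers for the quantities above, assuming the standard relations
# 1 tex = 9 denier and 1 gf = 0.980665 cN.

def gf_per_den_to_cn_per_tex(tenacity_g_den):
    # 1 g/den = 9 * 0.980665 cN/tex, i.e. about 8.826 cN/tex.
    return tenacity_g_den * 9.0 * 0.980665

def draw_ratio(capstan_speed, nip_reel_speed):
    # Definition used in the text for the extruder section.
    return capstan_speed / nip_reel_speed
```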
# b) Hyper-parameter Optimization (HPO)
The purpose of hyperparameter optimization is to find the global optimum x* of an objective function f(x) that can be evaluated for any arbitrary x ∈ X:

x* = arg min f(x), x ∈ X,

where X is a hyperparameter space that can contain categorical, discrete, and continuous variables [27]. In constructing different machine learning models, effective hyperparameter optimization techniques can simplify the process of identifying the best hyperparameters. HPO has four major components [28]: (1) an estimator (a regressor or classifier) with one or more objective functions; (2) a search space; (3) an optimization method to find the best combinations; and (4) an evaluation function to compare the effectiveness of various hyperparameter configurations. Some of the common hyperparameter techniques are discussed below.

Grid Search: Grid search exhaustively searches a manually specified subset of the hyperparameter space of the target algorithm [30]. A traditional approach to finding the optimum is to run experiments on a grid of conditions: for example, with three factors, a 15 × 15 × 15 grid means performing 3375 experiments under different conditions [32]. Grid search is more practical when [31]: (1) the total number of parameters M in the model is small, say M < 10; the grid is M-dimensional, so the number of test solutions is proportional to L^M, where L is the number of test values along each dimension of the grid; (2) the solution is known to lie within a specific range of values, which can be used to define the limits of the grid; (3) the forward problem d = g(m) can be computed quickly enough that the time required to evaluate L^M of them is not prohibitive;
(4) the error function E(m) is smooth on the scale of the grid spacing Δm, so that the minimum is not lost because the grid spacing is too coarse.
There are several problems with the grid search method. First, the number of experiments can be prohibitive if there are several factors. Second, there can be significant experimental error: if the experiments are repeated under identical conditions, different responses can be obtained, so choosing the best point on the grid can be misleading, especially if the optimum is fairly flat. Third, a grid coarse enough to keep the number of experiments feasible may miss features close to the optimum or find a false (local) optimum [32].
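Two of these drawbacks can be demonstrated numerically: the L^M growth in the number of evaluations, and a coarse grid missing a narrow optimum. Both functions below are invented for illustration:

```python
# Grid-search pitfalls made concrete: exponential cost and coarse spacing.

def grid_points(l_per_dim, n_dims):
    # Number of evaluations for L values per dimension over M dimensions.
    return l_per_dim ** n_dims

def coarse_grid_argmin(f, lo, hi, l):
    # Evaluate f on l evenly spaced points and return the best grid point.
    step = (hi - lo) / (l - 1)
    points = [lo + i * step for i in range(l)]
    return min(points, key=f)

def narrow(x):
    # Invented objective with a sharp minimum at x = 0.05.
    return (x - 0.05) ** 2

# A 3-point grid over [0, 1] lands no closer than 0.05 to the optimum,
# while a 21-point grid (spacing 0.05) hits it exactly.
```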
Random Search: Random search [33] is a basic improvement on grid search. It performs a randomized search over hyperparameters drawn from specified distributions over possible parameter values. The search continues until the predetermined budget is exhausted or the desired accuracy is reached. This method is the simplest form of stochastic optimization and is very useful for certain problems, such as small search spaces with fast-running simulations. RS draws a value for each hyperparameter according to its probability distribution. Both GS and RS estimate the cost measure based on the generated hyperparameter sets. Although RS is simple, it has proven to be more effective than grid search in many cases [33].
Random search has been shown to provide better results due to several benefits. First, the budget can be set independently of the distribution of the search space, so random search can work better especially when some hyperparameters are not uniformly distributed [34]. Second, because each evaluation is independent, it is easy to parallelize and to allocate resources. Unlike GS, RS samples parameter combinations from a defined distribution, which improves efficiency by reducing the likelihood of spending a lot of time in a small, underperforming area. In addition, this method can detect the global optimum, or values close to it, given a sufficient budget. Third, although random search is not guaranteed to find the optimal result, a larger time budget increases the likelihood of finding the best hyperparameter set.

Bayesian optimization (BO) is more efficient than GS and RS because it chooses hyperparameter combinations by analyzing previously tested values, and running the surrogate model is usually much cheaper than running the objective function as a whole. However, because Bayesian optimization works from previously tested values, it is a sequential method that is difficult to parallelize; still, it is generally able to detect near-optimal hyperparameter combinations within a few iterations [36]. Common surrogate models for BO include the Gaussian process (GP) [37], random forest (RF) [38], and tree-structured Parzen estimator (TPE) [39]. Accordingly, there are three main BO algorithms based on their surrogate models: BO-GP, BO-RF, and BO-TPE. The GP is an attractive reduced-order model for BO that can be used to quantify forecast uncertainty. It is a non-parametric model whose number of parameters depends only on the input points. With the right kernel function, a GP can take advantage of structure in the data. However, the GP also has disadvantages.
For example, it is conceptually difficult to understand in terms of BO theory. In addition, its poor scalability to high dimensions or large numbers of data points is another important issue [36].

# c) Applying HPO in ML Models
In order to put the theory into practice, several experiments have been performed on an industrial synthetic-polymer dataset. This section describes experiments with four different HPO techniques on three general and representative ML algorithms. The first part of the section presents the experimental setup and the main HPO process; the second part compares and analyzes the results of applying the different HPO methods. The use of random search is recommended in the early stages of HPO to narrow the search space quickly, before guided algorithms are used to obtain better results. The main drawback of RS and GS [28] is that each evaluation in an iteration is independent of previous evaluations; thus, they waste time evaluating underperforming areas of the search space.
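A self-contained sketch of such a comparison harness, with a mock objective standing in for the real polymer-property regressor and invented parameter ranges:

```python
import itertools
import random

# Grid search and random search compared on the same invented objective;
# mock_score is a hypothetical stand-in for a regressor's validation score.

def mock_score(max_depth, n_estimators):
    return -abs(max_depth - 7) - abs(n_estimators - 120) / 40.0

def run_grid(grid):
    # Exhaustive: evaluates every combination in the grid.
    combos = (dict(zip(grid, c)) for c in itertools.product(*grid.values()))
    return max(combos, key=lambda p: mock_score(**p))

def run_random(n_trials, seed=1):
    # The budget (n_trials) is set independently of the size of the space.
    rng = random.Random(seed)
    trials = [{"max_depth": rng.randint(2, 15),
               "n_estimators": rng.randrange(40, 201, 20)}
              for _ in range(n_trials)]
    return max(trials, key=lambda p: mock_score(**p))

grid = {"max_depth": [3, 5, 7, 9], "n_estimators": [80, 120, 160]}
# run_grid(grid) costs 4 * 3 = 12 evaluations; run_random(12) spends the
# same budget but samples the wider, finer-grained ranges above.
```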
Table 2: Performance evaluation of applying HPO methods to the regressor on the synthetic polymer dataset
# IV.

# Discussion & Conclusion
Machine learning has become the primary strategy for dealing with data problems and is widely used in various applications. To apply ML models to practical problems, hyperparameters must be tuned to handle specific datasets. However, as the size of the generated data increases greatly in real life, and manual tuning of hyperparameters is extremely computationally expensive, it has become essential to optimize the hyperparameters by an automatic process. In this work, we applied hyperparameter optimization techniques to the ML models to find the best set of hyperparameters. Our dataset was small, and on this small dataset we can see that the randomly selected subsets are very representative of the given data, as they can effectively optimize all types of hyperparameters. Our future work is to test our model on a much larger dataset and evaluate the results.
1![Figure 1: (a) Manual tuning (b) Random tuning (c) Grid tuning approach [From left to Right]](image-2.png "Figure 1 :")
![Bayesian Optimization: Bayesian optimization (BO) is a commonly used sequential algorithm for HPO problems. Unlike GS and RS, BO determines future evaluation points based on the previous results. To choose the next hyperparameter values, BO uses two key components: a surrogate model and an acquisition function. The surrogate model aims to fit all the points of the objective function observed so far. The acquisition function determines the utility of different candidate points, balancing exploration and exploitation; the BO model thereby identifies the most promising regions while avoiding missing the best configurations in unexplored areas [35]. The basic BO method works as follows: (i) build a reduced-order probabilistic model (ROM) of the objective function; (ii) find the best hyperparameter values on the ROM; (iii) apply those optimal values to the objective function; (iv) update the ROM with the new set of results; (v) repeat the above steps until the maximum number of iterations is reached.](image-3.png "")
2![Figure 2: Exploration-based (left) and exploitation-based (right) Bayesian optimization; the shaded region indicates uncertainty (Yang and Shami, 2020)](image-4.png "Figure 2 :")
Table 1: ML models and their tuned hyper-parameters

| ML Model | Hyper-parameters |
| --- | --- |
| RF Regressor | n_estimators, max_depth, min_samples_split, min_samples_leaf, criterion, max_features |
| SVM Regressor | C, kernel, epsilon |
| KNN Regressor | n_neighbors |

Machine Learning Model Optimization with Hyper Parameter Tuning Approach. Global Journal of Computer Science and Technology, Volume XXI, Issue II, Version I. © 2021 Global Journals.
## Conflict of Interest:
The authors whose names are listed in this work certify that they have no affiliations with, or involvement in, any organization or entity with any financial or non-financial interest in the subject matter or materials discussed in this manuscript.
## References

1. Cho, H., Kim, Y., Lee, E., Choi, D., Lee, Y., Rhee, W. (2020). Basic Enhancement Strategies When Using Bayesian Optimization for Hyperparameter Tuning of Deep Neural Networks. IEEE Access, 8. doi:10.1109/access.2020.2981072
2. Bergstra, J. S., Bardenet, R., Bengio, Y., Kégl, B. (2011). Algorithms for hyperparameter optimization. Advances in Neural Information Processing Systems.
3. Hutter, F., Lücke, J., Schmidt-Thieme, L. (2015). Beyond manual tuning of hyperparameters. DISKI, 29(4).
4. Friedrichs, F., Igel, C. (2005). Evolutionary tuning of multiple SVM parameters. Neurocomputing, 64.
5. Mantovani, R. G., Rossi, A. L., Vanschoren, J., Bischl, B., de Carvalho, A. C. (2015). Effectiveness of random search in SVM hyper-parameter tuning. 2015 International Joint Conference on Neural Networks (IJCNN).
6. Li, L., Talwalkar, A. (2019). Random search and reproducibility for neural architecture search. arXiv preprint arXiv:1902.07638.
7. Eggensperger, K., Feurer, M., Hutter, F., Bergstra, J., Snoek, J., Hoos, H., Leyton-Brown, K. (2013). Towards an empirical foundation for assessing Bayesian optimization of hyperparameters. NIPS Workshop on Bayesian Optimization in Theory and Practice, 10(3).
8. Larochelle, H., Erhan, D., Courville, A., Bergstra, J., Bengio, Y. (2007). An empirical evaluation of deep architectures on problems with many factors of variation. Proceedings of the 24th International Conference on Machine Learning, ACM.
9. Snoek, J., Rippel, O., Swersky, K., Kiros, R., Satish, N., Sundaram, N. (2015). Scalable Bayesian optimization using deep neural networks. International Conference on Machine Learning.
10. Jones, D. R., Schonlau, M., Welch, W. J. (1998). Efficient global optimization of expensive black-box functions. J. Glob. Optim., 13.
11. Calandra, R., Peters, J., Rasmussen, C. E., Deisenroth, M. P. (2016). Manifold Gaussian processes for regression. Proceedings of the 2016 International Joint Conference on Neural Networks, Vancouver, BC, Canada, July 2016.
12. Andradóttir, S. (2015). A review of random search methods. Handbook of Simulation Optimization, Springer.
13. Li, L., Jamieson, K., DeSalvo, G., Rostamizadeh, A., Talwalkar, A. (2018). Hyperband: A novel bandit-based approach to hyperparameter optimization. Journal of Machine Learning Research, 18.
14. Klein, A., Falkner, S., Springenberg, J. T., Hutter, F. (2017). Learning curve prediction with Bayesian neural networks. International Conference on Learning Representations (ICLR).
15. Bergstra, J. S., Bengio, Y. (2012). Random search for hyper-parameter optimization. Journal of Machine Learning Research, 13.
16. Zhang, H., Chen, L., Qu, Y., Zhao, G., Guo, Z. (2014). Support vector regression based on grid-search method for short-term wind power forecasting. Journal of Applied Mathematics, 2014. doi:10.1155/2014/835791
17. Ghawi, R., Pfeffer, J. (2019). Efficient hyperparameter tuning with grid search for text categorization using kNN approach with BM25 similarity. Open Computer Science, 9(1). doi:10.1515/comp-2019-0011
18. Beyramysoltan, S., Rajkó, R., Abdollahi, H. (2013). Investigation of the equality constraint effect on the reduction of the rotational ambiguity in three-component system using a novel grid search method. Analytica Chimica Acta, 791. doi:10.1016/j.aca.2013.06.043
19. Zhang, X., Chen, X., Yao, L., Ge, C., Dong, M. (2019). Deep neural network hyperparameter optimization with orthogonal array tuning. Neural Information Processing, Communications in Computer and Information Science. doi:10.1007/978-3-030-36808-1_31
20. Zhang, C., Liu, Y., Liu, S. (2009). Crystalline behaviors and phase transition during the manufacture of fine denier PA6 fibers. Sci. China Ser. B-Chem., 52, 1835. doi:10.1007/s11426-009-0242-5
21. Joe (2020, May 5). What Is Denier Rating? Why Does It Matter To You? Digi Travelist.
22. Elmogahzy, Y. (2018). Tensile properties of cotton fibers. Handbook of Properties of Textile and Technical Fibres. doi:10.1016/B978-0-08-101272-7.00007-9
23. Blair, K. (2007). Materials and design for sports apparel. Materials in Sports Equipment. doi:10.1533/9781845693664.1.60
24. Swift, K., Booker, J. (2013). Forming processes. Manufacturing Process Selection Handbook. doi:10.1016/b978-0-08-099360-7.00004-5
25. Cho, H., Kim, Y., Lee, E., Choi, D., Lee, Y., Rhee, W. (2020). Basic Enhancement Strategies When Using Bayesian Optimization for Hyperparameter Tuning of Deep Neural Networks. IEEE Access, 8. doi:10.1109/access.2020.2981072
26. Yang, L., Shami, A. (2020). On hyperparameter optimization of machine learning algorithms: Theory and practice.
27. Amirabadi, M., Kahaei, M., Nezamalhosseini, S. (2020). Novel suboptimal approaches for hyperparameter tuning of deep neural network under the shelf of optical communication. Physical Communication, 41, 101057. doi:10.1016/j.phycom.2020.101057
28. Tyagi, G. K. (2010). Yarn structure and properties from different spinning techniques. Advances in Yarn Spinning Technology. doi:10.1533/9780857090218.1.119
29. Yang, L., Shami, A. (2020). On hyperparameter optimization of machine learning algorithms: Theory and practice. Neurocomputing, 415. doi:10.1016/j.neucom.2020.07.061
30. Chan, S., Treleaven, P. (2015). Continuous model selection for large-scale recommender systems. Handbook of Statistics: Big Data Analytics. doi:10.1016/b978-0-444-63492-4.00005-8
31. Menke, W. (2012). Nonlinear inverse problems. Geophysical Data Analysis: Discrete Inverse Theory. doi:10.1016/b978-0-12-397160-9.00009-6
32. Brereton, R. (2009). Steepest ascent, steepest descent, and gradient methods. Comprehensive Chemometrics. doi:10.1016/b978-044452701-1.00037-5
33. Bergstra, J., Bengio, Y. (2012). Random search for hyper-parameter optimization. J. Mach. Learn. Res., 13. ISSN 1532-4435.
34. Yu, T., Zhu, H. (2020). Hyper-parameter optimization: A review of algorithms and applications.
35. Hazan, E., Klivans, A., Yuan, Y. (2017). Hyperparameter optimization: A spectral approach. arXiv preprint arXiv:1706.00764.
36. Hutter, F., Kotthoff, L., Vanschoren, J. (2019). Automated Machine Learning: Methods, Systems, Challenges. Springer International Publishing.
37. Seeger, M. (2004). Gaussian processes for machine learning. International Journal of Neural Systems, 14(2). doi:10.1142/s0129065704001899
38. Hutter, F., Hoos, H. H., Leyton-Brown, K. (2011). Sequential model-based optimization for general algorithm configuration. In Coello, C. A. C. (ed.), Learning and Intelligent Optimization, Lecture Notes in Computer Science, 6683. Springer, Berlin, Heidelberg. doi:10.1007/978-3-642-25566-3_40
39. Bergstra, J., Bardenet, R., Bengio, Y., Kégl, B. (2011). Algorithms for hyper-parameter optimization. Adv. Neural Inf. Process. Syst. (NIPS), 24.