Next Article in Journal
Adsorption of Pb, Cu and Cd from Water on Coal Fly Ash-Red Mud Modified Composite Material: Characterization and Mechanism
Next Article in Special Issue
Runoff Prediction of Irrigated Paddy Areas in Southern China Based on EEMD-LSTM Model
Previous Article in Journal
Environmental Stable and Radioactive Isotopes in the Assessment of Thermomineral Waters in Lisbon Region (Portugal): Contributions for a Conceptual Model
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

Performance of Machine Learning Techniques for Meteorological Drought Forecasting in the Wadi Mina Basin, Algeria

by
Mohammed Achite
1,2,
Nehal Elshaboury
3,
Muhammad Jehanzaib
4,
Dinesh Kumar Vishwakarma
5,
Quoc Bao Pham
6,*,
Duong Tran Anh
7,8,
Eslam Mohammed Abdelkader
9,10 and
Ahmed Elbeltagi
11
1
Laboratory of Water and Environment, Faculty of Nature and Life Sciences, Hassiba Benbouali University of Chlef, Chlef 02180, Algeria
2
Georessources, Environment and Natural Risks Laboratory, University of Oran, Oran 31000, Algeria
3
Housing and Building National Research Centre, Construction and Project Management Research Institute, Giza 12311, Egypt
4
Research Institute of Engineering and Technology, Hanyang University, Ansan 15588, Republic of Korea
5
Department of Irrigation and Drainage Engineering, G.B. Pant, University of Agriculture and Technology, Pantnagar 263145, India
6
Institute of Applied Technology, Thu Dau Mot University, Thu Dau Mot City 75000, Vietnam
7
Laboratory of Environmental Sciences and Climate Change, Institute for Computational Science and Artificial Intelligence, Van Lang University, Ho Chi Minh City 700000, Vietnam
8
Faculty of Environment, School of Technology, Van Lang University, Ho Chi Minh City 700000, Vietnam
9
Department of Building and Real Estate, Faculty of Construction and Environment, The Hong Kong Polytechnic University, ZN716 Block Z Phase 8 Hung Hom, Kowloon 999077, Hong Kong
10
Structural Engineering Department, Faculty of Engineering, Cairo University, Giza 12613, Egypt
11
Agricultural Engineering Department, Faculty of Agriculture, Mansoura University, Mansoura 35516, Egypt
*
Author to whom correspondence should be addressed.
Water 2023, 15(4), 765; https://doi.org/10.3390/w15040765
Submission received: 9 December 2022 / Revised: 1 February 2023 / Accepted: 7 February 2023 / Published: 15 February 2023

Abstract

:
Water resources, land and soil degradation, desertification, agricultural productivity, and food security are all adversely influenced by drought. The prediction of meteorological droughts using the standardized precipitation index (SPI) is crucial for water resource management. The modeling results for SPI at 3, 6, 9, and 12 months are based on five types of machine learning: support vector machine (SVM), additive regression, bagging, random subspace, and random forest. After training, testing, and cross-validation at five folds on sub-basin 1, the results concluded that SVM is the most effective model for predicting SPI for different months (3, 6, 9, and 12). Then, SVM, as the best model, was applied on sub-basin 2 for predicting SPI at different timescales and it achieved satisfactory outcomes. Its performance was validated on sub-basin 2 and satisfactory results were achieved. The suggested model performed better than the other models for estimating drought at sub-basins during the testing phase. The suggested model could be used to predict meteorological drought on several timescales, choose remedial measures for research basin, and assist in the management of sustainable water resources.

1. Introduction

Lack of precipitation during a drought is a complex and cyclical phenomenon that has a negative impact on agricultural and water resources, as well as on society [1,2]. The damage caused by drought is relatively higher than other natural disasters, such as extreme drought in South China, which reduced the area of Honghu Lake and severely impacted tourism, aquaculture, and the public [3]. Similar to this, the 2012 US drought cost over 12 billion USD in economic losses and subsequently raised food prices throughout the world [4]. Future droughts are predicted to be more frequent and more intense due to climate change, urging water resource managers to implement comprehensive risk-mitigation measures [5]. The creeping nature of drought can be beneficial for drought scientists to predict these events in advance [6]. In recent decades, advancements in various drought modeling approaches have been observed, which will ultimately play a critical role in effective drought modeling and drought risk reduction.
Droughts can be characterized as agricultural, meteorological, hydrological, or socioeconomic based on the kind of water insufficiency, such as precipitation, runoff, soil moisture, and water availability, respectively [7]. Among these categories, meteorological drought is the most important; it is barely dependent on precipitation and prolonged meteorological drought results in other drought categories [8]. Several drought indices, including the standardized precipitation index (SPI), Palmer drought severity index, standardized precipitation evapotranspiration index (SPEI), standardized runoff index, etc., have been developed in the past few decades to model meteorological, hydrological, and agricultural drought [5,7]. Standardized drought indices have been utilized often for drought modeling because they are easy to use, flexible, and can estimate drought throughout many periods, with few data requirements [3,9].
It is crucial to anticipate a drought before it occurs, in addition to monitoring it [10]. Despite the fact that predicting droughts is a challenging task owing to the inherent uncertainties and high degree of complexity [11], drought forecasting analysis is essential for supplying pertinent data for drought risk reduction [12]. In hydro-meteorological applications, physical and data-driven models are the most common drought forecasting models [13]. Data-driven models create the strongest link between independent and dependent variables, whereas physical-based models are focused on understanding the real dynamics of a system [14]. The parameter estimation of physical process-based models requires information regarding soil, land use, geography, topography, water abstraction, etc., which is not only difficult to obtain, but also poses difficulties in terms of deviating from a thorough scientific understanding of different physical processes [15]. Because of the drawbacks of physical process-based models, data-driven models are used increasingly frequently in the field of hydrology and water management. Data-driven models such as machine learning models, regression models, and time-series models are commonly used in drought forecasting [5,16].
Data-driven models provide the capacity to anticipate droughts, according to Achite et al. [16]. Maca and Pech [17] utilized two types of artificial neural network (ANN) models to foresee droughts in two watersheds, namely Santa Ysabel Creek and Leaf River in South California. The results demonstrated that the hybrid ANN model outperformed the feed-forward ANN. Mokhtarzad et al. [18] employed three machine learning approaches, ANN, adaptive neuro-fuzzy inference system (ANFIS), and support vector machine (SVM), to predict meteorological drought at seasonal timescales in Iran. Although the models’ ability to predict drought was demonstrated by the results, SVM outperformed ANN and ANFIS. Similarly, Sattar et al. [19] used a Markov Bayesian classifier (MBC) to predict various classes of meteorological and hydrological drought. They reported that MBC had a range of 36% to 76% and 33% to 70% accuracy in forecasting both meteorological and hydrological drought, respectively. Jehanzaib et al. [12] compared the performance of six ML models for hydrological drought forecasting and concluded that the performance of the decision tree model was found to be superior in terms of forecast accuracy and computation time. Adnan et al. [20] integrated random vector functional link (RVFL) with the salp swarm algorithm, particle swarm optimization, hunger games search (HGS) algorithm, social spider optimization, genetic algorithm, and grey wolf optimization to forecast SPI at various timescales (3, 6, 9, and 12 months) and suggested that HGS-based RVFL can be used for drought forecasting with a high accuracy.
Most of the previous studies [12,13,17,18,19] utilized data-driven models for drought forecasting at a single timescale. It is critical to assess the performance of various data-driven models for drought forecasting at multiple timescales to make sound recommendations. Therefore, this study employed five state-of-the-art machine learning models, namely SVM, additive regression, bagging, random subspace, and random forest, for drought forecasting at 3-, 6-, 9-, and 12-month timescales. This study used SPI for meteorological drought estimation at various timescales due to the overwhelming benefits of the standardized drought indices. The main goals of this work were to build models for forecasting meteorological droughts using various data-driven approaches and to assess their effectiveness at various timeframes using accuracy metrics.

2. Materials and Methods

2.1. Description of the Study Area

The Wadi Mina basin in northwest Algeria served as the study region for this research. It has a total area of 4900 km2 and is located between 00°22′59″ and 01°09′02″ east, as well as between 34°41′57″ and 35°35′27″ north (Figure 1). It has four significant tributaries: Wadi Haddad, Wadi Abd, Wadi Mina, and Wadi Taht. The elevation varies between 164 and 1327 m. The topography of the basin is complex and uneven. The study region features a continental climate with dramatic seasonal temperature variations, such as bitterly cold winters and sweltering summers. The yearly precipitation averages between 200 and 500 mm, with most of it falling between November and March. The average yearly temperature is between 16 and 19.5 °C. Over half of the basin is covered in a variety of plant types, including 32% scrubs, 35.8% woods, and cereal crops [21]. Monthly rainfall and runoff records are available for five rainfall and hydrometric stations over 40 years (1974–2009) (Figure 1 and Table 1).

2.2. Standardized Precipitation Index (SPI)

SPI is calculated using the cumulative probability of the monthly precipitation measured at the observation point [22]. At a meteorological station, the parameters of a precipitation probability density function, which follow a gamma distribution, are computed over the whole observation period using Equation (1).
g x = 1 β α Γ α x α 1 e x / β
where α and β are the shape and scale parameters, respectively. Meanwhile, x is he successive precipitation and Γ α is the gamma function, which is defined by Equation (2).
Γ a = 0 y a 1 e y d y
The alpha and beta parameters are defined as per Equation (3).
α = 1 4 A 1 + 1 + 4 A 3 ,     A = l n x ¯ ln x i n ,     β = x ¯ α
where n , x ¯ , and x i are the number of observations, mean precipitation, and total precipitation, respectively. The cumulative probability is estimated using Equation (4).
G x = 0 x g x d x = 1 β a Γ a 0 x x a 1 e x / β d x
Equation (5) shows the cumulative probability using a mixed probability distribution to represent the likelihood of no precipitation.
H x = q + 1 q G x
where q is the likelihood of no precipitation. The SPI is calculated using Equation (6).
SPI = t c 0 + c 1 t + c 2 t 2 1 + d 1 t + d 2 t 2 + d 3 t 3 , 0 < H x 0.5 + t c 0 + c 1 t + c 2 t 2 1 + d 1 t + d 2 t 2 + d 3 t 3 , 0.5 < H x 1.0
where t is determined as shown in Equation (7).
t = l n 1 H x 2 .0 < H x 0.5 l n 1 1 H x 2 . 0.5 < H x 1.0
where c 0 , c 1 , c 2 , d 1 , d 2 , and d 3 are coefficients with values of:
c 0 = 2.515517     c 1 = 0.802853     c 2 = 0.010328
d 1 = 1.432788     d 2 = 0.189269     d 3 = 0.001308
Based on SPI, several classifications and the projected probability of wet and dry spells can be investigated, as shown in Table 2.

2.3. Machine Learning Models

In this study, support vector machine (SVM), additive regression (AR), bagging, random subspace (RSS), and random forest (RF) models for the estimation of SPI at 3, 6, 9, and 12 months were developed. These different models were used for anticipating drought forecasting. The flowchart of the proposed methodology for drought forecasting at the Wadi Mina basin, Algeria, is illustrated in Figure 2.

2.3.1. Support Vector Machine (SVM)

SVM establishes the decision boundary or optimal line that may categorize an n-dimensional space. The SVM algorithm searches for the extreme points that aid in the hyperplane’s creation [23]. It is a well-known approach for supervised machine learning that is utilized for both classification and regression applications [24]. Equation (8) for linear SVM can be written as follows:
x 1 , y 1     x n , y n
where y 1 is either 1 or −1, depending on the linkage between the class and point x 1 . The hyperplane that divides the group of points, x 1 , when y 1 = 1 and y 1 = 1 . This plane is determined to optimize the distance between the hyperplane and the nearest point x 1 from either group. The hyperplane that satisfies Equation (9) for a set of points can be written as below:
w T x b = 0
where w is the hyperplane’s normal vector. The parameter, b | | w | | indicates how far away from the origin the hyperplane is from the normal vector. The schematic diagram and parameters of the support vector machine algorithm utilized for modeling drought forecast are shown in Figure 3.
The linear regression model is an important tool to utilize the independent variable, X, to forecast the dependent variable, Y. It is a statistical and machine learning algorithm that allows for mapping the numeric inputs to numeric outputs, fitting a best fit straight line into the datasets. The accuracy of the model is measured by least squares estimation. Equation (10) for linear regression can be written as follows:
Y i = β 0 + β 1 X 1 + β 2 X 2 + + β p X i n + ϵ
where Xi and Yi are the independent and dependent variables, respectively; β 0 is constant; β n is the slope coefficient of each Xi; and ϵ is the model error term or residuals.

2.3.2. Additive Regression (AR)

Using the stochastic gradient boosting approach described in Friedman [25], we construct an AR model. Each iteration of AR involves drawing a random sample from the training data (without replacing it) and fitting a standalone model to the residuals from the previous iteration. In the initial fit, the training data are fitted to a standalone model without resampling, resulting in the first set of residuals, which are used to fit a stochastic gradient boosting model in the subsequent fit. The stochastic gradient boosting procedure continues until the final iteration. In Friedman [25], the predictions of all standalone models in the ensemble can be aggregated once the AR model has been trained.
Recent research has demonstrated the widespread application of Bayesian additive regression trees (BARTs) [26,27]. The BART model can define complex relations between x and y by estimating f(x) in the form y = f(x) + ε, where ε∼N (0, σ2). Furthermore, a sum of m regression trees is used, i.e., f(x) = ∑ g(x; Tj, Mj), ranging between j = 1 and j = m, which allows for the estimation of f(x). BART is expressed in Equation (11):
Y = f x + ε = J = 1 m g x ;   T i ,   M j + ε  

2.3.3. Bagging

A bagging method known as bootstrap aggregation (BA) was developed by Breiman [28]. In bagging, m training sets are generated, and the datasets are then fitted to m models using an easy-to-use ensemble method. As a result of averaging their outputs or voting on them, the predictions are combined. As the training dataset was modified, several classifiers were created: Hm, m = 1, …, M, which were then merged into one class. As a result, the weight of this class was derived from the combined weight of the individual predictor classes, as per Equation (12):
H d i = s i g n   m = 1 M m + H d i  
A voting method can be used to describe the method. As αm was determined as m = 1, …, M, then more accurate classifications would have a greater effect than less accurate classifications. As the weak Hm classification was slightly more accurate than the random classification [29], the latter was referred to as the weak Hm classification. The input datasets were also modeled using regression trees. The uniqueness of each tree was based on its ability to forecast changes in the training dataset. As a final step, the weighted average of each regression tree’s projections was calculated.

2.3.4. Random Subspace (RSS)

RSS is an ensemble machine learning algorithm that combines the prediction variables from different decision trees trained on multiple subsets of columns in the training dataset. It is a problem-independent metaheuristic technique that can be applied to a broad range of problems. Particularly when there are few training datasets compared with the amount of data, RSS is preferred [24]. It introduces randomness by selecting certain variables that are substituted at random space into the issue formulation [30]. This algorithm is a robust model assembling various weak classifiers [31,32]. It is analogous to other methods of decision trees such as bagging, which generates trees using different samples of series from the training dataset, and another method named random forest, which has ideas from bagging or the RSS model. The RSS model can easily be used with any other machine learning model, although decision trees are applied. Its performance varies notably with the choice of input variables. The original space is divided into subsets as part of the RSS algorithm’s first phase. Then, the results are attained by most polls using Equation (13):
β x = a r g m a x y 1 ,       1 δ s n g C b x , y
where δ is the Kronecker delta symbol,   C b x is the classification integration ( C = 1 ,   2 , ) , and y 1 ,   1 is a class label of the classifier. The graphic schematic diagram of the RSS algorithm for drought forecasting modeling is shown in Figure 4.

2.3.5. Random Forest (RF)

Breiman [33] proposed the RF algorithm, where the explanatory variables may be either continuous or categorical. The model has been used effectively for both regression and classification problems. It constructs several decision trees. The complexity of the non-linear relationship between the explanatory and target variables can easily be reduced instead of a detailed numerical representation because it causes more complexity in the model. The model is formed by selecting the input features randomly at each node. The regression in the model has a similar structure to the classification tree. The output is obtained by averaging each tree’s output (Figure 5). The model is efficient with internally multiple classes, robustness to the outliers in the model, accuracy in prediction, self-tuning ability, and deals with the large or small sample of the datasets to the other machine learning models.
The adaptive bagging algorithm was developed to reduce bias and improve the model’s classification and regression efficiency. Out-of-bag (OOB) estimates error, strength, and model correlation with a selection of random features. Generally, the out-of-bag estimates in the model consider the combination of one-third of classifiers. The error rate is reduced due to an increment in an assortment of classifiers in classification through the RF model. Bootstrap is a statistical resample approach used to evaluate the statistical parameters of the sample selected randomly. It is observed from the available dataset that some observations not considered in the sample, known as out-of-bag data, are used to evaluate the generalization error and elaborate on the importance of the variables. The rate of generalization error in regression is defined as per Equation (14):
MSE oob = 1 N i = 1 N y i f oob x i ²
where f oob x i = OOB is the prediction for ith observations. For the classification, the rate of generalization error helps to estimate the class-wise rate of error for each class in the model, which can be evaluated using the rate of OOB error, as per Equation (15):
E oob = 1 N i = 1 N y i f oob x i

3. Model Evaluation

To assess the performance of the ML algorithm, a comparison between the estimated/modeled values using ML methods and the observed values of the SPI was made at various timescales. Five performance metrics indices including correlation coefficient (CC), mean absolute error (MAE), root mean square error (RMSE), relative absolute error (RAE), and root relative squared error (RRSE) were used to quantitatively validate the models. The various metrics are calculated as depicted in Table 3.
Where; SPI P r e SPI O b s are ith predicted and observed SPI values, respectively, and SPI P r e ¯ and SPI O b s ¯ are predicted and observed average values of SPI, respectively. Besides, N represents the number of data sets.

4. Results and Discussion

4.1. Input Variables Selection

The meteorological drought index (SPI) was calculated across a range of timeframes (3–12 months). As shown in Figure 6, PACF was applied to determine the optimum lags of the SPI index [34]. The lag values that provide this 95% confidence bound were selected as the inputs. For all timescales, t-1 lag formed the largest correlation [16]. The SPI index provided a remarkably large correlation of t-10 lag for the 9-month timescale. Table 4 displayed the optimal input combination for SPI prediction used in this study. To examine the temporal gaps between the current and previous indices, the statistical approach attempted to extract lagging information from the signal. After that, the optimal inputs for each time lag were found by conducting statistical analysis on the lagged combination coefficients and by analyzing the correlations between them.

4.2. Comparison Models of SPI Drought Index

The SPI drought index is predicted in five sub-basins using five machine learning models: SVM, AR, B, RSS, and RF. For the training and testing stages, Table 5 displays the model outcomes across all timelines. Training (70%) and testing (30%) are the two phases into which the data was split. The results indicate that SPI-12 performed the best when compared with the other timeframes. Moreover, RF (CC = 0.960, MAE = 0.230, RMSE = 0.304, RAE = 29.056, and RRSE = 30.683) exhibited the best performance for SPI-3 compared with the other models for the training phase. According to CC, MAE, RMSE, RAE, and RRSE with SPI-12 during the testing phase, SVM was able to achieve 0.880, 0.283, 0.371, 38.061, and 41.520, respectively.
Figure 7 illustrates the scatterplot of the observed and predicted SPI for the training and testing phases. The results showed that the SPI for the proposed SVM model closely matched the observed values during the testing phase. Extreme wetness and drought values were seen to be more properly approximated using SVM.
Cross-validation is a statistical method that splits data into two segments: one for learning or training a model and the other for verifying the model. It was used in this study to analyze and compare machine learning algorithms. It is a resampling process used to assess machine learning models and determine how well the model would perform on an independent test dataset. In this study, four folds were used for all of the algorithms developed in order to cross-validate all of the datasets, as shown in Figure 8. All of the datasets were divided into four parts to evaluate the performance of models in terms of accuracy and statistical errors. Table 6 demonstrated that the SVM model outperformed other models in terms of performance. The correlation coefficients ranged from 0.67 to 0.91 under all of the SPI periods, except for SPI 12. RF was the best model for predicting SPI-12. At the same time, statistical errors were the lowest values compared with other algorithms (Table 6). In addition, time series and scatter plots are presented in Figure 9 to show the performance of the developed models.
As illustrated in Table 7, at sub-basin 2, the SVM model also gave higher outcomes for correlation coefficients (CC) and statistical errors, such as MAE, RMSE, RAE, and RRSE, than other data-driven models. The CC values varied from 0.642 to 885 for the SPI periods from 3 to 12 months. The time series and scatter plots for the SVM model were created to show its accuracy in SPI modeling under different months for the second basin (Figure 10). This study concludes and recommends using SVM for the studied stations for modeling the drought SPI index for better water resources management, monitoring, and planning.
This model was validated by comparing its performance across the results reported in the literature. The SVM model (CC = 0.674, MAE = 0.569, RMSE = 0.734, RAE = 71.44, and RRSE = 73.94) outperformed the ANFIS model trained using the nomadic people optimization algorithm (NPA) (RMSE = 2.21, MAE = 2.15, NSE = 0.91, PBIAS = 0.15, and R2 = 0.92) to forecast the 3-month SPI. The ANFIS-NPA model was reported to be the best-performing model compared with the ANFIS, RBFNN, MLP, and SVM models optimized using the krill algorithm, salp swarm algorithm, and bat algorithm, along with the standalone models [35]. This affirms the robustness and outperformance of the developed models compared with those found in the previous studies.

5. Conclusions and Recommendations

Drought has a detrimental influence on agricultural output, land and soil quality, desertification, food security, and water resources. Despite this, because of its complexity and several factors at various temporal and geographic dimensions, drought continues to be among the least understood natural phenomena. In the last ten years, the use of machine learning approaches to develop trustworthy models with high computational capabilities has drawn attention to the field of drought modelling. In this context, this research applied five machine learning models, namely support vector machine, additive regression, bagging, random subspace, and random forest, to anticipate a meteorological drought in the Wadi Mina basin, Algeria. Five performance assessment metrics were used to compare the performance of the developed models. The results indicated that SPI-12 performed the best when compared with the other timeframes. According to CC, MAE, RMSE, RAE, and RRSE with SPI-12 during the testing phase, SVM was able to achieve 0.880, 0.283, 0.371, 38.061, and 41.520, respectively. The results from cross-validation demonstrated that the SVM model outperformed the other models. The correlation coefficients ranged from 0.674 to 0.908 under all of the SPI periods. Its performance was validated at sub-basin 2 and satisfactory results were achieved. The suggested model provided a practical tool for managing intricate drought dynamics at various periods. Future studies should investigate the application of the proposed model in other basins of different countries. A trustworthy intelligent system might be developed using the suggested model to anticipate meteorological drought across a variety of timescales, aid in the management of sustainable water resources, and identify corrective actions in stations.

Author Contributions

Conceptualization, M.A. and A.E.; methodology, M.A., A.E. and D.K.V.; software, M.A. and A.E.; validation, M.A., A.E., N.E., M.J. and Q.B.P.; formal analysis, M.A. and N.E.; investigation, M.A.; resources, M.A.; data curation, M.A. and D.T.A.; writing—original draft preparation, M.A., A.E., N.E., M.J., D.K.V., E.M.A. and Q.B.P.; writing—review and editing, M.A., A.E., N.E., M.J., D.K.V., D.T.A., E.M.A. and Q.B.P.; visualization, D.T.A., M.A. and N.E.; supervision, M.A. and Q.B.P.; project administration, M.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

We thank the National Agency of the Water Resources (ANRH) for the collected data and the General Directorate of Scientific Research and Technological Development of Algeria (DGRSDT).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Dai, A. Increasing drought under global warming in observations and models. Nat. Clim. Chang. 2013, 3, 52–58. [Google Scholar] [CrossRef]
  2. Kim, T.W.; Jehanzaib, M. Drought risk analysis, forecasting and assessment under climate change. Water 2020, 12, 1862. [Google Scholar] [CrossRef]
  3. Zhao, L.; Lyu, A.; Wu, J.; Hayes, M.; Tang, Z.; He, B.; Liu, J.; Liu, M. Impact of meteorological drought on streamflow drought in Jinghe River Basin of China. Chin. Geogr. Sci. 2014, 24, 694–705. [Google Scholar] [CrossRef]
  4. Crutchfield, S. USDA Economic Research Service-US Drought 2012: Farm and Food Impacts. 2012. Available online: https://drought.unl.edu/archive/assessments/USDA-ERS-2012-farm-food-impacts.pdf (accessed on 8 December 2022).
  5. Mishra, A.K.; Singh, V.P. Drought modelling—A review. J. Hydrol. 2011, 403, 157–175. [Google Scholar] [CrossRef]
  6. Cancelliere, A.; Mauro, G.D.; Bonaccorso, B.; Rossi, G. Drought forecasting using the standardized precipitation index. Water Resour. Manag. 2007, 21, 801–819. [Google Scholar] [CrossRef]
  7. Heim, R.R., Jr. A review of twentieth-century drought indices used in the United States. Bull. Am. Meteorol. Soc. 2002, 83, 1149–1166. [Google Scholar] [CrossRef]
  8. Jehanzaib, M.; Shah, S.A.; Kim, J.E.; Kim, T.W. Exploring spatio-temporal variation of drought characteristics and propagation under climate change using multi-model ensemble projections. Nat. Hazards 2022, 1–21. [Google Scholar] [CrossRef]
  9. Jehanzaib, M.; Sattar, M.N.; Lee, J.H.; Kim, T.W. Investigating effect of climate change on drought propagation from meteorological to hydrological drought using multi-model ensemble projections. Stoch. Environ. Res. Risk Assess. 2020, 34, 7–21. [Google Scholar] [CrossRef]
  10. Zhao, J.; Xu, J.; Xie, X.; Lu, H. Drought monitoring based on TIGGE and distributed hydrological model in Huaihe River Basin, China. Sci. Total Environ. 2016, 553, 358–365. [Google Scholar] [CrossRef]
  11. Durbach, I.; Merven, B.; McCall, B. Expert elicitation of autocorrelated time series with application to e3 (energy-environment-economic) forecasting models. Environ. Model. Softw. 2017, 88, 93–105. [Google Scholar] [CrossRef]
  12. Jehanzaib, M.; Bilal Idrees, M.; Kim, D.; Kim, T.W. Comprehensive evaluation of machine learning techniques for hydrological drought forecasting. J. Irrig. Drain. Eng. 2021, 147, 04021022. [Google Scholar] [CrossRef]
  13. Anshuka, A.; van Ogtrop, F.F.; Willem Vervoort, R. Drought forecasting through statistical models using standardised precipitation index: A systematic review and meta-regression analysis. Nat. Hazards 2019, 97, 955–977. [Google Scholar] [CrossRef]
  14. Solomatine, D.P.; Ostfeld, A. Data-driven modelling: Some past experiences and new approaches. J. Hydroinformatics 2008, 10, 3–22. [Google Scholar] [CrossRef]
  15. Abrahart, R.J.; See, L.M.; Solomatine, D.P. (Eds.) Practical Hydroinformatics: Computational Intelligence and Technological Developments in Water Applications; Springer Science & Business Media: Berlin, Germany, 2008; Volume 68. [Google Scholar]
  16. Achite, M.; Jehanzaib, M.; Elshaboury, N.; Kim, T.W. Evaluation of machine learning techniques for hydrological drought modeling: A case study of the Wadi Ouahrane basin in Algeria. Water 2022, 14, 431. [Google Scholar] [CrossRef]
  17. Maca, P.; Pech, P. Forecasting SPEI and SPI drought indices using the integrated artificial neural networks. Comput. Intell. Neurosci. 2016, 2016, 14. [Google Scholar] [CrossRef]
  18. Mokhtarzad, M.; Eskandari, F.; Jamshidi Vanjani, N.; Arabasadi, A. Drought forecasting by ANN, ANFIS, and SVM and comparison of the models. Environ. Earth Sci. 2017, 76, 729. [Google Scholar] [CrossRef]
  19. Sattar, M.N.; Jehanzaib, M.; Kim, J.E.; Kwon, H.H.; Kim, T.W. Application of the hidden Markov bayesian classifier and propagation concept for probabilistic assessment of meteorological and hydrological droughts in South Korea. Atmosphere 2020, 11, 1000. [Google Scholar] [CrossRef]
  20. Adnan, R.M.; Mostafa, R.R.; Islam, A.R.M.T.; Gorgij, A.D.; Kuriqi, A.; Kisi, O. Improving drought modeling using hybrid random vector functional link methods. Water 2021, 13, 3379. [Google Scholar] [CrossRef]
  21. Achite, M.; Ouillon, S. Suspended sediment transport in a semiarid watershed, Wadi Abd, Algeria (1973–1995). J. Hydrol. 2007, 343, 187–202. [Google Scholar] [CrossRef]
  22. Awange, J.L.; Mpelasoka, F.; Goncalves, R.M. When every drop counts: Analysis of droughts in Brazil for the 1901-2013 period. Sci. Total Environ. 2016, 566, 1472–1488. [Google Scholar] [CrossRef]
  23. Sain, S.R. The nature of statistical learning theory. Technometrics 1996, 38, 409. [Google Scholar] [CrossRef]
  24. Kushwaha, N.L.; Rajput, J.; Elbeltagi, A.; Elnaggar, A.Y.; Sena, D.R.; Vishwakarma, D.K.; Mani, I.; Hussein, E.E. Data intelligence model and meta-heuristic algorithms-based pan evaporation modelling in two different agro-climatic zones: A case study from Northern India. Atmosphere 2021, 12, 1654. [Google Scholar] [CrossRef]
  25. Friedman, J.H. Stochastic gradient boosting. Comput. Stat. Data Anal. 2002, 38, 367–378. [Google Scholar] [CrossRef]
  26. Tan, Y.V.; Roy, J. Bayesian additive regression trees and the General BART model. Stat. Med. 2019, 38, 5048–5069. [Google Scholar] [CrossRef] [PubMed]
  27. Sparapani, R.; Logan, B.; Laud, P. MCW Biostatistics Technical Report 72 Nonparametric Failure Time: Time-to-event Machine Learning with Heteroskedastic Bayesian Additive Regression Trees and Low Information Omnibus Dirichlet Process Mixtures. 2021. Available online: https://www.mcw.edu/-/media/MCW/Departments/Biostatistics/tr72.pdf?la=en (accessed on 8 December 2022).
  28. Breiman, L. Bagging predictors. Mach. Learn. 1996, 24, 123–140. [Google Scholar] [CrossRef]
  29. Breiman, L. Arcing the Edge; Technical Report 486; Statistics Department, University of California at Berkeley: Berkeley, CA, USA, 1997. [Google Scholar]
  30. Vishwakarma, D.K.; Ali, R.; Bhat, S.A.; Elbeltagi, A.; Kushwaha, N.L.; Kumar, R.; Rajput, J.; Heddam, S.; Kuriqi, A. Pre-and post-dam river water temperature alteration prediction using advanced machine learning models. Environ. Sci. Pollut. Res. 2022, 29, 83321–83346. [Google Scholar] [CrossRef]
  31. Al-rimy, B.A.S.; Maarof, M.A.; Shaid, S.Z.M. Crypto-ransomware early detection model using novel incremental bagging with enhanced semi-random subspace selection. Future Gener. Comput. Syst. 2019, 101, 476–491. [Google Scholar] [CrossRef]
  32. Plumpton, C.O.; Kuncheva, L.I.; Oosterhof, N.N.; Johnston, S.J. Naive random subspace ensemble with linear classifiers for real-time classification of fMRI data. Pattern Recognit. 2012, 45, 2101–2108. [Google Scholar] [CrossRef]
  33. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
  34. Deo, R.C.; Tiwari, M.K.; Adamowski, J.F.; Quilty, J.M. Forecasting effective drought index using a wavelet extreme learning machine (W-ELM) model. Stoch. Environ. Res. Risk Assess. 2017, 31, 1211–1240. [Google Scholar] [CrossRef]
  35. Mohamadi, S.; Sammen, S.S.; Panahi, F.; Ehteram, M.; Kisi, O.; Mosavi, A.; Ahmed, A.N.; El-Shafie, A.; Al-Ansari, N. Zoning map for drought prediction using integrated machine learning models with a nomadic people optimization algorithm. Nat. Hazards 2020, 104, 537–579. [Google Scholar] [CrossRef]
Figure 1. Study area, as well as pluviometric and hydrometric network.
Figure 1. Study area, as well as pluviometric and hydrometric network.
Water 15 00765 g001
Figure 2. Flowchart of the proposed methodology.
Figure 2. Flowchart of the proposed methodology.
Water 15 00765 g002
Figure 3. Graphic schematic layout of the support vector machine.
Figure 3. Graphic schematic layout of the support vector machine.
Water 15 00765 g003
Figure 4. The block diagram of the random subspace.
Figure 4. The block diagram of the random subspace.
Water 15 00765 g004
Figure 5. Illustration of random forest trees.
Figure 5. Illustration of random forest trees.
Water 15 00765 g005
Figure 6. Autocorrelation and partial autocorrelation function for SPI across different timeframes.
Figure 6. Autocorrelation and partial autocorrelation function for SPI across different timeframes.
Water 15 00765 g006
Figure 7. Plots showing predicted and observed SPI values for 3, 6, 9, and 12 timeframes during training and testing periods at sub-basin 1.
Figure 7. Plots showing predicted and observed SPI values for 3, 6, 9, and 12 timeframes during training and testing periods at sub-basin 1.
Water 15 00765 g007aWater 15 00765 g007b
Figure 8. Cross-validation layout for evaluation of the ML algorithms based on 4 k folds.
Figure 8. Cross-validation layout for evaluation of the ML algorithms based on 4 k folds.
Water 15 00765 g008
Figure 9. Plots showing the predicted and observed SPI values for 3-, 6-, 9-, and 12-month timeframes during the cross-validation periods at sub-basin 1.
Figure 9. Plots showing the predicted and observed SPI values for 3-, 6-, 9-, and 12-month timeframes during the cross-validation periods at sub-basin 1.
Water 15 00765 g009aWater 15 00765 g009b
Figure 10. Plots showing the predicted and observed SPI values for 3, 6, 9, and 12 month timeframes during validating the performance at sub-basin 2.
Figure 10. Plots showing the predicted and observed SPI values for 3, 6, 9, and 12 month timeframes during validating the performance at sub-basin 2.
Water 15 00765 g010aWater 15 00765 g010b
Table 1. Basic characteristics of rainfall stations.
Table 1. Basic characteristics of rainfall stations.
IDNameLongitudeLatitudeElevation (m)
S1013306Oued Abtal0°40′33.97″ E35°28′03.59″ N354
S2013401Sidi Abdelkader Djillali0°34′08.35″ E35°29′20.71″ N225
Table 2. Categorization of different states of SPI.
Table 2. Categorization of different states of SPI.
SPI ValuesDrought CategoryProbability (%)
Greater than or equal to 2.0Extremely wet2.3
Greater than or equal to 1.5 and less than 2.0Very wet4.4
Greater than or equal to 1.0 and less than 1.5Moderate wet9.2
Greater than or equal to −1.0 and less than 1.0Near normal68.2
Greater than or equal to −1.0 and less than −1.5Moderately dry9.2
Greater than or equal to −1.5 and less than −2.0Severely dry4.4
Less than or equal to −2.0Extremely dry2.3
Table 3. Descriptions of indices for data mining techniques supported by mathematical formulations.
Table 3. Descriptions of indices for data mining techniques supported by mathematical formulations.
Performance IndicesFormulaRangeIdeal LevelDescription
Correlation coefficient CC = i = 1 N SPI O b s SPI O b s ¯ SPI P r e SPI P r e ¯ i = 1 N SPI O b s SPI O b s ¯ 2 i = 1 N SPI P r e SPI P r e ¯ 2 (−1 to +1)+1Calculates how similar the observed value is to the expected value.
Mean absolute error MAE = 1 N i = 1 N SPI P r e SPI O b s (0 to ∞)0Analyzes the error size on an average.
Root mean square error RMSE = 1 N i = 1 N SPI P r e SPI O b s 2 (0 to ∞)0Indicates how observed values differ from estimated values.
Relative absolute error RAE = i = 1 N SPI P r e SPI O b s / i = 1 N SPI O b s ¯ SPI O b s (0 to ∞)0Conducts a performance evaluation of the machine learning algorithm.
Root relative squared error RRSE = i = 1 N SPI P r e SPI O b s 2 / i = 1 N SPI O b s ¯ SPI O b s 2 (0 to ∞)0In contrast to RMSE, the relative squared error (RSE) allows the comparison of models with errors expressed in various units.
Table 4. Input−output relationships for SPI prediction.
Table 4. Input−output relationships for SPI prediction.
Sub Basin NameInputs VariablesTarget Variable
Sub-basin 1SPI-3 (t-1); SPI-3 (t-2)SPI-3
SPI-6 (t-1); SPI-6 (t-2); SPI-6 (t-5); SPI-6 (t-7)SPI-6
SPI-9 (t-1)SPI-9
SPI-12 (t-1)SPI-12
Table 5. Evaluation metrics of the model outputs for various timelines during the training and testing phases.
Table 5. Evaluation metrics of the model outputs for various timelines during the training and testing phases.
ModelTraining PhaseTesting Phase
CCMAERMSERAERRSECCMAERMSERAERRSE
SPI-3
Support vector machine0.6840.5570.72570.22573.2470.7010.4530.57968.03871.463
Additive regression0.7110.5440.69668.63870.3660.6770.4750.59671.31373.552
Bagging0.7960.4660.60658.72261.1950.6450.4780.62271.72176.792
Random subspace0.6670.6080.76576.72477.2780.5420.5530.68483.01984.405
Random forest0.9600.2300.30429.05630.6830.5920.5260.68278.92684.088
SPI-6
Support vector machine0.8240.4470.59656.15756.7290.8110.3550.45258.50557.447
Additive regression0.8330.4420.58155.45455.3090.7700.4020.49266.21962.530
Bagging0.8640.3990.52950.11650.3960.8000.3600.46159.36158.638
Random subspace0.8640.4030.53050.56050.4340.8110.3590.45059.24357.137
Random forest0.9250.3030.40138.08338.1610.7350.4200.54569.18869.239
SPI-9
Support vector machine0.8820.3590.47246.28847.2420.8660.3060.38145.59946.795
Additive regression0.8780.3710.47947.82347.9530.8220.3620.44053.94053.993
Bagging0.9090.3210.41541.39841.6050.8450.3390.41450.57650.830
Random subspace0.8970.3360.44143.27244.1520.8630.3150.38846.93247.606
Random forest0.9470.2330.32030.03032.0410.8060.3570.46353.27556.859
SPI-12
Support vector machine0.9080.3050.43137.41241.9630.8800.2830.37138.06141.520
Additive regression0.8980.3410.45441.77244.2680.8230.3430.45346.13450.702
Bagging0.9270.2780.38634.13837.5960.8660.3050.39441.09244.043
Random subspace0.9240.2830.39434.72838.3860.8740.2970.38039.91442.553
Random forest0.9560.2120.30325.97729.5140.8080.3810.48351.22654.028
Table 6. Cross-validation for the ML models for estimating SPI at 3, 6, 9, and 12 months.
Table 6. Cross-validation for the ML models for estimating SPI at 3, 6, 9, and 12 months.
ModelCCMAERMSERAERRSE
SPI-3
Support vector machine0.6740.5690.73471.4473.94
Additive regression0.6450.6000.75875.383976.384
Bagging0.6380.6000.76475.32576.964
Random subspace0.5860.6420.81180.60381.653
Random forest0.6150.6110.79776.78880.255
SPI-6
Support vector machine0.8180.4550.60656.65457.369
Additive regression0.7650.5180.68364.60464.637
Bagging0.7990.4810.63459.94060.016
Random subspace0.7990.4810.63259.91559.885
Random forest0.7490.5650.72470.45768.512
SPI-9
Support vector machine0.8770.3680.48046.88347.479
Additive regression0.8400.4200.54353.61353.726
Bagging0.8560.3960.51750.52951.186
Random subspace0.8560.3950.51750.35551.152
Random forest0.8390.4340.55555.33954.907
SPI-12
Support vector machine0.9080.3050.43137.41241.963
Additive regression0.8980.3410.45441.77244.268
Bagging0.9270.2780.38634.13837.596
Random subspace0.9240.2830.39434.72838.386
Random forest0.9560.2120.30325.97729.514
Table 7. Performance validation of SVM at sub-basin 2.
Table 7. Performance validation of SVM at sub-basin 2.
CCMAERMSERAERRSE
SPI-30.6420.5750.75172.92777.370
SPI-60.8110.4450.58058.95858.548
SPI-90.8650.3790.49351.30450.172
SPI-120.8850.3310.46642.8146.639
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Achite, M.; Elshaboury, N.; Jehanzaib, M.; Vishwakarma, D.K.; Pham, Q.B.; Anh, D.T.; Abdelkader, E.M.; Elbeltagi, A. Performance of Machine Learning Techniques for Meteorological Drought Forecasting in the Wadi Mina Basin, Algeria. Water 2023, 15, 765. https://doi.org/10.3390/w15040765

AMA Style

Achite M, Elshaboury N, Jehanzaib M, Vishwakarma DK, Pham QB, Anh DT, Abdelkader EM, Elbeltagi A. Performance of Machine Learning Techniques for Meteorological Drought Forecasting in the Wadi Mina Basin, Algeria. Water. 2023; 15(4):765. https://doi.org/10.3390/w15040765

Chicago/Turabian Style

Achite, Mohammed, Nehal Elshaboury, Muhammad Jehanzaib, Dinesh Kumar Vishwakarma, Quoc Bao Pham, Duong Tran Anh, Eslam Mohammed Abdelkader, and Ahmed Elbeltagi. 2023. "Performance of Machine Learning Techniques for Meteorological Drought Forecasting in the Wadi Mina Basin, Algeria" Water 15, no. 4: 765. https://doi.org/10.3390/w15040765

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop