Article

Monthly Streamflow Prediction by Metaheuristic Regression Approaches Considering Satellite Precipitation Data

1
Faculty of Engineering, Kharazmi University, Tehran 15719-14911, Iran
2
Department of Civil Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Coimbatore 641112, India
3
Department of Civil Engineering, Siddaganga Institute of Technology, Tumakuru 572103, India
4
Department of Civil Engineering, University of Applied Sciences, 23562 Lübeck, Germany
5
Department of Civil Engineering, Ilia State University, 0162 Tbilisi, Georgia
*
Authors to whom correspondence should be addressed.
Water 2022, 14(22), 3636; https://doi.org/10.3390/w14223636
Submission received: 2 October 2022 / Revised: 8 November 2022 / Accepted: 9 November 2022 / Published: 11 November 2022
(This article belongs to the Section Hydrology)

Abstract

In this study, the viability of three metaheuristic regression techniques, CatBoost (CB), random forest (RF) and extreme gradient tree boosting (XGBoost, XGB), is investigated for the prediction of monthly streamflow considering satellite precipitation data. Monthly streamflow data from three measuring stations in Turkey and satellite rainfall data derived from the Tropical Rainfall Measuring Mission (TRMM) were used as inputs to the models to predict streamflow 1 month ahead. Such predictions are crucial for decision-making in water resource planning and management associated with water allocations, water market planning, restricting water supply and managing drought. The outcomes of the metaheuristic regression methods were compared with those of artificial neural networks (ANN) and nonlinear regression (NLR). The effect of the periodicity component was also investigated by adding the month number of the streamflow data as an input. In the first part of the study, the streamflow at each station was predicted using the CB, RF, XGB, ANN and NLR methods and considering TRMM data. In the second part, streamflow at the downstream station was predicted using data from upstream stations. In both parts, the CB and XGB methods generally provided similar accuracy and outperformed the RF, ANN and NLR methods. The use of TRMM rainfall data and the periodicity component considerably improved the efficiency of the metaheuristic regression methods in predicting streamflow. The use of TRMM data as inputs improved the root mean square error (RMSE) of CB, RF and XGB by 36%, 31% and 24%, respectively, on average, while the corresponding values were 37%, 18% and 43% after introducing periodicity information into the model’s inputs.

1. Introduction

Streamflow is one of the most important components of the terrestrial water cycle. It describes the flow of water that enters the watershed as precipitation, reaching its destination through natural drainage into lakes and oceans by flowing through creeks, streams, and rivers [1,2]. While high streamflow in a channel (stream or river) can cause flooding and waterlogging, low streamflow can adversely affect dependent riverine ecosystems [3,4]. In both cases, the consequences can be dire, resulting in severe socio-economic losses and ecosystem fragmentation [5,6]. Hence, it is necessary to accurately predict streamflow so that a significant portion of these damages can be effectively mitigated. Further, since surface water reservoirs are principally used to supply water to satisfy urban drinking water requirements, accurate streamflow forecasting is required for the efficient planning and management of these systems [7,8]. Several variables, including the climate and hydrology, the hydraulic properties of the stream, elevation, and the presence of upstream controls, affect streamflow. As the uncertainty and hydro-climatic load on a stream and river network increase, it becomes increasingly difficult to precisely forecast streamflow, making it an arduous task to reduce or control its vulnerability.
Broadly speaking, streamflow prediction models can be classified as either linear or non-linear [9,10]. Traditional linear models, such as autoregressive integrated moving average models and multiple linear regression techniques, have been used for streamflow forecasting [11,12]. These models can perform well for long-lead-time streamflow forecasts, but their performance is limited by the assumption that streamflow behaves linearly [10]. Since streamflow time series are inherently non-linear owing to their stochastic nature and dependency on different external control (exogenous) variables [13], non-linear modeling techniques such as artificial neural networks (ANN), support vector machines (SVM), extreme learning machines (ELM), and tree-based regression techniques such as random forest (RF) and rotation forest have been widely applied in streamflow forecasting [10]. Most of these are machine learning models that leverage time series data to make predictions. These models are also being successfully used to forecast other natural and hydro-geo-climatic phenomena [14,15,16].
The development and use of ANN and ANN ensemble (hybrid) models in streamflow forecasting have been documented in several studies [17,18,19]. Several case studies applying ELM, SVM, and RF models to streamflow forecasting revealed that each showed varying performance under different hydro-climatic and geographic conditions [19,20]. Though all the above-mentioned models performed reasonably well in forecasting streamflow, obtaining a high goodness of fit (e.g., R2 above ~0.8) between the actual and forecasted streamflow remains challenging [21]. To overcome this difficulty, novel or hybrid ensemble models are generally being developed in combination with traditional non-linear streamflow forecasting models. Further, newly developed advanced metaheuristic techniques from the machine learning domain, which are being used in other time-series forecasting applications (e.g., finance), are also occasionally being used in streamflow forecasting to determine if they can improve the prediction accuracy of streamflow forecast models [22,23]. Boosting algorithms are one such metaheuristic technique that has been recently applied to forecast streamflow [23,24,25].
Boosting algorithms like gradient tree boosting have been shown to forecast streamflow with high accuracy [23], and ensemble parallel tree boosting models like extreme gradient tree boosting (XGBoost) have also recently been used in related research [26,27]. Although very similar to RF in its structure and implementation, XGBoost varies by how the trees are developed and combined [28]. While RF is implemented by bagging where the final forecast is the average of all decision trees, XGBoost uses the error residuals from previous decision tree models to fit the subsequent models, and the final forecast is a weighted sum of all the tree forecasts. Similar to these models, a new model has been developed using gradient boosting on decision trees with categorical feature support, also referred to as CatBoost [29]. CatBoost has several advantages over traditional models, i.e., fast computing efficiency, native ability to handle categorical features, and the use of symmetric trees for swifter execution and to overcome overfitting by implementing ordered boosting [30]. Although CatBoost has not been applied in streamflow forecasting (to the best of the authors’ knowledge), recent studies have documented the superior ability of CatBoost models in forecasting other hydro-climatic variables including weather and evapotranspiration [31,32].
In this study, three tree-based metaheuristic regression approaches, namely XGBoost, RF, and CatBoost, were used to model streamflow at three different locations in the Black Sea region in Turkey. The performance of all the three models was analyzed using different performance and error indices, and the results of each of these models were compared with ANN and nonlinear regression (NLR). Additionally, satellite rainfall data derived from Tropical Rainfall Measuring Mission (TRMM) were also used as model inputs to study if providing additional hydro-climatic variables improved the accuracy of the models. This paper presents some important observations regarding the use of the XGBoost and CatBoost models in streamflow forecasting and discusses how the performance of these models could be improved further in streamflow forecasting applications.

2. Methods and Materials

Three metaheuristic regression approaches, i.e., CB, RF, XGB, along with ANN and NLR were implemented in the present study to predict monthly streamflow considering precipitation data from TRMM. The modeling procedure is illustrated in Figure 1. The CB, RF and XGB methods are briefly explained in the following sections.

2.1. CatBoost

CatBoost is a recent open-source boosting (ensemble strategy) algorithm proposed by Yandex engineers (documentation of the CatBoost model is available at https://catboost.ai/, accessed on 30 October 2022) [30,31,32,33]. It stems from the concepts of decision trees and gradient boosting. With oblivious decision trees as base predictors (oblivious trees are grown symmetrically, using the same feature and splitting criterion across each level of the tree), CatBoost is well-balanced, less prone to overfitting, and saves significant time during the testing phase. Let us consider a dataset $\mathcal{Z} = \{(a_i, b_i)\}_{i=1,\ldots,n}$, where $a_i = (a_i^1, \ldots, a_i^m)$ is a random vector of $m$ features and $b_i$ is an output feature with either a numerical or a binary response. The data $(a_i, b_i)$ are independent and follow some unknown distribution $N(\cdot,\cdot)$. The ultimate objective of any learning model is to train a function $F: \mathbb{R}^m \to \mathbb{R}$ that minimizes the expected loss $\mathcal{L}(F) = \mathbb{E}\,L(b, F(a))$, where $L$ denotes a smooth loss function [29,30]. A sequence of successively closer approximations $F^t: \mathbb{R}^m \to \mathbb{R}$, $t = 0, 1, 2, \ldots$, is built iteratively in a greedy fashion using a gradient boosting procedure. Based on a generalized additive approach, $F^t$ is derived from the antecedent approximation $F^{t-1}$ such that $F^t = F^{t-1} + \alpha h^t$, where $h^t: \mathbb{R}^m \to \mathbb{R}$ is a base predictor function and $\alpha$ denotes the step size [30]. With the objective of minimizing the expected loss, the base predictor is usually chosen from a family of functions $H$:
$$h^t = \underset{h \in H}{\operatorname{argmin}}\; \mathcal{L}(F^{t-1} + h) = \underset{h \in H}{\operatorname{argmin}}\; \mathbb{E}\,L\big(b, F^{t-1}(a) + h(a)\big)$$
The minimization problem is usually solved by functional gradient descent, either by taking a (negative) gradient step or by using the Newton second-order approximation method [29,33]. Generally, least-squares approximation is used to solve for $h^t(a)$. In CatBoost (an implementation of gradient boosting), however, the decision tree $h$ is obtained as:
$$h(a) = \sum_{k=1}^{K} m_k\, \mathbb{1}\{a \in D_k\}$$
where $D_k$ are the disjoint regions that correspond to the tree’s leaves and $m_k$ denotes the leaf values of the obtained trees [33].
CatBoost makes improvements to the gradient boosting procedure and employs a more effective ‘Ordered Target Statistics’ strategy to learn the model, making use of all training data. Hence, CatBoost performs well even on heterogeneous data. For further in-depth details and mathematical concepts of CatBoost, readers may refer to the following literature [29,30,33,34].
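The additive update described above can be illustrated with a minimal sketch. It implements plain gradient boosting under a squared loss with one-split regression trees (stumps) on a single feature; it is not CatBoost itself and omits oblivious trees, ordered boosting and target statistics. The function names, the demo data and the step size of 0.1 are illustrative assumptions.

```python
from typing import List, Tuple

# (threshold, left leaf value, right leaf value): a tree with two leaves D_1, D_2
Stump = Tuple[float, float, float]


def fit_stump(x: List[float], r: List[float]) -> Stump:
    """Least-squares fit of a one-split tree to the residuals r."""
    best = None
    for thr in sorted(set(x))[:-1]:  # skip the largest value: right leaf would be empty
        left = [ri for xi, ri in zip(x, r) if xi <= thr]
        right = [ri for xi, ri in zip(x, r) if xi > thr]
        m_left = sum(left) / len(left)      # leaf value m_k = mean residual in region
        m_right = sum(right) / len(right)
        sse = (sum((ri - m_left) ** 2 for ri in left)
               + sum((ri - m_right) ** 2 for ri in right))
        if best is None or sse < best[0]:
            best = (sse, thr, m_left, m_right)
    if best is None:  # all x identical: fall back to a constant predictor
        mean_r = sum(r) / len(r)
        return (x[0], mean_r, mean_r)
    return (best[1], best[2], best[3])


def predict_stump(stump: Stump, xi: float) -> float:
    thr, m_left, m_right = stump
    return m_left if xi <= thr else m_right


def boost(x: List[float], y: List[float], n_rounds: int = 50, alpha: float = 0.1):
    """Build F^t = F^(t-1) + alpha * h^t, starting from the mean of y."""
    f0 = sum(y) / len(y)
    pred = [f0] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # For the squared loss, the negative gradient is the residual y - F(x).
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        pred = [pi + alpha * predict_stump(stump, xi) for xi, pi in zip(x, pred)]
    return (f0, alpha, stumps)


def predict(model, xi: float) -> float:
    f0, alpha, stumps = model
    return f0 + alpha * sum(predict_stump(s, xi) for s in stumps)


def mse(model, x: List[float], y: List[float]) -> float:
    return sum((yi - predict(model, xi)) ** 2 for xi, yi in zip(x, y)) / len(y)


# Hypothetical demo data (not from the study).
x_demo = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y_demo = [2.0, 2.1, 1.9, 2.2, 5.0, 5.2, 4.9, 5.1]
model = boost(x_demo, y_demo)
baseline = (sum(y_demo) / len(y_demo), 0.1, [])  # F^0 only, no trees
print(mse(baseline, x_demo, y_demo), mse(model, x_demo, y_demo))
```

Each round fits the base predictor to the current residuals and adds a damped copy of it to the ensemble, so the training error of the boosted model falls below that of the constant starting model.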

2.2. eXtreme Gradient Boosting (XGBoost)

One of the fundamental issues in tree learning is to discover the best split; eXtreme Gradient Boosting (XGBoost), an extension of gradient-boosted decision trees, was created to address this issue and has achieved superior results [35]. XGBoost employs a greedy algorithm that starts from a single leaf and iteratively adds branches to the tree to find the best split [36]. Because boosting is sequential, XGBoost cannot train several trees in parallel, but it can generate distinct tree nodes in parallel. The distributed weighted quantile sketch algorithm included in XGBoost aids in determining the best split points and handles weighted datasets. The weight of each individual tree can be scaled down by a constant, thus reducing the impact of a single tree on the final score. To penalize highly complex models, XGBoost employs both Lasso (L1) and Ridge (L2) regularization. Additionally, it comes with a built-in cross-validation method at each iteration to avoid over-fitting. XGBoost has the advantages of effective tree pruning, parallel processing, and regularization. Finally, shrinkage and column subsampling further help to prevent over-fitting, and column subsampling also speeds up the computations of the parallel algorithm. For further in-depth details and mathematical concepts of XGBoost, the reader may refer to the following literature [35,37,38]. Figure 2 illustrates the structure of the XGBoost model.
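The role of regularization in the split search can be sketched with the standard second-order formulas from the XGBoost literature: for a leaf with gradient sum G and Hessian sum H, the optimal weight is −G/(H + λ), and a candidate split is scored by the gain it brings over the unsplit leaf. The sketch below assumes Ridge (L2) regularization only, a squared loss, and hypothetical demo values; the function names are illustrative and do not belong to the XGBoost API.

```python
from typing import List


def leaf_weight(g: List[float], h: List[float], lam: float = 1.0) -> float:
    """Optimal leaf value under L2 regularization: -G / (H + lambda)."""
    return -sum(g) / (sum(h) + lam)


def split_gain(g_left: List[float], h_left: List[float],
               g_right: List[float], h_right: List[float],
               lam: float = 1.0, gamma: float = 0.0) -> float:
    """Gain of a split; gamma is the complexity penalty per added leaf."""
    def score(g: List[float], h: List[float]) -> float:
        return sum(g) ** 2 / (sum(h) + lam)

    parent = score(g_left + g_right, h_left + h_right)
    return 0.5 * (score(g_left, h_left) + score(g_right, h_right) - parent) - gamma


# Squared loss 0.5 * (pred - y)^2 with an initial prediction of 0:
# gradient g = pred - y, Hessian h = 1 for every sample.
y = [1.0, 1.0, 3.0, 3.0]
g = [0.0 - yi for yi in y]
h = [1.0] * len(y)

print(leaf_weight(g, h))                   # value of the single unsplit leaf
print(split_gain(g[:2], h[:2], g[2:], h[2:]))  # gain of splitting low from high flows
```

A larger λ shrinks the leaf values toward zero and lowers the gain of every split, while a positive γ rejects splits whose gain does not justify the extra leaf.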

2.3. Random Forest (RF)

Random forest, an extension of the bagging method, is a versatile supervised machine learning algorithm, wherein multiple individual decision trees are merged to form an ensemble [39]. The bagging approach draws random samples (with replacement) from the training set to generate several data samples and then trains a model on each of them independently. From ‘k’ data records, RF picks ‘n’ records to construct an individual decision tree for each sample. About one-third of the training sample is left out of each bootstrap sample and serves as test data, referred to as the out-of-bag (oob) sample. Each decision tree provides an output and, based on majority voting or averaging, RF generates the output for classification or regression tasks, respectively. Random forest also allows for evaluating the importance of variables, i.e., their contribution to a model. Indices such as Gini importance (mean decrease in impurity, MDI) and the drop in model accuracy when a variable is removed are commonly used for this purpose. The RF model, unlike a single decision tree, is more robust to training sample selection and noise. Since it takes the average of the approximations from the individual trees, overfitting is reduced because the errors of the individual trees tend to cancel out. For further in-depth details, the reader may refer to the following literature [39,40,41,42].
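The bagging step and the out-of-bag sample can be sketched as follows. This is a minimal illustration, assuming bootstrap samples of the same size as the training set; it covers only the resampling and the aggregation of tree outputs, not the tree construction, and all names and values are hypothetical.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple


def bootstrap_split(n: int, rng: random.Random) -> Tuple[List[int], List[int]]:
    """Draw n record indices with replacement; unused indices are out-of-bag."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    oob = sorted(set(range(n)) - set(in_bag))
    return in_bag, oob


def aggregate_regression(tree_outputs: Sequence[float]) -> float:
    """Regression: average the outputs of the individual trees."""
    return sum(tree_outputs) / len(tree_outputs)


def aggregate_classification(tree_votes: Sequence[str]) -> str:
    """Classification: majority vote over the individual trees."""
    return Counter(tree_votes).most_common(1)[0][0]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
in_bag, oob = bootstrap_split(1000, rng)
oob_fraction = len(oob) / 1000
print(oob_fraction)  # close to 1/e, i.e., roughly one-third of the records

print(aggregate_regression([1.0, 2.0, 3.0]))
print(aggregate_classification(["wet", "dry", "wet"]))
```

Because each record has a probability of about (1 − 1/n)^n ≈ 0.37 of never being drawn, roughly one-third of the data remains available for an internal accuracy check of each tree.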

2.4. Case Study

The present study uses monthly mean streamflow data from three stations, Durucasu (station no: 1413, latitude: 36.11 N, longitude: 40.74 E, altitude: 301 m), Sutluce (station no: 1414, latitude: 36.12 N, longitude: 40.43 E, altitude: 510 m) and Kale (station no: 1402, latitude: 36.51 N, longitude: 40.77 E, altitude: 190 m), situated in the Black Sea Region (BSR) of Turkey (Figure 3). The data are continuous throughout the period 1998–2007; there were no gaps in the data from any of the stations. The highest rainfall in Turkey is observed in this region. The eastern part of the BSR receives 2200 mm of annual rainfall. This region has a wet-humid climate with a yearly average relative humidity of 71% and average temperatures of 4 °C and 22 °C in winter and summer, respectively. Yearly average total rainfall is about 842 mm, of which 19.4% occurs in summer [43]. Streamflow data were obtained from the Turkish State Water Works. Precipitation data were obtained from the Tropical Rainfall Measuring Mission (TRMM), which provides continuous satellite data over the BSR. Such data were previously tested against ground-based observations, and a high level of accuracy was reported [43,44,45,46,47]. Table 1 summarizes the statistical characteristics of the streamflow data. As shown, the streamflow data have high skewness, ranging from 1.60 to 2.43. The ranges of the training data do not cover those of the testing data; this could cause difficulties in predicting streamflow beyond the extreme values provided to the model in the training stage. The TRMM provides monthly precipitation data on a grid basis. We selected the grid points closest to the streamflow stations. Thus, for the Durucasu Station, TRMM data from grid points #1524 (latitude: 36.125 N, longitude: 40.625 E) and #1602 (latitude: 36.125 N, longitude: 40.875 E) were used.
For the Sutluce Station, data from grid #1446 (latitude: 36.125 N, longitude: 40.375 E), and for the Kale Station, data from grid #1525 (latitude: 36.375 N, longitude: 40.625 E) and grid #1526 (latitude: 36.625 N, longitude: 40.625 E) were utilized. The TRMM was launched in 1997. It is a joint project developed by NASA and JAXA (the space agency of Japan). It uses both active and passive microwave instruments with a low inclination orbit (35°). Therefore, TRMM is the foremost satellite in the world for the study of precipitation, storms and climate processes in the tropics (https://gpm.nasa.gov/sites/default/files/document_files/TRMMSenRevProp_v1.2.pdf) (accessed on 30 October 2022).

2.5. Application and Evaluation of the Methods

Three metaheuristic regression approaches, i.e., CB, RF and XGB, were compared in predictions of monthly streamflow, considering precipitation data obtained from TRMM. First, lagged streamflow data from three stations, Durucasu, Sutluce and Kale, were used as inputs to the models. Then, a periodicity component, indicated by the month number (MN) of the output (streamflow, Q at time t, Qt), was included in the input combinations. Finally, the precipitation acquired from TRMM was added into the inputs to explore its impact on accuracy of the models. In order to assess the performance of the implemented methods, the following statistics were employed:
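$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{N} (Q_o - Q_p)^2}{N}}$$
$$\mathrm{rRMSE} = 100 \left( \frac{\mathrm{RMSE}}{\overline{Q_o}} \right)$$
$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| Q_o - Q_p \right|$$
$$E_{L,M} = 1 - \frac{\sum_{i=1}^{N} \left| Q_o - Q_p \right|}{\sum_{i=1}^{N} \left| Q_o - \overline{Q_o} \right|}, \quad E_{L,M} \le 1$$
$$\mathrm{MAPE} = \frac{100}{N} \sum_{i=1}^{N} \left| Q_o - Q_p \right|$$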
where RMSE is the root mean square error, rRMSE is the relative RMSE, MAE is the mean absolute error, $E_{L,M}$ is the Legates and McCabe index, MAPE is the mean of the absolute percentage errors [47], $N$ is the number of data points, $Q_o$ and $Q_p$ are the observed and predicted streamflow, respectively, and $\overline{Q_o}$ denotes the observed mean value.
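As a worked illustration, the four indices reported in the result tables (RMSE, rRMSE, MAE and the Legates and McCabe index) can be computed as in the following sketch. It follows the definitions above, including the factor of 100 in rRMSE; the function names and the three-value demo series are hypothetical and are not data from the study.

```python
import math
from typing import Sequence


def rmse(obs: Sequence[float], pred: Sequence[float]) -> float:
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def rrmse(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Relative RMSE: 100 * RMSE divided by the observed mean."""
    mean_obs = sum(obs) / len(obs)
    return 100.0 * rmse(obs, pred) / mean_obs


def mae(obs: Sequence[float], pred: Sequence[float]) -> float:
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def legates_mccabe(obs: Sequence[float], pred: Sequence[float]) -> float:
    """1 minus the ratio of absolute errors to absolute deviations from the mean."""
    mean_obs = sum(obs) / len(obs)
    num = sum(abs(o - p) for o, p in zip(obs, pred))
    den = sum(abs(o - mean_obs) for o in obs)
    return 1.0 - num / den


# Hypothetical observed and predicted monthly flows (m3/s).
q_obs = [2.0, 4.0, 6.0]
q_pred = [1.0, 4.0, 8.0]

print(rmse(q_obs, q_pred), rrmse(q_obs, q_pred))
print(mae(q_obs, q_pred), legates_mccabe(q_obs, q_pred))
```

Lower RMSE, rRMSE and MAE indicate better predictions, whereas the Legates and McCabe index is better when it is closer to 1.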

3. Application and Results

3.1. Predicting Monthly Streamflow of Durucasu Station

In Table 2, the accuracies of the metaheuristic regression methods are compared in predicting the monthly streamflow at the Durucasu Station for the test stage. In the table, S1 in parentheses is Scenario 1 involving Qt−1, while S123 indicates the scenario of Qt−1, Qt−2, Qt−3. The periodicity component is the month number of the streamflow output, varying from 1 (January) to 12 (December). Therefore, we used the abbreviation M for the scenarios involving periodicity information (S1M or S123M), while S1P refers to S1 with TRMM precipitation data. A comparison of the methods without considering TRMM data revealed that the periodicity input improves the model accuracy in monthly streamflow prediction. For example, the improvement in the RMSE of the CB, RF and XGB for the first scenario (S1) was 14.2%, 4.7% and 26.4%, respectively. The percentages were calculated using relative error (RE) (RE = (Value 1 − Value 2) × 100/Value 1). The CB and XGB had almost the same accuracy and they performed better than RF with respect to all evaluation statistics. The right part of Table 2 clearly shows that considering TRMM precipitation considerably improved the efficiency of the implemented methods in predicting monthly streamflow. Adding the precipitation input increased the accuracy of CB with S1 input by 24%, 23%, 29% and 45% with respect to RMSE, rRMSE, MAE and EL,M, respectively. It improved the corresponding statistics by 35%, 36%, 34% and 49% for the RF(S1) and by 20%, 21%, 20% and 36% for the XGB(S1) models, respectively. Similar to the discharge-based models, here, the periodicity also considerably improved the model efficiency. For example, improvements in RMSE, rRMSE, MAE and EL,M of 29, 28, 25 and 17% were observed for the CB with inputs of S1 and TRMM precipitation, of 3.7%, 2.7%, 12.2% and 8.2% for the RF with the same inputs, and of 39, 38, 38 and 39% for the XGB with the same inputs.
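The input scenarios named above (S1, S123, S1M, S1P and their combinations) can be assembled from the raw monthly series as in the following sketch. It assumes that the month number and the TRMM precipitation refer to the month being predicted, as suggested by the description of the NLR equations; the function name, argument names and demo values are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def build_inputs(q: Sequence[float],
                 month: Sequence[int],
                 precip: Optional[Sequence[float]] = None,
                 n_lags: int = 1,
                 use_month: bool = False) -> Tuple[List[List[float]], List[float]]:
    """Return (inputs, targets) for predicting Q_t from lagged flows.

    Each input row is [Q_(t-1), ..., Q_(t-n_lags)], optionally followed by
    the month number MN_t and the precipitation P_t.
    """
    rows: List[List[float]] = []
    targets: List[float] = []
    for t in range(n_lags, len(q)):
        row = [q[t - k] for k in range(1, n_lags + 1)]
        if use_month:
            row.append(month[t])
        if precip is not None:
            row.append(precip[t])
        rows.append(row)
        targets.append(q[t])
    return rows, targets


# Hypothetical four-month record (not data from the study).
q = [10.0, 20.0, 30.0, 40.0]
mn = [1, 2, 3, 4]
p = [5.0, 6.0, 7.0, 8.0]

x_s1, y_s1 = build_inputs(q, mn)                                  # S1
x_s123, y_s123 = build_inputs(q, mn, n_lags=3)                    # S123
x_s1mp, y_s1mp = build_inputs(q, mn, precip=p, use_month=True)    # S1 + M + P
print(x_s1mp, y_s1mp)
```

Adding lags shortens the usable record, since the first n_lags months have no complete input row.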
Among the metaheuristic regression models, XGB with S1, periodicity and TRMM precipitation as inputs had the best accuracy, with the lowest RMSE (12.33 m3/s), rRMSE (0.31) and MAE (8.77 m3/s) and the highest EL,M (0.68) in predicting monthly streamflow. It was followed by the CB model with the same inputs. The equation of the NLR is:
$$Q_{1413} = 1.34 \left( \mathrm{MN}^{0.33} + P^{1.17} + Q_{t-1}^{0.554} \right)$$
where $Q_{1413}$ is the current streamflow at the Durucasu Station (Code: 1413), MN is the month number, $P$ is the TRMM precipitation of the current month and $Q_{t-1}$ is the streamflow of one month prior.
Figure 4a uses a Taylor diagram to compare the accuracies of the metaheuristic regression models, together with ANN and NLR, in predicting monthly streamflow at the Durucasu Station. Such a diagram allows the standard deviation (STD), RMSE and correlation of the model predictions to be compared. The XGB model with inputs of S1, periodicity and TRMM precipitation had a closer STD to the measured one than the other methods, although this model was closely followed by CB with the same inputs.
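The three quantities shown in a Taylor diagram are linked geometrically: the centred RMS difference E' between predictions and observations satisfies E'² = σo² + σp² − 2σoσpR, which is why all three can be plotted on one chart. The sketch below computes them for a hypothetical series; note that it uses the centred RMS difference (means removed), which is an assumption about how the diagram in Figure 4 was built, and it differs from the RMSE defined in Section 2.5 whenever the predictions are biased.

```python
import math
from typing import Sequence, Tuple


def taylor_statistics(obs: Sequence[float],
                      pred: Sequence[float]) -> Tuple[float, float, float, float]:
    """Return (STD of obs, STD of pred, correlation R, centred RMS difference)."""
    n = len(obs)
    mean_o = sum(obs) / n
    mean_p = sum(pred) / n
    std_o = math.sqrt(sum((o - mean_o) ** 2 for o in obs) / n)
    std_p = math.sqrt(sum((p - mean_p) ** 2 for p in pred) / n)
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred)) / n
    r = cov / (std_o * std_p)
    crmsd = math.sqrt(
        sum(((p - mean_p) - (o - mean_o)) ** 2 for o, p in zip(obs, pred)) / n
    )
    return std_o, std_p, r, crmsd


# Hypothetical observed and predicted flows (m3/s).
obs_demo = [2.0, 4.0, 6.0, 8.0]
pred_demo = [1.0, 5.0, 5.0, 9.0]
std_o, std_p, r, crmsd = taylor_statistics(obs_demo, pred_demo)

# Law-of-cosines identity underlying the diagram.
print(crmsd ** 2, std_o ** 2 + std_p ** 2 - 2.0 * std_o * std_p * r)
```

A model plots close to the observation point when its STD matches the observed STD and its correlation is high, which together force the centred RMS difference to be small.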
Figure 5a,b compare the performance of the three metaheuristic regression models with/without TRMM data for the first input scenario involving periodicity (S1M).
From the two scatterplots, we can see that the use of TRMM data as input considerably improves the efficiency of all three methods, yielding less scattered predictions. Both the XGB model with inputs of S1 and periodicity and the XGB model with inputs of S1, periodicity and TRMM precipitation perform better than the other models in predicting the monthly streamflow at the Durucasu Station. Figure 6a,b compare the time variation of the model predictions in the two cases (with/without TRMM data). As shown, including precipitation information from TRMM data improves model performance in all ranges (low, mean and maximum flows).

3.2. Predicting Monthly Streamflow at Sutluce Station

Table 3 summarizes the test results of the three metaheuristic regression methods in predicting monthly streamflow at the Sutluce Station. The left part of the table (without TRMM data) reveals that involving periodicity in the model inputs considerably improves performance; for example, the improvement in RMSE of the CB, RF and XGB models with S1 input was 52, 20 and 44%, respectively. The CB model with periodicity and two lagged streamflow values as inputs (Qt−1, Qt−2, MN) performed better than the other models, with the lowest RMSE (3.37 m3/s), rRMSE (0.24) and MAE (2.58 m3/s) and the highest EL,M (0.56) in the test stage. From the right part of Table 3, it can be observed that including TRMM precipitation data improved the model efficiency in both cases, with and without periodicity. Adding P data to the model input improved the RMSE, rRMSE, MAE and EL,M by 56, 56, 40 and 68% for the CB model with S1 input, by 29, 28, 27 and 30% for the RF model with the same input and by 13.5, 14.3, 4.9 and 25% for the XGB model with the same input, respectively. The CB model with periodicity, TRMM precipitation and one lagged streamflow value as inputs (Qt−1, MN, P) offered better performance than the other models. Similar to the previous station, here, an improvement was seen when periodicity was used as input; improvements in RMSE, rRMSE, MAE and EL,M were 38, 39, 42 and 70% for the CB model with inputs of S1 and TRMM precipitation, 20, 19, 18 and 3719% for the RF model with the same input and 47, 47, 44 and 231% for the XGB model with the same input, respectively. The equation of the NLR is:
$$Q_{1414} = 0.98 \left( \mathrm{MN}^{0.28} + P^{0.096} + Q_{t-1}^{0.84} \right)$$
where $Q_{1414}$ is the current streamflow at the Sutluce Station (Code: 1414).
The Taylor diagram provided in Figure 4b shows that the CB model with inputs of S1, periodicity and TRMM precipitation had lower RMSE, higher correlation and closer STD to the measured one than the other models. From the scatterplots in Figure 5c,d, it is clear that the CB with inputs of S1, periodicity and TRMM precipitation had less scattered predictions. Additionally, it is clear that the use of TRMM data considerably improved the accuracy of the models. As shown from time variation graphs in Figure 6c,d, the models utilizing TRMM data could follow the measured streamflow much more closely than the discharge-based models.

3.3. Predicting Monthly Streamflow at the Kale Station

A comparison of the metaheuristic regression methods in predicting monthly streamflow at the Kale Station is made in Table 4 for the test stage. It is apparent from the table (see the left part, without TRMM data) that considering periodicity in the input considerably improved the efficiency of the various methods; for example, improvements in RMSE, rRMSE, MAE and EL,M were 31, 30, 32 and 158% for the CB model with S1 input, 26, 25, 28 and 174% for the RF model with the same input and 43, 43, 40 and 133% for the XGB model with the same input, respectively. Among the three metaheuristic regression methods, the XGB model with two lagged streamflow values and periodicity (Qt−1, Qt−2, MN) as inputs offered the best accuracy, with the lowest RMSE (44 m3/s), rRMSE (0.42) and MAE (30.03 m3/s) and the highest EL,M (0.29) in the test stage. It is apparent from the second part of Table 4 (see the right part, with TRMM data) that the use of P data acquired from TRMM considerably improved the accuracy both with and without periodicity. For example, it improved the RMSE, rRMSE, MAE and EL,M by 29, 29, 100 and 6373% for the CB model with S1 input, by 29, 28, 100 and 8968% for the RF model with the same input and by 39, 40, 100 and 3977% for the XGB model with the same input, respectively. The equation of the NLR is:
$$Q_{1402} = 3.24 \left( \mathrm{MN}^{0.06} + P^{1.19} + Q_{t-1}^{0.68} \right)$$
where $Q_{1402}$ is the current streamflow at the Kale Station (Code: 1402).
Among the three metaheuristic regression methods utilizing TRMM data as inputs, the XGB(S1MP) and CB(S1MP) with one lagged streamflow, TRMM precipitation and periodicity (Qt−1, P, MN) as inputs had almost the same accuracy, with both performing better than the RF model in the test stage. By using periodicity information, considerable improvements were observed; for example, improvements in the RMSE, rRMSE, MAE and EL,M were 43, 43, 32 and 102% for the CB model with inputs of S1 and TRMM precipitation, 30, 30, 21.55 and 102% for the RF model with the same input and 42, 42, 37 and 102% for the XGB model with the same input, respectively.
Figure 4c provides a Taylor diagram comparing the three methods with respect to the correlation coefficient (R), RMSE and STD. The XGB model with inputs of S1, periodicity and TRMM precipitation had a closer STD to the measured one than the others, closely followed by the CB model with the same input. As observed from Figure 5e,f, the use of TRMM data considerably improved the accuracy of all models. We can also see this improvement in the time variation graphs provided in Figure 6e,f.

3.4. Predicting Monthly Streamflow at the Kale Station Using Upstream Data

Predicting monthly streamflow using data from upstream stations is valuable because, in some cases, data are missing from some stations owing to technical problems, especially in developing countries like Turkey. In this section, the three metaheuristic regression methods are employed to find an efficient prediction model. Monthly streamflow at the Kale Station was predicted using data from two upstream stations, Durucasu and Sutluce. Here, periodicity and TRMM data were also considered. Table 5 compares the accuracy of the three methods with respect to the evaluation criteria utilized in the previous applications. It is clear from the table that in both cases (with/without TRMM data), considering periodicity generally improved the accuracy; for example, the RMSE decreased from 33.51 m3/s to 30.48 m3/s for the CB(S1314) model, from 32.60 m3/s to 28.04 m3/s for the RF(S1314) model and from 48.59 m3/s to 38.63 m3/s for the XGB(S1314) model. In both cases (with/without periodicity), adding TRMM data improved the efficiency; for example, a decrease was observed in RMSE from 30.48 m3/s to 25.79 m3/s for the CB(S1314M) model, from 32.60 m3/s to 27.78 m3/s for the RF(S1314) model and from 48.59 m3/s to 28.22 m3/s for the XGB(S1314) model. Among the implemented models, CB(S1314MP) produced the best streamflow predictions, with the lowest RMSE (25.79 m3/s), rRMSE (0.25) and MAE (20.39 m3/s) and the highest EL,M (0.52). The equation of the NLR is:
$$Q_{1402} = 6.53 \left( \mathrm{MN}^{0.85} + P^{0.3} + Q_{1414}^{0.004} + Q_{1413}^{0.92} \right)$$
where $Q_{1402}$ is the current streamflow at the Kale Station (Code: 1402), $Q_{1414}$ is the current streamflow at the Sutluce Station and $Q_{1413}$ is the current streamflow at the Durucasu Station.
The prediction results of the three metaheuristic regression methods are illustrated in a Taylor diagram in Figure 4d. As shown in the diagram, the CB(S1314MP) model achieved better accuracy, with a closer STD to the measured one, lower RMSE and higher correlation than the other methods. In this regard, this model was closely followed by the XGB(S1314MP) model. The scatter plots in Figure 5g,h and the time variation graphs in Figure 6g,h clearly show the performance improvement obtained by using TRMM data with the implemented models.
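The percentage improvements quoted throughout this section follow the relative error formula RE = (Value 1 − Value 2) × 100/Value 1. As a small worked example, the sketch below applies it to two of the RMSE pairs reported above for the upstream-data models without and with periodicity; the function and variable names are illustrative.

```python
def relative_error(value_1: float, value_2: float) -> float:
    """RE = (Value 1 - Value 2) * 100 / Value 1, in percent."""
    return (value_1 - value_2) * 100.0 / value_1


# RMSE (m3/s) without and with periodicity, as quoted in the text for Table 5.
re_cb = relative_error(33.51, 30.48)   # CB(S1314) model
re_xgb = relative_error(48.59, 38.63)  # XGB(S1314) model
print(round(re_cb, 1), round(re_xgb, 1))
```

For error measures such as RMSE a positive RE means an improvement; for indices where larger is better, such as EL,M, the sign convention has to be reversed, and the percentage can become very large when the initial value is close to zero.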

4. Discussion

In the present study, three metaheuristic regression methods, CB, RF and XGB, were implemented for monthly streamflow predictions. These approaches were then compared with the ANN and NLR methods. The applicability of TRMM precipitation data as inputs to the aforementioned models was investigated by considering different input scenarios comprising lagged streamflow as well as periodicity information (month number). The overall results indicate that considering TRMM precipitation data as inputs to the metaheuristic regression methods considerably improved their accuracy in monthly streamflow predictions; for example, improvements in RMSE and MAE of the CB models having one lagged streamflow as input were 24 and 29% for the Durucasu Station, 56 and 40% for the Sutluce Station and 29 and 100% for the Kale Station. This implies that such data are very useful in complex monthly streamflow predictions, especially in developing countries, where precipitation measurements may be unavailable or incomplete for technical reasons. These results are in agreement with the literature [45,46,47]. The accuracy of TRMM precipitation data was assessed in [45]. The authors of that report compared the monthly TRMM precipitation data covering the period of 1998–2010 with rain gauge data from 16 meteorological stations in the Yarlung Zangbo River Basin and reported a strong correlation and little numerical bias between the TRMM precipitation data and the rain gauges. The performance of TRMM was also assessed by comparing its precipitation data with those of 149 rainfall stations in Tunisia for a 16-year period (1998–2013) [46]. The authors found a strong correlation between them.
The results show that adding a periodicity component to the inputs of the models improved their prediction accuracy, both with and without TRMM data. Improvements in the RMSE and MAE of the CB(S1) models were 14 and 17% for the Durucasu Station, 52 and 44% for the Sutluce Station and 31 and 32% for the Kale Station. Similar observations were reported in a previous study [48], in which the monthly streamflow of a mountainous basin was predicted using machine learning methods (e.g., MARS, GMDH). The authors of that paper reported that the use of periodicity information in the model inputs generally improved the accuracy of their predictions.
The results of streamflow predictions using upstream data revealed that such data can provide more information than local data. A comparison of Table 4 and Table 5 shows that the use of upstream data (from the Durucasu and Sutluce stations) without local station data (i.e., data from the Kale Station) considerably improved the model efficiency with respect to RMSE, rRMSE, MAE and EL,M. The RMSE and rRMSE of the CB model with one lagged streamflow as input and without TRMM data improved by 53 and 54%, while the corresponding improvements were 40 and 39% for the same model with TRMM data. These findings are very useful for predictions of monthly streamflow, especially in basins where measurements are limited.
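For readers who wish to compute the measures compared above, the following sketch uses their common textbook forms. It assumes that rRMSE is the RMSE divided by the mean observed flow (which is consistent with the rRMSE and Qmean values reported in the tables) and that EL,M is the Legates and McCabe index; the definitions actually used are those given in the methods section of the article. The function name and the observed and predicted values are made up for illustration.

```python
import math
from typing import Dict, Sequence


def accuracy_measures(
    observed: Sequence[float], predicted: Sequence[float]
) -> Dict[str, float]:
    """RMSE, rRMSE, MAE and the Legates-McCabe index for paired series."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("series must be non-empty and of equal length")
    n = len(observed)
    mean_obs = sum(observed) / n
    errors = [p - o for o, p in zip(observed, predicted)]
    sum_abs_err = sum(abs(e) for e in errors)
    # Total absolute deviation of the observations from their mean.
    sum_abs_dev = sum(abs(o - mean_obs) for o in observed)
    if mean_obs == 0 or sum_abs_dev == 0:
        raise ValueError("observations must vary and have a non-zero mean")
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    return {
        "RMSE": rmse,
        "rRMSE": rmse / mean_obs,  # assumed: RMSE normalised by mean flow
        "MAE": sum_abs_err / n,
        "E_LM": 1.0 - sum_abs_err / sum_abs_dev,  # Legates-McCabe index
    }


# Made-up observed and predicted flows (m3/s).
measures = accuracy_measures([10.0, 20.0, 30.0, 40.0], [12.0, 18.0, 33.0, 37.0])
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

With these definitions a perfect prediction gives EL,M = 1, and a negative EL,M means that the model has a larger mean absolute error than simply predicting the observed mean.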
From a comparison of the tables, it may be seen that adding more lagged streamflow values as inputs to the implemented models reduced their accuracy. These results are in direct agreement with those of previous studies [49,50]. According to these studies, increasing the number of inputs does not guarantee better predictions and, in some cases, may negatively affect the variance. In other words, increasing the number of inputs may create a more complex model with poorer prediction accuracy.
Three neural network methods, i.e., feed forward neural networks (FFNN), generalized regression neural networks (GRNN) and radial basis function (RBF) networks, were used to predict monthly streamflow at two stations, Gerdelli and Isakoy, in Turkey [51]. The GRNN provided the best accuracy, with the lowest RMSE of 9.25 and 14.2 m³/s for the Gerdelli (Qmean = 12.29 m³/s) and Isakoy (Qmean = 17.86 m³/s) stations, respectively. Two neuro-fuzzy methods were applied to predict monthly streamflow at two additional stations, Besiri and Baykan, in Turkey; the best results were obtained from the subclustering-based neuro-fuzzy method, which achieved an RMSE of 32.8 and 9.36 m³/s for the Besiri (Qmean = 51.82 m³/s) and Baykan (Qmean = 21.36 m³/s) stations, respectively [52]. In the present study, the CB, as the best method, produced an RMSE of 12.33, 3.1 and 29.11 m³/s for the Durucasu (Qmean = 40.1 m³/s), Sutluce (Qmean = 14.1 m³/s) and Kale (Qmean = 104.2 m³/s) stations, respectively. Relative to the mean flows, these errors are smaller than those reported in [51,52], which supports the accuracy of the implemented metaheuristic regression method (CB) in monthly streamflow predictions. In addition, the CB method has a simpler structure than the FFNN, GRNN, RBF and neuro-fuzzy methods.
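Because the stations compared above have very different mean flows, the RMSE values are easier to compare after scaling by Qmean. The sketch below does this for the figures quoted in the preceding paragraph; the function name `relative_rmse` is illustrative, and the scaling by mean flow is an assumption made here for comparison rather than a procedure described in the cited studies.

```python
def relative_rmse(rmse: float, q_mean: float) -> float:
    """RMSE expressed as a fraction of the mean flow."""
    if q_mean <= 0:
        raise ValueError("q_mean must be positive")
    return rmse / q_mean


# (label, RMSE in m3/s, Qmean in m3/s), as quoted in the text.
cases = [
    ("Gerdelli, GRNN [51]", 9.25, 12.29),
    ("Isakoy, GRNN [51]", 14.2, 17.86),
    ("Besiri, neuro-fuzzy [52]", 32.8, 51.82),
    ("Baykan, neuro-fuzzy [52]", 9.36, 21.36),
    ("Durucasu, present study", 12.33, 40.1),
    ("Sutluce, present study", 3.1, 14.1),
    ("Kale, present study", 29.11, 104.2),
]

ratios = {label: relative_rmse(rmse, q_mean) for label, rmse, q_mean in cases}
for label, ratio in ratios.items():
    print(f"{label}: RMSE/Qmean = {ratio:.2f}")
```

Since the studies differ in stations, record periods and inputs, such ratios give only a rough sense of scale and not a controlled comparison of methods.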

5. Conclusions

In this study, the viability of three metaheuristic regression methods (CB, RF and XGB) was investigated for monthly streamflow prediction using streamflow data from three stations in Turkey and satellite precipitation data from TRMM. The results were also compared with those obtained using the ANN and NLR models. Assessment of the methods with respect to statistical measures (e.g., RMSE, rRMSE, MAE, EL,M and CA) and visual inspections (Taylor diagrams, scatterplots and hydrographs) revealed that the CB method generally performed better than the XGB, RF, ANN and NLR methods. Including TRMM precipitation as an input considerably improved the accuracy of the implemented methods; these satellite data are highly accurate, and their use in streamflow predictions is strongly recommended by the authors. Additional improvement was observed when periodicity information was introduced to the models. This input, the month number, is very easy to employ, and its use is highly recommended for engineers and scholars. Monthly streamflow at the downstream station was successfully predicted by the CB and XGB methods using upstream data. In this application, the use of TRMM precipitation information and a periodicity component further improved the accuracy of the implemented models. These findings may provide useful information for managers and decision makers, especially in developing countries, where precipitation data may be incomplete or absent altogether because of technical issues.

Author Contributions

M.M., methodology, investigation, formal analysis; A.M., formal analysis, writing—review and editing; S.R.N., methodology, resources, writing—review and editing; C.K., writing—review and editing; O.K., methodology, review and editing. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available upon request from the corresponding author.

Acknowledgments

The authors would like to thank the staff of the Turkish State Water Works who provided the streamflow data.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Edwards, P.J.; Williard, K.W.; Schoonover, J.E. Fundamentals of watershed hydrology. J. Contemp. Water Res. Educ. 2015, 154, 3–20.
  2. Davie, T. Fundamentals of Hydrology, 2nd ed.; Routledge: London, UK, 2019.
  3. Chegwidden, O.S.; Rupp, D.E.; Nijssen, B. Climate change alters flood magnitudes and mechanisms in climatically-diverse headwaters across the northwestern United States. Environ. Res. Lett. 2020, 15, 094048.
  4. Goeking, S.A.; Tarboton, D.G. Forests and water yield: A synthesis of disturbance effects on streamflow and snowpack in western coniferous forests. J. For. 2020, 118, 172–192.
  5. Naz, B.S.; Kao, S.C.; Ashfaq, M.; Gao, H.; Rastogi, D.; Gangrade, S. Effects of climate change on streamflow extremes and implications for reservoir inflow in the United States. J. Hydrol. 2019, 556, 359–370.
  6. Valenzuela-Aguayo, F.; McCracken, G.R.; Manosalva, A.; Habit, E.; Ruzzante, D.E. Human-induced habitat fragmentation effects on connectivity, diversity, and population persistence of an endemic fish, Percilia irwini, in the Biobío River basin (Chile). Evol. Appl. 2020, 13, 794–807.
  7. Allen, G.H.; Pavelsky, T.M. Global extent of rivers and streams. Science 2018, 361, 585–588.
  8. Lu, S.; Dai, W.; Tang, Y.; Guo, M. A review of the impact of hydropower reservoirs on global climate change. Sci. Total Environ. 2020, 711, 134996.
  9. Marques, C.A.F.; Ferreira, J.A.; Rocha, A.; Castanheira, J.M.; Melo-Gonçalves, P.; Vaz, N.; Dias, J.M. Singular spectrum analysis and forecasting of hydrological time series. Phys. Chem. Earth Parts A/B/C 2006, 31, 1172–1179.
  10. Karran, D.J.; Morin, E.; Adamowski, J. Multi-step streamflow forecasting using data-driven non-linear methods in contrasting climate regimes. J. Hydroinform. 2014, 16, 671–689.
  11. Wang, Z.Y.; Qiu, J.; Li, F.F. Hybrid models combining EMD/EEMD and ARIMA for Long-term streamflow forecasting. Water 2018, 10, 853.
  12. Zhang, Z.; Zhang, Q.; Singh, V.P. Univariate streamflow forecasting using commonly used data-driven models: Literature review and case study. Hydrol. Sci. J. 2018, 63, 1091–1111.
  13. Fatichi, S.; Rimkus, S.; Burlando, P.; Bordoy, R. Does internal climate variability overwhelm climate change signals in streamflow? The upper Po and Rhone basin case studies. Sci. Total Environ. 2014, 493, 1171–1182.
  14. Adombi, A.V.D.P.; Chesnaux, R.; Boucher, M.A. Theory-guided machine learning applied to hydrogeology—State of the art, opportunities and future challenges. Hydrogeol. J. 2021, 29, 2671–2683.
  15. Najafzadeh, M.; Oliveto, G. Riprap incipient motion for overtopping flows with machine learning models. J. Hydroinform. 2020, 22, 749–767.
  16. Kisi, O.; Mirboluki, A.; Naganna, S.R.; Malik, A.; Kuriqi, A.; Mehraein, M. Comparative evaluation of deep learning and machine learning in modelling pan evaporation using limited inputs. Hydrol. Sci. J. 2022, 67, 1–19.
  17. Wang, W.; Van Gelder, P.H.; Vrijling, J.K.; Ma, J. Forecasting daily streamflow using hybrid ANN models. J. Hydrol. 2006, 324, 383–399.
  18. Wu, C.L.; Chau, K.W. Data-driven models for monthly streamflow time series prediction. Eng. Appl. Artif. Intell. 2010, 23, 1350–1367.
  19. Freire, P.K.D.M.M.; Santos, C.A.G.; da Silva, G.B.L. Analysis of the use of discrete wavelet transforms coupled with ANN for short-term streamflow forecasting. Appl. Soft Comput. 2019, 80, 494–505.
  20. Li, S.; Zhang, L.; Du, Y.; Zhuang, Y.; Yan, C. Anthropogenic impacts on streamflow-compensated climate change effect in the Hanjiang River Basin, China. J. Hydrol. Eng. 2020, 25, 04019058.
  21. Malik, A.; Tikhamarine, Y.; Souag-Gamane, D.; Kisi, O.; Pham, Q.B. Support vector regression optimized by meta-heuristic algorithms for daily streamflow prediction. Stoch. Environ. Res. Risk Assess. 2020, 34, 1755–1773.
  22. Wang, L.; Li, X.; Ma, C.; Bai, Y. Improving the prediction accuracy of monthly streamflow using a data-driven model based on a double-processing strategy. J. Hydrol. 2019, 573, 733–745.
  23. Ren, K.; Wang, X.; Shi, X.; Qu, J.; Fang, W. Examination and comparison of binary metaheuristic wrapper-based input variable selection for local and global climate information-driven one-step monthly streamflow forecasting. J. Hydrol. 2021, 597, 126152.
  24. Zhang, H.; Yang, Q.; Shao, J.; Wang, G. Dynamic streamflow simulation via online gradient-boosted regression tree. J. Hydrol. Eng. 2019, 24, 04019041.
  25. Rice, J.S.; Emanuel, R.E.; Vose, J.M.; Nelson, S.A. Continental US streamflow trends from 1940 to 2009 and their relationships with watershed spatial characteristics. Water Resour. Res. 2015, 51, 6262–6275.
  26. Tyralis, H.; Papacharalampous, G.; Langousis, A. Super ensemble learning for daily streamflow forecasting: Large-scale demonstration and comparison with multiple machine learning algorithms. Neural Comput. Appl. 2021, 33, 3053–3068.
  27. Ni, L.; Wang, D.; Wu, J.; Wang, Y.; Tao, Y.; Zhang, J.; Liu, J. Streamflow forecasting using extreme gradient boosting model coupled with Gaussian mixture model. J. Hydrol. 2020, 586, 124901.
  28. Sahour, H.; Gholami, V.; Torkaman, J.; Vazifedan, M.; Saeedi, S. Random forest and extreme gradient boosting algorithms for streamflow modeling using vessel features and tree-rings. Environ. Earth Sci. 2021, 80, 1–14.
  29. Zhang, D.; Qian, L.; Mao, B.; Huang, C.; Huang, B.; Si, Y. A data-driven design for fault detection of wind turbines using random forests and XGboost. IEEE Access 2018, 6, 21020–21031.
  30. Dorogush, A.V.; Ershov, V.; Gulin, A. CatBoost: Gradient boosting with categorical features support. arXiv 2018, arXiv:1810.11363.
  31. Prokhorenkova, L.; Gusev, G.; Vorobev, A.; Dorogush, A.V.; Gulin, A. CatBoost: Unbiased boosting with categorical features. In Proceedings of the 32nd Conference on Neural Information Processing Systems, Montréal, QC, Canada, 3–8 December 2018.
  32. Niu, D.; Diao, L.; Zang, Z.; Che, H.; Zhang, T.; Chen, X. A machine-learning approach combining wavelet packet denoising with Catboost for weather forecasting. Atmosphere 2021, 12, 1618.
  33. Huang, G.; Wu, L.; Ma, X.; Zhang, W.; Fan, J.; Yu, X.; Zhou, H. Evaluation of CatBoost method for prediction of reference evapotranspiration in humid regions. J. Hydrol. 2019, 574, 1029–1041.
  34. CatBoost. Available online: https://catboost.ai/ (accessed on 30 October 2022).
  35. Hancock, J.T.; Khoshgoftaar, T.M. CatBoost for big data: An interdisciplinary review. J. Big Data 2020, 7, 1–45.
  36. Chen, T.; Guestrin, C. Xgboost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016.
  37. Friedman, J.H. Greedy function approximation: A gradient boosting machine. Ann. Stat. 2001, 29, 1189–1232.
  38. Brownlee, J. XGBoost with Python: Gradient Boosted Trees with XGBoost and Scikit-Learn; Association for Computing Machinery: New York, NY, USA, 2016.
  39. Chen, R.C.; Caraka, R.E.; Arnita, N.E.G.; Pomalingo, S.; Rachman, A.; Toharudin, T.; Pardamean, B. An end to end of scalable tree boosting system. Sylwan 2020, 164, 1–11.
  40. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
  41. Pavlov, Y.L. Random Forests; De Gruyter: Boston, MA, USA, 2000; Available online: https://www.degruyter.com/document/doi/10.1515/9783110941975/html (accessed on 30 October 2022).
  42. Louppe, G. Understanding Random Forests: From Theory to Practice. Ph.D. Thesis, University of Liège, Liège, Belgium, 2014.
  43. Scornet, E. Learning with Random Forests. Ph.D. Thesis, Université Pierre et Marie Curie, Paris, France, 2015.
  44. Sensoy, S.; Demircan, M.; Ulupinar, Y.; Balta, Z. Climate of Turkey. Available online: https://mgm.gov.tr/FILES/genel/makale/31_climateofturkey.pdf (accessed on 30 September 2022).
  45. Yang, U.; Sheng-tian, Y.; Ming-yong, C.; Qiu-wen, Z.; Guo-tao, D. The Applicability Analysis of TRMM Precipitation Data in the Yarlung Zangbo River Basin. J. Nat. Resour. 2013, 28, 1414–1425.
  46. Santos, C.A.G.; Brasil Neto, R.M.; da Silva, R.M.; Passos, J.S.D.A. Integrated spatiotemporal trends using TRMM 3B42 data for the Upper São Francisco River basin, Brazil. Environ. Monit. Assess. 2018, 190, 175.
  47. Medhioub, E.; Bouaziz, M.; Achour, H.; Bouaziz, S. Monthly assessment of TRMM 3B43 rainfall data with high-density gauge stations over Tunisia. Arab. J. Geosci. 2019, 12, 15.
  48. Adnan, R.M.; Liang, Z.; Parmar, K.S.; Soni, K.; Kisi, O. Modeling monthly streamflow in mountainous basin by MARS, GMDHNN and DENFIS using hydroclimatic data. Neural Comput. Appl. 2021, 33, 2853–2871.
  49. Shi, J.; Guo, J.; Zheng, S. Evaluation of hybrid forecasting approaches for wind speed and power generation time series. Renew. Sustain. Energy Rev. 2012, 16, 3471–3480.
  50. Zhang, D.; Peng, X.; Pan, K.; Liu, Y. A novel wind speed forecasting based on hybrid decomposition and online sequential outlier robust extreme learning machine. Energy Convers. Manag. 2019, 180, 338–357.
  51. Kisi, O. River flow forecasting and estimation using different artificial neural network techniques. Hydrol. Res. 2008, 39, 27–40.
  52. Sanikhani, H.; Kisi, O. River flow estimation and forecasting by using two different adaptive neuro-fuzzy approaches. Water Resour. Manag. 2012, 26, 1715–1729.
Figure 1. The modeling procedure of the metaheuristic regression methods implemented in this study.
Figure 2. Structure of the XGBoost model.
Figure 3. Locations of the Durucasu (1413), Sutluce (1414) and Kale (1402) stations in the Yesilirmak Basin, situated in the Black Sea Region.
Figure 4. Taylor diagrams for the testing phase: (a) Durucasu Station, (b) Sutluce Station, (c,d) Kale Station.
Figure 5. Scatter plots for (a) Durucasu (S1M), (b) Durucasu (S1MP), (c) Sutluce (S1M), (d) Sutluce (S1MP), (e) Kale (S1M), (f) Kale (S1MP), (g) Kale (S1314M) and (h) Kale (S1314MP).
Figure 6. Time variation graphs of the observed streamflows and those predicted by CB, RF, XGB, ANN and NLR: (a) Durucasu (S1M), (b) Durucasu (S1MP), (c) Sutluce (S1M), (d) Sutluce (S1MP), (e) Kale (S1M), (f) Kale (S1MP), (g) Kale (S1314M) and (h) Kale (S1314MP).
Table 1. Statistical properties of the streamflow data.

| Station | Station No | Phase | Qmax | Qmin | Qmean | Sk | CV | STD |
|---|---|---|---|---|---|---|---|---|
| Durucasu | 1413 | Test | 173 | 9 | 40.1 | 2.11 | 0.96 | 38.3 |
| Durucasu | 1413 | Train | 169 | 4.6 | 32.5 | 2.43 | 0.9 | 28.6 |
| Sutluce | 1414 | Test | 39.6 | 5.9 | 14.1 | 1.54 | 0.56 | 7.7 |
| Sutluce | 1414 | Train | 37.5 | 4.5 | 12.2 | 1.87 | 0.53 | 6.42 |
| Kale | 1402 | Test | 334 | 43.5 | 104.2 | 2.38 | 0.65 | 68.6 |
| Kale | 1402 | Train | 387 | 47.7 | 126.7 | 1.6 | 0.60 | 76.4 |

Notes: Qmax: Maximum streamflow, Qmin: Minimum streamflow, Qmean: Mean streamflow, Sk: Skewness, CV: Variation coefficient, STD: Standard deviation.
Table 2. The accuracies of the CB, RF, XGB, ANN and NLR methods in predictions of monthly streamflow at the Durucasu Station (Code: 1413) in the testing phase.

Without TRMM Data

| Model (Scenario) | Model Inputs | RMSE | rRMSE | MAE | EL,M | MAPE |
|---|---|---|---|---|---|---|
| CB (S1) | Qt−1 | 24.11 | 0.60 | 16.48 | 0.40 | 48.12 |
| CB (S12) | Qt−1, Qt−2 | 26.08 | 0.65 | 15.90 | 0.42 | 47.5 |
| CB (S123) | Qt−1, Qt−2, Qt−3 | 26.76 | 0.67 | 17.18 | 0.38 | 48.19 |
| RF (S1) | Qt−1 | 23.19 | 0.58 | 16.21 | 0.41 | 50.2 |
| RF (S12) | Qt−1, Qt−2 | 26.99 | 0.67 | 17.74 | 0.36 | 53.2 |
| RF (S123) | Qt−1, Qt−2, Qt−3 | 24.85 | 0.62 | 15.78 | 0.43 | 52.1 |
| XGB (S1) | Qt−1 | 25.33 | 0.63 | 17.60 | 0.36 | 46.12 |
| XGB (S12) | Qt−1, Qt−2 | 26.02 | 0.65 | 19.73 | 0.28 | 48.2 |
| XGB (S123) | Qt−1, Qt−2, Qt−3 | 24.97 | 0.62 | 17.46 | 0.37 | 49.78 |
| ANN (S1) | Qt−1 | 27.86 | 0.69 | 19.18 | 0.39 | 53.2 |
| ANN (S12) | Qt−1, Qt−2 | 28.10 | 0.70 | 21.31 | 0.31 | 55.45 |
| ANN (S123) | Qt−1, Qt−2, Qt−3 | 26.72 | 0.66 | 18.86 | 0.40 | 57.8 |
| NLR (S1) | Qt−1 | 30.65 | 0.76 | 20.71 | 0.42 | 55.18 |
| NLR (S12) | Qt−1, Qt−2 | 30.35 | 0.75 | 23.44 | 0.34 | 58.49 |
| NLR (S123) | Qt−1, Qt−2, Qt−3 | 28.59 | 0.71 | 20.37 | 0.43 | 56.4 |
| CB (S1M) | Qt−1, MN | 20.69 | 0.52 | 13.74 | 0.50 | 42.2 |
| CB (S12M) | Qt−1, Qt−2, MN | 23.16 | 0.58 | 14.26 | 0.48 | 45.35 |
| CB (S123M) | Qt−1, Qt−2, Qt−3, MN | 22.04 | 0.55 | 13.20 | 0.52 | 45.21 |
| RF (S1M) | Qt−1, MN | 22.09 | 0.55 | 13.81 | 0.50 | 43.24 |
| RF (S12M) | Qt−1, Qt−2, MN | 21.95 | 0.55 | 13.36 | 0.52 | 48.26 |
| RF (S123M) | Qt−1, Qt−2, Qt−3, MN | 22.35 | 0.56 | 13.02 | 0.53 | 48.97 |
| XGB (S1M) | Qt−1, MN | 18.65 | 0.51 | 14.70 | 0.47 | 40.23 |
| XGB (S12M) | Qt−1, Qt−2, MN | 18.07 | 0.45 | 12.69 | 0.54 | 41.24 |
| XGB (S123M) | Qt−1, Qt−2, Qt−3, MN | 18.1 | 0.45 | 11.54 | 0.58 | 43.5 |
| ANN (S1M) | Qt−1, MN | 21.28 | 0.53 | 14.29 | 0.48 | 42.12 |
| ANN (S12M) | Qt−1, Qt−2, MN | 22.34 | 0.52 | 13.58 | 0.48 | 43.15 |
| ANN (S123M) | Qt−1, Qt−2, Qt−3, MN | 21.92 | 0.51 | 14.29 | 0.47 | 44.12 |
| NLR (S1M) | Qt−1, MN | 24.40 | 0.61 | 15.14 | 0.45 | 47.35 |
| NLR (S12M) | Qt−1, Qt−2, MN | 23.67 | 0.58 | 15.29 | 0.46 | 48.2 |
| NLR (S123M) | Qt−1, Qt−2, Qt−3, MN | 24.64 | 0.61 | 14.38 | 0.45 | 47.65 |

With TRMM Data

| Model (Scenario) | Model Inputs | RMSE | rRMSE | MAE | EL,M | MAPE |
|---|---|---|---|---|---|---|
| CB (S1P) | Qt−1, P | 18.32 | 0.46 | 11.71 | 0.58 | 45.35 |
| CB (S12P) | Qt−1, Qt−2, P | 24.24 | 0.60 | 13.55 | 0.51 | 46.15 |
| CB (S123P) | Qt−1, Qt−2, Qt−3, P | 27.63 | 0.69 | 15.59 | 0.43 | 46.12 |
| RF (S1P) | Qt−1, P | 15.03 | 0.37 | 10.63 | 0.61 | 46.15 |
| RF (S12P) | Qt−1, Qt−2, P | 17.60 | 0.44 | 11.43 | 0.59 | 48.26 |
| RF (S123P) | Qt−1, Qt−2, Qt−3, P | 17.41 | 0.43 | 11.64 | 0.58 | 47.15 |
| XGB (S1P) | Qt−1, P | 20.16 | 0.50 | 14.13 | 0.49 | 45.19 |
| XGB (S12P) | Qt−1, Qt−2, P | 15.73 | 0.39 | 10.28 | 0.63 | 46.15 |
| XGB (S123P) | Qt−1, Qt−2, Qt−3, P | 16.35 | 0.41 | 11.64 | 0.58 | 48.12 |
| ANN (S1P) | Qt−1, P | 22.18 | 0.55 | 15.40 | 0.53 | 50.23 |
| ANN (S12P) | Qt−1, Qt−2, P | 16.99 | 0.42 | 11.31 | 0.68 | 52.13 |
| ANN (S123P) | Qt−1, Qt−2, Qt−3, P | 17.49 | 0.44 | 12.45 | 0.64 | 52.14 |
| NLR (S1P) | Qt−1, P | 24.40 | 0.61 | 16.32 | 0.58 | 53.14 |
| NLR (S12P) | Qt−1, Qt−2, P | 18.35 | 0.46 | 12.44 | 0.71 | 55.12 |
| NLR (S123P) | Qt−1, Qt−2, Qt−3, P | 18.71 | 0.47 | 13.70 | 0.69 | 52.15 |
| CB (S1MP) | Qt−1, MN, P | 13.05 | 0.33 | 8.79 | 0.68 | 25 |
| CB (S12MP) | Qt−1, Qt−2, MN, P | 24.04 | 0.60 | 13.11 | 0.51 | 24.5 |
| CB (S123MP) | Qt−1, Qt−2, Qt−3, MN, P | 25.11 | 0.63 | 13.83 | 0.50 | 25.6 |
| RF (S1MP) | Qt−1, MN, P | 14.48 | 0.36 | 9.33 | 0.66 | 25.23 |
| RF (S12MP) | Qt−1, Qt−2, MN, P | 15.63 | 0.39 | 9.87 | 0.64 | 25.48 |
| RF (S123MP) | Qt−1, Qt−2, Qt−3, MN, P | 16.16 | 0.40 | 10.42 | 0.62 | 26.5 |
| XGB (S1MP) | Qt−1, MN, P | 12.33 | 0.31 | 8.77 | 0.68 | 29.12 |
| XGB (S12MP) | Qt−1, Qt−2, MN, P | 14.26 | 0.36 | 9.26 | 0.66 | 28.45 |
| XGB (S123MP) | Qt−1, Qt−2, Qt−3, MN, P | 14.02 | 0.35 | 10.22 | 0.63 | 28.75 |
| ANN (S1MP) | Qt−1, MN, P | 16.01 | 0.15 | 22.21 | 0.47 | 31.12 |
| ANN (S12MP) | Qt−1, Qt−2, MN, P | 16.33 | 0.15 | 23.10 | 0.47 | 31.15 |
| ANN (S123MP) | Qt−1, Qt−2, Qt−3, MN, P | 15.53 | 0.14 | 21.54 | 0.46 | 32.12 |
| NLR (S1MP) | Qt−1, MN, P | 30.26 | 0.29 | 23.38 | 0.45 | 32.14 |
| NLR (S12MP) | Qt−1, Qt−2, MN, P | 29.05 | 0.29 | 22.44 | 0.43 | 32.17 |
| NLR (S123MP) | Qt−1, Qt−2, Qt−3, MN, P | 29.96 | 0.29 | 23.85 | 0.47 | 35.14 |
Table 3. The accuracies of the CB, RF, XGB, ANN and NLR methods in predictions of monthly streamflow at the Sutluce Station (Code: 1414) in the testing phase.

Without TRMM Data

| Model (Scenario) | Model Inputs | RMSE | rRMSE | MAE | EL,M | MAPE |
|---|---|---|---|---|---|---|
| CB (S1) | Qt−1 | 7.83 | 0.56 | 5.18 | 0.12 | 53.2 |
| CB (S12) | Qt−1, Qt−2 | 4.91 | 0.35 | 3.46 | 0.41 | 54.8 |
| CB (S123) | Qt−1, Qt−2, Qt−3 | 4.96 | 0.35 | 3.41 | 0.42 | 55.4 |
| RF (S1) | Qt−1 | 5.77 | 0.41 | 4.01 | 0.32 | 55.2 |
| RF (S12) | Qt−1, Qt−2 | 4.98 | 0.35 | 3.35 | 0.43 | 55.6 |
| RF (S123) | Qt−1, Qt−2, Qt−3 | 5.17 | 0.37 | 3.63 | 0.38 | 56 |
| XGB (S1) | Qt−1 | 7.84 | 0.56 | 5.18 | 0.12 | 52.32 |
| XGB (S12) | Qt−1, Qt−2 | 6.12 | 0.43 | 4.46 | 0.24 | 53.6 |
| XGB (S123) | Qt−1, Qt−2, Qt−3 | 5.30 | 0.38 | 3.88 | 0.34 | 53.87 |
| ANN (S1) | Qt−1 | 8.62 | 0.62 | 5.59 | 0.13 | 57.2 |
| ANN (S12) | Qt−1, Qt−2 | 6.61 | 0.47 | 4.77 | 0.26 | 58.9 |
| ANN (S123) | Qt−1, Qt−2, Qt−3 | 5.67 | 0.41 | 4.27 | 0.36 | 57.4 |
| NLR (S1) | Qt−1 | 9.48 | 0.68 | 5.98 | 0.14 | 57.6 |
| NLR (S12) | Qt−1, Qt−2 | 7.14 | 0.51 | 5.20 | 0.28 | 58.2 |
| NLR (S123) | Qt−1, Qt−2, Qt−3 | 6.07 | 0.43 | 4.65 | 0.39 | 58.4 |
| CB (S1M) | Qt−1, MN | 3.80 | 0.27 | 2.91 | 0.51 | 42.15 |
| CB (S12M) | Qt−1, Qt−2, MN | 3.37 | 0.24 | 2.58 | 0.56 | 43.15 |
| CB (S123M) | Qt−1, Qt−2, Qt−3, MN | 4.49 | 0.32 | 2.91 | 0.51 | 42.18 |
| RF (S1M) | Qt−1, MN | 4.63 | 0.33 | 3.15 | 0.51 | 43.32 |
| RF (S12M) | Qt−1, Qt−2, MN | 4.43 | 0.31 | 2.94 | 0.50 | 44.18 |
| RF (S123M) | Qt−1, Qt−2, Qt−3, MN | 4.56 | 0.32 | 2.99 | 0.49 | 45.78 |
| XGB (S1M) | Qt−1, MN | 4.37 | 0.31 | 3.59 | 0.51 | 40.59 |
| XGB (S12M) | Qt−1, Qt−2, MN | 4.38 | 0.31 | 2.98 | 0.50 | 39.18 |
| XGB (S123M) | Qt−1, Qt−2, Qt−3, MN | 4.17 | 0.3 | 2.86 | 0.52 | 40.12 |
| ANN (S1M) | Qt−1, MN | 27.86 | 1.97 | 5.28 | 0.42 | 42 |
| ANN (S12M) | Qt−1, Qt−2, MN | 27.30 | 1.95 | 5.28 | 0.42 | 43.5 |
| ANN (S123M) | Qt−1, Qt−2, Qt−3, MN | 26.75 | 1.91 | 5.49 | 0.43 | 42.26 |
| NLR (S1M) | Qt−1, MN | 5.41 | 0.38 | 3.59 | 0.39 | 43 |
| NLR (S12M) | Qt−1, Qt−2, MN | 5.46 | 0.39 | 3.45 | 0.39 | 43.6 |
| NLR (S123M) | Qt−1, Qt−2, Qt−3, MN | 5.25 | 0.37 | 3.70 | 0.37 | 42.5 |

With TRMM Data

| Model (Scenario) | Model Inputs | RMSE | rRMSE | MAE | EL,M | MAPE |
|---|---|---|---|---|---|---|
| CB (S1P) | Qt−1, P | 5.02 | 0.36 | 3.71 | 0.37 | 45.2 |
| CB (S12P) | Qt−1, Qt−2, P | 5.14 | 0.37 | 3.41 | 0.42 | 46.5 |
| CB (S123P) | Qt−1, Qt−2, Qt−3, P | 5.59 | 0.4 | 3.56 | 0.40 | 46.8 |
| RF (S1P) | Qt−1, P | 4.47 | 0.32 | 3.16 | 0.46 | 48.5 |
| RF (S12P) | Qt−1, Qt−2, P | 4.20 | 0.3 | 3.05 | 0.48 | 48.9 |
| RF (S123P) | Qt−1, Qt−2, Qt−3, P | 4.69 | 0.33 | 3.32 | 0.44 | 47.5 |
| XGB (S1P) | Qt−1, P | 6.91 | 0.49 | 4.94 | 0.16 | 43.5 |
| XGB (S12P) | Qt−1, Qt−2, P | 4.88 | 0.35 | 3.40 | 0.42 | 42.9 |
| XGB (S123P) | Qt−1, Qt−2, Qt−3, P | 5.39 | 0.38 | 3.61 | 0.39 | 44.5 |
| ANN (S1P) | Qt−1, P | 7.60 | 0.54 | 5.38 | 0.18 | 52.3 |
| ANN (S12P) | Qt−1, Qt−2, P | 5.27 | 0.38 | 3.67 | 0.46 | 53.2 |
| ANN (S123P) | Qt−1, Qt−2, Qt−3, P | 5.77 | 0.41 | 3.86 | 0.43 | 54.1 |
| NLR (S1P) | Qt−1, P | 8.36 | 0.60 | 5.92 | 0.19 | 51.6 |
| NLR (S12P) | Qt−1, Qt−2, P | 5.69 | 0.41 | 4.00 | 0.49 | 53.1 |
| NLR (S123P) | Qt−1, Qt−2, Qt−3, P | 6.17 | 0.44 | 4.09 | 0.46 | 52.6 |
| CB (S1MP) | Qt−1, MN, P | 3.10 | 0.22 | 2.16 | 0.63 | 15.8 |
| CB (S12MP) | Qt−1, Qt−2, MN, P | 4.08 | 0.29 | 2.82 | 0.52 | 16.3 |
| CB (S123MP) | Qt−1, Qt−2, Qt−3, MN, P | 4.50 | 0.32 | 2.86 | 0.51 | 15.8 |
| RF (S1MP) | Qt−1, MN, P | 3.60 | 0.26 | 2.58 | 0.63 | 18 |
| RF (S12MP) | Qt−1, Qt−2, MN, P | 3.78 | 0.27 | 2.59 | 0.56 | 19.5 |
| RF (S123MP) | Qt−1, Qt−2, Qt−3, MN, P | 3.88 | 0.28 | 2.63 | 0.55 | 18.9 |
| XGB (S1MP) | Qt−1, MN, P | 3.65 | 0.26 | 2.75 | 0.53 | 20.8 |
| XGB (S12MP) | Qt−1, Qt−2, MN, P | 3.67 | 0.25 | 2.76 | 0.59 | 21.5 |
| XGB (S123MP) | Qt−1, Qt−2, Qt−3, MN, P | 3.77 | 0.27 | 2.52 | 0.59 | 22.6 |
| ANN (S1MP) | Qt−1, MN, P | 3.92 | 0.28 | 1.03 | 0.54 | 21.1 |
| ANN (S12MP) | Qt−1, Qt−2, MN, P | 3.80 | 0.27 | 1.08 | 0.54 | 21.6 |
| ANN (S123MP) | Qt−1, Qt−2, Qt−3, MN, P | 4.12 | 0.29 | 1.05 | 0.53 | 23.5 |
| NLR (S1MP) | Qt−1, MN, P | 5.01 | 0.36 | 3.34 | 0.43 | 25.3 |
| NLR (S12MP) | Qt−1, Qt−2, MN, P | 4.81 | 0.34 | 3.37 | 0.43 | 24.2 |
| NLR (S123MP) | Qt−1, Qt−2, Qt−3, MN, P | 4.81 | 0.34 | 3.27 | 0.43 | 25.6 |
Table 4. Accuracies of the CB, RF, XGB, ANN and NLR methods in predictions of monthly streamflow at the Kale Station (Code: 1402) in the testing phase.

Without TRMM Data

| Model (Scenario) | Model Inputs | RMSE | rRMSE | MAE | EL,M | MAPE |
|---|---|---|---|---|---|---|
| CB (S1) | Qt−1 | 71.88 | 0.69 | 53.03 | −0.26 | 42.3 |
| CB (S12) | Qt−1, Qt−2 | 61.36 | 0.59 | 44.83 | −0.06 | 45.2 |
| CB (S123) | Qt−1, Qt−2, Qt−3 | 63.89 | 0.61 | 49.82 | −0.18 | 44.8 |
| RF (S1) | Qt−1 | 66.98 | 0.64 | 50.21 | −0.19 | 45.3 |
| RF (S12) | Qt−1, Qt−2 | 59.93 | 0.57 | 44.52 | −0.06 | 48.5 |
| RF (S123) | Qt−1, Qt−2, Qt−3 | 63.25 | 0.61 | 46.81 | −0.11 | 46.8 |
| XGB (S1) | Qt−1 | 83.01 | 0.8 | 60.21 | −0.43 | 43.4 |
| XGB (S12) | Qt−1, Qt−2 | 66.86 | 0.64 | 50.37 | −0.20 | 43.5 |
| XGB (S123) | Qt−1, Qt−2, Qt−3 | 66.67 | 0.64 | 48.71 | −0.16 | 44.8 |
| ANN (S1) | Qt−1 | 91.31 | 0.88 | 64.42 | −0.46 | 48.9 |
| ANN (S12) | Qt−1, Qt−2 | 72.21 | 0.70 | 55.41 | −0.22 | 50.2 |
| ANN (S123) | Qt−1, Qt−2, Qt−3 | 71.34 | 0.69 | 53.09 | −0.18 | 49.5 |
| NLR (S1) | Qt−1 | 100.44 | 0.97 | 68.93 | −0.50 | 50.2 |
| NLR (S12) | Qt−1, Qt−2 | 77.99 | 0.75 | 60.40 | −0.24 | 53.2 |
| NLR (S123) | Qt−1, Qt−2, Qt−3 | 76.33 | 0.74 | 56.81 | −0.20 | 54.9 |
| CB (S1M) | Qt−1, MN | 49.68 | 0.48 | 35.86 | 0.15 | 38.1 |
| CB (S12M) | Qt−1, Qt−2, MN | 49.78 | 0.48 | 37.32 | 0.11 | 36.5 |
| CB (S123M) | Qt−1, Qt−2, Qt−3, MN | 50.17 | 0.48 | 37.45 | 0.11 | 38.9 |
| RF (S1M) | Qt−1, MN | 49.68 | 0.48 | 36.20 | 0.14 | 38 |
| RF (S12M) | Qt−1, Qt−2, MN | 53.77 | 0.52 | 40.00 | 0.05 | 40.3 |
| RF (S123M) | Qt−1, Qt−2, Qt−3, MN | 58.40 | 0.56 | 42.03 | 0.00 | 39.5 |
| XGB (S1M) | Qt−1, MN | 47.71 | 0.46 | 36.29 | 0.14 | 40.1 |
| XGB (S12M) | Qt−1, Qt−2, MN | 44.00 | 0.42 | 30.03 | 0.29 | 40.3 |
| XGB (S123M) | Qt−1, Qt−2, Qt−3, MN | 45.36 | 0.44 | 33.94 | 0.19 | 42.8 |
| ANN (S1M) | Qt−1, MN | 33.58 | 0.32 | 36.20 | 0.14 | 43.4 |
| ANN (S12M) | Qt−1, Qt−2, MN | 32.24 | 0.31 | 36.20 | 0.14 | 45.2 |
| ANN (S123M) | Qt−1, Qt−2, Qt−3, MN | 32.57 | 0.31 | 37.29 | 0.14 | 43.5 |
| NLR (S1M) | Qt−1, MN | 50.66 | 0.49 | 36.08 | 0.14 | 41.3 |
| NLR (S12M) | Qt−1, Qt−2, MN | 52.69 | 0.51 | 35.36 | 0.13 | 43.5 |
| NLR (S123M) | Qt−1, Qt−2, Qt−3, MN | 52.18 | 0.50 | 36.44 | 0.14 | 44.4 |

With TRMM Data

| Model (Scenario) | Model Inputs | RMSE | rRMSE | MAE | EL,M | MAPE |
|---|---|---|---|---|---|---|
| CB (S1P) | Qt−1, P | 51.41 | 0.49 | 36.7 | −16.83 | 39.5 |
| CB (S12P) | Qt−1, Qt−2, P | 50.26 | 0.48 | 37.09 | 0.12 | 38.6 |
| CB (S123P) | Qt−1, Qt−2, Qt−3, P | 56.25 | 0.54 | 44.09 | −0.05 | 37.6 |
| RF (S1P) | Qt−1, P | 47.51 | 0.46 | 34.15 | −17.23 | 43.5 |
| RF (S12P) | Qt−1, Qt−2, P | 51.04 | 0.49 | 37.66 | 0.11 | 42.5 |
| RF (S123P) | Qt−1, Qt−2, Qt−3, P | 55.60 | 0.53 | 40.59 | 0.04 | 43.8 |
| XGB (S1P) | Qt−1, P | 50.49 | 0.48 | 37.83 | −17.53 | 40.5 |
| XGB (S12P) | Qt−1, Qt−2, P | 56.27 | 0.54 | 40.95 | 0.03 | 40.15 |
| XGB (S123P) | Qt−1, Qt−2, Qt−3, P | 58.15 | 0.56 | 42.81 | −0.02 | 41.5 |
| ANN (S1P) | Qt−1, P | 55.54 | 0.54 | 41.23 | −19.11 | 43.5 |
| ANN (S12P) | Qt−1, Qt−2, P | 60.77 | 0.59 | 44.64 | 0.03 | 44.8 |
| ANN (S123P) | Qt−1, Qt−2, Qt−3, P | 62.22 | 0.60 | 46.66 | −0.02 | 44.9 |
| NLR (S1P) | Qt−1, P | 61.09 | 0.59 | 44.94 | −20.83 | 44.32 |
| NLR (S12P) | Qt−1, Qt−2, P | 65.63 | 0.63 | 48.66 | 0.03 | 43.9 |
| NLR (S123P) | Qt−1, Qt−2, Qt−3, P | 66.58 | 0.64 | 50.39 | −0.02 | 43.72 |
| CB (S1MP) | Qt−1, MN, P | 29.11 | 0.28 | 24.87 | 0.41 | 30.6 |
| CB (S12MP) | Qt−1, Qt−2, MN, P | 44.30 | 0.43 | 33.19 | 0.21 | 31.2 |
| CB (S123MP) | Qt−1, Qt−2, Qt−3, MN, P | 44.84 | 0.43 | 36.02 | 0.15 | 30.8 |
| RF (S1MP) | Qt−1, MN, P | 33.41 | 0.32 | 26.79 | 0.41 | 32.35 |
| RF (S12MP) | Qt−1, Qt−2, MN, P | 46.99 | 0.45 | 34.55 | 0.18 | 33.54 |
| RF (S123MP) | Qt−1, Qt−2, Qt−3, MN, P | 51.63 | 0.5 | 36.86 | 0.13 | 35.6 |
| XGB (S1MP) | Qt−1, MN, P | 29.32 | 0.28 | 23.80 | 0.41 | 28.5 |
| XGB (S12MP) | Qt−1, Qt−2, MN, P | 45.25 | 0.43 | 31.59 | 0.25 | 28.4 |
| XGB (S123MP) | Qt−1, Qt−2, Qt−3, MN, P | 42.30 | 0.41 | 30.22 | 0.28 | 26.5 |
| ANN (S1MP) | Qt−1, MN, P | 31.21 | 0.29 | 24.74 | 0.44 | 35.8 |
| ANN (S12MP) | Qt−1, Qt−2, MN, P | 30.27 | 0.29 | 25.98 | 0.43 | 35.3 |
| ANN (S123MP) | Qt−1, Qt−2, Qt−3, MN, P | 29.65 | 0.29 | 24.25 | 0.46 | 35.7 |
| NLR (S1MP) | Qt−1, MN, P | 32.13 | 0.30 | 26.51 | 0.37 | 37.2 |
| NLR (S12MP) | Qt−1, Qt−2, MN, P | 32.45 | 0.31 | 27.31 | 0.35 | 37.8 |
| NLR (S123MP) | Qt−1, Qt−2, Qt−3, MN, P | 33.42 | 0.32 | 25.18 | 0.36 | 38.9 |
Table 5. Accuracies of the CB, RF, XGB, ANN and NLR methods in predictions of monthly streamflow at the Kale Station (Code: 1402) using upstream data from the Durucasu (Code: 1413) and Sutluce (Code: 1414) stations in the testing phase.

Without TRMM Data

| Model (Scenario) | Model Inputs | RMSE | rRMSE | MAE | EL,M | MAPE |
|---|---|---|---|---|---|---|
| CB (S1314) | Q1413, Q1414 | 33.51 | 0.32 | 22.84 | 0.46 | 23.5 |
| CB (S1314M) | Q1413, Q1414, MN | 30.48 | 0.29 | 22.72 | 0.46 | 21 |
| RF (S1314) | Q1413, Q1414 | 32.60 | 0.31 | 21.30 | 0.49 | 26.8 |
| RF (S1314M) | Q1413, Q1414, MN | 29.04 | 0.27 | 18.85 | 0.55 | 25 |
| XGB (S1314) | Q1413, Q1414 | 48.59 | 0.47 | 30.38 | 0.28 | 23.5 |
| XGB (S1314M) | Q1413, Q1414, MN | 38.63 | 0.37 | 25.03 | 0.41 | 21 |
| ANN (S1314) | Q1413, Q1414 | 53.45 | 0.52 | 33.42 | 0.31 | 29.5 |
| ANN (S1314M) | Q1413, Q1414, MN | 41.72 | 0.40 | 27.53 | 0.45 | 27.5 |
| NLR (S1314) | Q1413, Q1414 | 58.80 | 0.57 | 36.09 | 0.34 | 33.2 |
| NLR (S1314M) | Q1413, Q1414, MN | 45.06 | 0.43 | 29.46 | 0.48 | 30.3 |

With TRMM Data

| Model (Scenario) | Model Inputs | RMSE | rRMSE | MAE | EL,M | MAPE |
|---|---|---|---|---|---|---|
| CB (S1314P) | Q1413, Q1414, P | 31 | 0.3 | 23.76 | 0.44 | 18.5 |
| CB (S1314MP) | Q1413, Q1414, MN, P | 25.8 | 0.25 | 20.39 | 0.52 | 15.3 |
| RF (S1314P) | Q1413, Q1414, P | 27.7 | 0.27 | 20.44 | 0.52 | 23.1 |
| RF (S1314MP) | Q1413, Q1414, MN, P | 28.7 | 0.28 | 21.68 | 0.52 | 18.8 |
| XGB (S1314P) | Q1413, Q1414, P | 28.2 | 0.27 | 21.67 | 0.49 | 18.5 |
| XGB (S1314MP) | Q1413, Q1414, MN, P | 27.1 | 0.26 | 21.22 | 0.52 | 15.65 |
| ANN (S1314P) | Q1413, Q1414, P | 31.02 | 0.30 | 23.62 | 0.52 | 18.5 |
| ANN (S1314MP) | Q1413, Q1414, MN, P | 29.27 | 0.28 | 23.34 | 0.56 | 25.45 |
| NLR (S1314P) | Q1413, Q1414, P | 34.12 | 0.33 | 25.51 | 0.56 | 29.5 |
| NLR (S1314MP) | Q1413, Q1414, MN, P | 31.61 | 0.30 | 24.97 | 0.61 | 27 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Mehraein, M.; Mohanavelu, A.; Naganna, S.R.; Kulls, C.; Kisi, O. Monthly Streamflow Prediction by Metaheuristic Regression Approaches Considering Satellite Precipitation Data. Water 2022, 14, 3636. https://doi.org/10.3390/w14223636
