Next Article in Journal
Spatiotemporal Evolution and Attribution Analysis of Water Yield in the Xiangjiang River Basin (XRB) Based on the InVEST Model
Next Article in Special Issue
Flash Floods: Forecasting, Monitoring and Mitigation Strategies
Previous Article in Journal
Photodegradation of Sulfamethoxazole and Enrofloxacin under UV and Simulated Solar Light Irradiation
Previous Article in Special Issue
Prediction of the Periglacial Debris Flow in Southeast Tibet Based on Imbalanced Small Sample Data
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

Research on Parameter Regionalization of Distributed Hydrological Model Based on Machine Learning

1
College of Water Resources, North China University of Water Resources and Electric Power, Zhengzhou 450046, China
2
China Institute of Water Resource and Hydropower Research, Beijing 100038, China
3
Beijing Tianzhixiang Information Technology Co., Ltd., Beijing 100191, China
*
Author to whom correspondence should be addressed.
Water 2023, 15(3), 518; https://doi.org/10.3390/w15030518
Submission received: 3 January 2023 / Revised: 22 January 2023 / Accepted: 25 January 2023 / Published: 28 January 2023
(This article belongs to the Special Issue Flash Floods: Forecasting, Monitoring and Mitigation Strategies)

Abstract

:
In the past decade, more than 300 people have died per year on average due to mountain torrents in China. Mountain torrents mostly occur in ungauged small and medium-sized catchments, so it is difficult to maintain high accuracy of flood prediction. In order to solve the problem of the low accuracy of flood simulation in the ungauged areas, this paper studies the influence of different methods on the parameter regionalization of distributed hydrological model parameters in hilly areas of Hunan Province. According to the terrain, landform, soil and land use characteristics of each catchment, we use Shortest Distance, Attribute Similarity, Support Vector Regression, Generative Adversarial Networks, Classification and Regression Tree and Random Forest methods to create parameter regionalization schemes. In total, 426 floods of 25 catchments are selected to calibrate the model parameters, and 136 floods of 8 catchments are used for verification. The results showed that the average values of the Nash–Sutcliffe coefficients of each scheme were 0.58, 0.64, 0.60, 0.66, 0.61 and 0.68, and the worst values were 0.27, 0.31, 0.25, 0.43, 0.35 and 0.59. The random forest model is the most stable solution and significantly outperforms other methods. Using the random forest model to regionalize parameters can improve the accuracy of flood simulation in ungauged areas, which is of great significance for flash flood forecasting and early warning.

1. Introduction

Hunan Province is located in the southeast inland of China, with abundant rainfall but extremely uneven temporal and spatial distribution. Due to frequent and high-intensity rainfall and short confluence time in hilly areas, the flood rises and falls steeply, which can very easily cause mountain torrents. The climate, underlying surface and geomorphic types in hilly areas are diverse, and most of them are areas without data. This is an important challenge for flood forecasting and early warning in hilly areas.
The hydrological model is an important tool for understanding the laws of hydrological science, analyzing hydrological processes and studying hydrological cycle mechanisms [1]. How to identify hydrological parameters in ungauged areas accurately is an important area of research for PUB (Prediction in Ungauged Basins). The regionalization method is usually used to determine the parameters of hydrological models for ungauged basins at present, and the commonly used methods include shortest distance, attribute similarity, regression, average, machine learning, etc. The main idea of the regionalization method is to analyze the relationship between model parameters and characteristic attributes of basins, and the parameters of the hydrological model for ungauged basins are deduced from the calibration results of gauged basins [2].
The parameter transplant method includes the shortest distance method and the attribute similarity method. Among them, the distance approach refers to finding one or more basins adjacent to the research object in the geographical location. The attribute similarity method is used to find a basin that is similar to the research basin in attributes. Young achieved the ideal result of parameter transplant by computing the spatial distance between 260 catchments in the UK [3]. Parajka et al. selected indicators such as watershed area, average slope, watershed latitude, river network density, vegetation coverage, drought index, etc., to analyze the similarity of watershed attributes and complete parameter transplantation. The results show that attribute selection plays a decisive role in the performance of transplantation [4]. Li et al. compared the shortest distance method with the attribute similarity method and pointed out that the performance of transplantation results is affected by the density of hydrological stations, and it is easier to achieve better results in areas with dense hydrological stations [5].
The parameter regression method is mainly used to establish the functional relationship between watershed characteristics and model parameters. Yokoo et al. established a multiple linear regression equation between the Tank model parameters and soil, geology and land use data [6]. Cheng et al. established a regression equation between the SCS model parameter CN, concentration time and soil, land use, average slope and river length [7]. Based on the parameter regionalization method combining spatial proximity and stepwise regression analysis, Yao et al. found that stepwise regression analysis can effectively deduce the sensitive parameters [8]. Sun et al. pointed out that the parametric regression method is prone to the phenomenon of “the same effect of different parameters”, and the basin properties screening is highly subjective, which is not suitable for small samples [9].
Machine learning research mainly includes SOM classification and the CART decision tree method. Yi et al. used hierarchical clustering analysis HCA and unsupervised neural network SOM methods to divide the sub basins of Dianchi Lake basin into 7 groups based on 16 physical characteristics, and they believed that the basin parameters of the same group can be transplanted to each other [10]. Ragettli et al. took 35 basins in different regions of China as the research object, comprehensively considering the physical properties of watersheds and the spatial distance of watersheds; the CART tree model was used to optimize the parameter transplantation rules, and the results show that the CART tree has better parameter adaptability [11]. Liu et al. conducted a parametric zoning study on 19 small catchments in Henan Province; the success rate of parameter transplantation based on the CART tree is about 20% higher than that of random transplantation [12].
The advantage of the CART tree is that it is easy to interpret and the mapping between basin characteristics and transplantation rules is intuitive. In recent years, with the advent of machine learning algorithms, more and more models have been used to create parameter transplantation schemes. However, many machine learning algorithms usually require a large number of samples, and data showing that hydrological model modeling can be used for parameter calibration is often very limited, so it is necessary to reasonably build a large number of learning samples, or to study intelligent algorithms suitable for small sample research. In this study, 33 small and medium-sized catchments in Hunan Province were taken as examples. We constructed distributed hydrological models of these catchments and selected four machine learning models—Support Vector Regression, Generative Adversarial Networks, Classification and Regression Tree, and Random Forest—to create different parametric regionalization schemes and compared them with two traditional methods—Shortest Distance and Attribute Similarity. By analyzing the transplantation results of different schemes, it can provide a reference for determining the parameters of the distributed hydrological model in ungauged areas, which is very valuable for flash flood forecasting and early warning.

2. Materials and Methods

2.1. Study Area

Hunan Province is located on the South Bank of the middle reaches of the Yangtze River. The general geomorphological characteristics are that it is surrounded by mountains in the east, south and west, hills in the middle, plains and lakes in the north, and an asymmetric horseshoe basin that was high in the southwest and low in the northeast. XueFeng mountain runs through the central part of the province from southwest to northeast, which divides the whole province into two parts: mountainous area and hilly area. Due to the comprehensive influence of monsoon circulation and the geomorphic conditions, the mid subtropical monsoon humid climate with obvious continental characteristics is formed. Mountain torrents occur frequently because of the complex topography, developed water system and abundant rainfall. The average annual precipitation in Hunan Province is 1450 mm, but the distribution of precipitation is uneven in time and space, and the interannual variation is large, with an average annual variation of 1200–1800 mm. The province’s annual average water surface evaporation is 736.5 mm, with a variation range of 600–900 mm.

2.2. Data Collection

Taking 33 hydrological stations with observation data from 1979 to 2020 in Hunan Province as examples, we collected the ASTER GDEM V2 dataset, land use layer and soil type layer in Hunan Province. At the same time, a distributed hydrological model of all hydrological stations was established with 30 min as the simulation step. In this study, 426 floods in 25 catchments were selected to calibrate the model parameters, and the regionalization scheme was determined by comparing the simulation results of the other 8 catchments. Figure 1 shows the distribution of hydrological stations.
The smallest catchment is HengBanQiao, with a catchment area of 31 km2, and the largest catchment is FeiXian, with a catchment area of 3659 km2. The hydrological data collection is shown in Table 1.

2.3. Modeling Approaches

2.3.1. Distributed Hydrological Model

Based on the ASTER GDEM V2 dataset, the sub basin and river are extracted by GIS tools. The resolution of the DEM data grid is 30 m, and the area of the sub basin is controlled within 10–30 km2. At the same time, the attributes of sub basins and rivers are extracted, including basin area, slope, longest concentration path, average altitude, average drop (average elevation minus outlet elevation), river length, river section gradient, geomorphic unit hydrograph, etc.
The Xinanjiang model is adopted for runoff generation computation [13,14,15]. A three-layer evaporation model is used to calculate watershed evaporation. The total runoff produced by rainfall is computed according to the concept of saturated runoff, and the influence of the uneven underlying surface on runoff yield area is considered by the water storage curve of the basin. In the aspect of runoff component division, according to the runoff production theory of “hillside hydrology”, the total runoff is divided into saturated surface runoff, soil water runoff and groundwater runoff by a reservoir with limited volume, a side hole and a bottom hole. The unit hydrograph is used to convert the surface runoff into the overland flow, and the linear reservoir model is used to calculate the interflow and groundwater flow, and is finally incorporated into the river network. Figure 2 shows the structure of the Xinanjiang model.
Table 2 shows the parameters of the Xinanjiang model, all of which need to be determined through parameter calibration.
The geomorphic unit hydrograph model is adopted for overland flow concentration computation, which is based on the results of DEM data analysis (Figure 3).
The flow direction of each grid is analyzed according to the D8 algorithm [16], and the probability density distribution function of concentration time is determined by computing the time of each water particle falling on the surface of the basin reaching the outlet, so as to further determine the geomorphic unit hydrograph [17]. Based on the principle of energy conversion, this improves the formula of flow velocity and unifies the formula of slope velocity and river velocity [18], as shown in Formula (1).
v = 2 μ g   k = 1 n s i n θ k 2 n k Δ h k   n
where μ′ is the energy residual coefficient and its range is [0,1], θ is the slope angle of the grid outflow direction, n is the total number of grids in the basin upstream of the target grid (including the target grid), g is the gravity acceleration, Δh is the elevation difference between the target grid and the outflow grid, N is the number of inflow grids of the target grid, nk and vk are the number of upstream grids and the average flow velocity of the kth inflow grid, respectively.
The Muskingum model is used for river network flow concentration [19,20]. Continuous flood routing is realized by segment-by-segment estimation of the model parameters [21].

2.3.2. Evaluation Criteria

To evaluate the suitability of the proposed model for the studied Basin, the Nash–Sutcliffe Coefficient of Efficiency (NSCE) is chosen to analyze the degree of goodness of fit [22], which is defined as:
N S C E = 1 i = 1 N ( Q s ( i ) Q 0 ( i ) ) 2 i = 1 N ( Q 0 ( i ) Q ¯ ) 2
where Qo(i) and Qs(i) are the observed and simulated flow, respectively, N is the number of data points, and Q ¯ is the mean value of the observed flow. According to national criteria for flood forecasting in China [23], the scheme is excellent when the average NSCE reaches 0.9. When the average NSCE is greater than 0.7 and less than 0.9, the effect of this scheme is better. This scheme is for reference only; if the average NSCE is greater than 0.5 but less than 0.7, it may not be accurate. Otherwise, the results of the performances of parameter calibration are unsatisfactory for online flood forecasting.

2.3.3. Parameter Optimization Method

The shuffled complex evolution (SCE-UA) method is used to optimize the model parameters. The SCE-UA algorithm is a nonlinear hybrid algorithm which combines the advantages of the genetic algorithm and the simplex algorithm, and is based on information exchange and biological evolution laws. It can effectively solve the problems of multi-peak, multi-noise, discontinuity, high-dimension and non-linearity in parameter optimization. Figure 4 shows the calculation flow of the SCE-UA algorithm. This method can efficiently and quickly search for the global optimal solution of model parameters [24,25]. There are 14 parameters that need to be optimized in this study, including 13 parameters of the Xinanjiang model (see Table 1) and 1 parameter of the geomorphic unit hydrograph (μ′). μ′ is the energy residual coefficient and its range is [0,1].
According to the evaluation criteria, the larger the NSCE, the better the simulation effect. Therefore, this study aimed to find the highest mean value of NSCE. Since the goal of the SCE-UA algorithm is to find the minimum, Equation (3) is used as the objective function.
F = 1 i = 1 t N S C E i t
where F is the value of objective function, t is the number of floods.

2.3.4. Parameter Regionalization Scheme

The Shortest Distance, Attribute Similarity, Support Vector Regression, Generative Adversarial Networks, Classification and Regression Tree and Random Forest method are used to determine the parameter regionalization scheme, and the final scheme is determined by comparing the simulation results of different methods. For readability, Table 3 lists the abbreviations representing the different methods.
(1)
Shortest Distance (SD)
The nearest basin is determined by computing the spatial distance between the centroid coordinates of the study basin and other basins, and the model parameters of the nearest basin are directly applied to the distributed model of the study basin.
D = 2 R sin 1 s i n ( L a t 1 L a t 2 2 ) 2 + c o s ( L a t 1 ) c o s ( L a t 2 ) s i n ( L o n 1 L o n 2 2 ) 2
where D is the distance, R is the radius of the earth, about 6,378,137 m, and Lon1, lat1, lon2 and Lat2 are the centroid coordinates of the two basins.
(2)
Attributes Similarity (AS)
The area (A, km2), average slope (P), average elevation (E, m), average elevation drop (H, m), shape coefficient (L), soil type S = {s1, s2, s3} (s1, s2 and s3 are the percentages of clay, silt and sand, %) and land use U = {u1, u2, u3, u4} (u1, u2, u3, u4 are the percentages of forest, grass, cultivated land and other, %) were selected for similarity analysis. The components of U and S range in value from 0 to 1, so no additional processing is required. However, for other attributes, the maximum value method is used for normalization, as follows: collect the maximum values MaxA, MaxP, MaxE, MaxH and MaxL of attribute A, P, E, H and L in 33 catchments, and then let C = {A/MaxA, P/MaxP, E/MaxE, H/MaxH, L/MaxL}, then C is the normalized result. The similarity index of catchment x and catchment y was defined as Formula (5):
T = cos ( S x ,   S y ) + cos ( U x ,   U y ) + 1 D ( C x ,   C y )   3
where D(a,b) and cos(a,b) are Euclidean distances and cosines value of two vectors a and b, respectively.
D ( a , b ) = i = 1 n ( a i b i ) 2
cos ( a ,   b ) = i = 1 n a i b i i = 1 n a i 2 i = 1 n b i 2
where T is the similarity index, and its range is [0,1]. The larger the T value, the greater the similarity between the two catchments. Select the basin most similar to the study basin and transplant its parameters.
(3)
Support Vector Regression (SVR)
The essence of a support vector machine (SVM) is to map the non-linear function relationship to the linear problem of high-dimensional space, and then find the optimal regression hyperplane in this high-dimensional space, so that all samples are the minimum distance from the optimal hyperplane [26]. Support Vector Regression (SVR) is a method based on a support vector machine to deal with regression problems. It is used to study the relationship between input variables and numerical output variables, and to predict the output value of new variables. It retains the advantages of a support vector machine and is mainly used in the case of a limited or small number of samples [27].
(4)
Generative Adversarial Network (GAN)
The generative adversarial network (GAN) is an unsupervised learning model consisting of a discriminator and a generator [28]. The generator automatically generates data, learns the distribution of real samples, and generates pseudo samples that are close to real samples. The discriminator has to distinguish between real samples obtained from the data and fake samples generated by the generator. The two models are iteratively optimized through continuous confrontation training, so that the data distribution generated by the generator is as close as possible to the real data distribution. When the probability of each output of the discriminator is basically 1/2, it indicates that the model has reached the optimal state.
(5)
Classification And Regression Tree (CART)
The CART (Classification and Regression Tree) algorithm is a decision tree classification method. It uses a dichotomy recursive segmentation technique to divide the current sample set into two sub sample sets, so that each non leaf node generated has two branches. The decision tree is a weak learning algorithm [29]. The improvement of classification accuracy depends on the reasonable construction and pruning of the tree structure. The CART algorithm generates a decision tree based on the training dataset, and the generated decision tree should be as large as possible. The validation dataset is used to prune the generated tree and select the optimal subtree. At this time, the minimum loss function is used as the pruning standard.
(6)
Random Forest (RF)
Random forest model generates multiple different datasets from the original dataset by sampling with put back [30]. The CART tree is used as a weak classifier, and each sub-dataset corresponds to a classifier. Each decision tree selects the attribute with the strongest classification ability for node splitting, without pruning to maximize growth. All final generated decision trees form a random forest. The model can be used for classification or regression prediction, the result of which is determined by the classifier voting.
Based on the above, Figure 5 shows the flow of parameter regionalization. When SVR, GAN, CART and RF are selected for parameter transplantation. The analysis steps are as follows:
(1)
For each calibrated catchment A, use the model parameters of any catchment B to compute the average Nash–Sutcliffe coefficient NSCEa-b. Collect all catchment A attributes, catchment B attributes, NSCEa-b as training dataset for model training. In this study, the sample size of the training set is 25 × 25.
(2)
For each verified catchment C and calibrated catchment D, use the trained model to take the attributes of C and D as input to predict the mean NSCE, and the parameter group with the highest predictive value is used as the model parameter of C.

3. Results

3.1. Model Parameter Optimization

The SCE-UA algorithm is used to automatically optimize the model parameters of 25 hydrological stations, and the objective function is to obtain the highest average Nash–Sutcliffe coefficient. The results of parameter calibration are shown in Table 4.
It can be seen that there are 23 hydrological stations with an average NSCE between 0.7 and 0.9, and 2 between 0.5 and 0.7. According to national criteria for flood forecasting in China, most calibration parameters meet the requirements of online flood forecasting. The distributed model based on the Xinanjiang model and geomorphic unit hydrograph is stable and suitable for most areas of Hunan Province.
The calibration parameters were fed into the distributed model to simulate 426 floods in 25 catchments. Taking LianYuan Station as an example, the calibration result is shown in Figure 6.

3.2. Regionalization Schemes

The shortest distance, attribute similarity, support vector regression, generative adversarial networks, classification and regression tree, and random forest models are selected to construct and verify the parameter regionalization scheme.
According to the catchment attributes, the results of SD and AS can be directly calculated. The centroid coordinates and basic attributes of the 33 catchments are shown in Table 5, including east longitude (lon, ◦), north latitude (lat, ◦), area (A, km2), average slope (P), average elevation (E, m), average elevation drop (H, m), shape coefficient (L), and the percentages of forest (u1), grass (u2), cultivated land (u3), other (u3), clay (s1), silt (s2) and sand (s3). These attributes were extracted during sub-watershed division.
According to the coordinates of the center of the basin, the centroid distance between the verification basin and the calibration basin is calculated by Formula (3), and the calibration basin with the closest distance is selected, and its model parameters are used directly. Normalize the basin properties, calculate the similarity index between the verification basin and the calibration basin using Formula (4), and transfer the model parameters with the highest similarity. Table 6 shows the transplant results of the SD and AS methods.
For SVR, GAN, CART and RF methods, we need to collect samples and train the model first. This required cross-validation of the model parameters for 25 catchments. We apply 25 groups of parameters to the flood simulations of 25 catchments and calculate the mean NSCE.
Figure 7 shows the 25 × 25 cross-validation results for the samples. Among the 625 samples, there are 121 samples with a Nash–Sutcliffe coefficient greater than 0.7, accounting for 19.4% of the total number of samples; 138 samples with a Nash–Sutcliffe coefficient between 0.6 and 0.7, accounting for 22.1%; 112 samples with a Nash–Sutcliffe coefficient between 0.6 and 0.7, accounting for 17.9%; and 254 samples with a Nash–Sutcliffe coefficient less than 0.5, accounting for 40.1%. These samples are used as input to train four models of SVR, GAN, CART and RF, and the results of different parameter groups are used to predict eight verification basins, and the optimal results are selected for parameter transplantation. The Nash–Sutcliffe coefficients of the simulation results are shown in Table 7.

4. Discussion

It can be seen from Table 6 that two groups, CaojiTan-ZhaiQian and DongKou-HengBanQiao, performed poorly when using the transplantation parameters of the SD method, with average NSCE of 0.27 and 0.29, respectively. When the AS method was used for transplant parameters, two groups had poor results, namely JiaHe-JiaYi and JiShou-ShanXiQiao, with average NSCEs of 0.31 and 0.46, respectively. Table 8 shows the attributes of these groups of catchments.
It can be seen from Table 8 that DongKou and HengBanQiao are not only close, but also most of the attributes are similar except for the area and average drop. The area of DongKou is 931 km2, and the area of HengBanQiao is 31 km2. Their average drops are 458.6 m and 274.67 m, respectively. It is obvious that the different areas will lead to large differences in concentration time, and the average drop may significantly affect the concentration speed, which is the most critical factor affecting the geomorphic unit hydrograph [18]. Similarly, compared with JiaHe and JiaYi, their attributes are very similar, except for average elevation and drop. Therefore, we can infer that if the attributes of two catchments are very close, but their average drop difference is significant, this is likely to cause a failed transplantation. The opposite conclusion cannot be established. Table 6 shows an example with the best results (LuoLingQiao-LianYuan). The average NSCE of transplantation can reach 0.86, which is excellent according to the evaluation criteria. However, the attributes of the two catchments, including the average drop, differed significantly (as shown in Table 5).
From the above cases, it can be seen that the applicable conditions and scope of parameter transplantation are relatively complex, and a single factor cannot be considered in isolation. When multiple attributes are considered for parameter transplantation, the results may not be satisfactory for catchments with similar attributes sometimes, so precisely defining the similarity index is a challenge.
In contrast, machine learning methods can discover more hidden rules in data. However, the methods of machine learning cannot all achieve satisfactory results. Comparing only the average NSCE, the results of SVR and CART were even worse than the AS method. In order to better compare the performance of different methods, Table 9 shows the optimal value, worst value and average value obtained using different methods. Figure 8 shows the average NSCEs for the different methods.
It can be seen from Table 9 that the average and worst Nash–Sutcliffe coefficients of the simulation results using the random forest model are the highest. Among the best NSCE results in Table 9, AR > SVR ≥ CART > RF ≥ GAN > SD, with AR performing best and SD performing worst. The worst result of NSCE is RF > GAN > CART > AR > SD > SVR; RF is the best and SVR is the worst. According to the NSCE average results, RF > GAN > AR > CART> SVR > SD; RF performed the best and SD performed the worst.
Table 10 summarizes the validation results of the different methods and shows the percentage of catchments with an average NSCE greater than 0.9, greater than 0.7 and less than 0.9, greater than 0.5 and less than 0.7, and less than 0.5.
It can be seen from Table 10 that all of the NSCE results of RF are greater than 0.5, which is not achieved by all of the other methods. According to national criteria for flood forecasting in China, if the average NSCE is less than 0.5, the simulation result is unsatisfactory for online flood forecasting. Therefore, the RF model has better performance than the other methods.
Figure 9 lists the importance of each attribute in the RF model. The most important attribute for prediction using the RF model is the percentage of cultivated land area within the transplanted catchment, followed by the area and average elevation of the calibration catchment. It is well known that slope is a significant impact on hydrological models. However, from the parameter importance of the RF model, the influence of slope is smaller than that of cultivated land, which may be another issue that needs further research.

5. Conclusions

In this study, the distribution hydrological models of 33 small and medium-sized catchments in Hunan Province were constructed. The model parameters of 25 catchments were calibrated by using the SCE-UA algorithm. The parameter regionalization scheme including Shortest Distance (SD), Attribute Similarity (AS), Support Vector Regression (SVR), Generative Adversarial Networks (GAN), Classification and Regression Tree (CART) and Random Forest (RF) were validated using data from eight catchments. The main conclusions are as follows:
(1)
A total of 426 floods of 25 catchments were selected to calibrate the model parameters. Among the simulation results of these 25 catchments, there are 23 catchments with an average NSCE greater than 0.7, and 2 between 0.5 and 0.7. According to national criteria for flood forecasting in China, most calibration parameters meet the requirements of online flood forecasting. The distributed model based on the Xinanjiang model and geomorphic unit hydrograph is suitable for most areas of Hunan Province.
(2)
Based on the watershed attributes and cross validation results of model parameters, six parameter regionalization schemes including SD, AR, SVR, GAN, CART and RF were generated, and 136 floods of 8 catchments were used for verification. The average values of the Nash–Sutcliffe coefficients of each scheme were 0.58, 0.64, 0.60, 0.66, 0.61 and 0.68, and the worst values were 0.27, 0.31, 0.25, 0.43, 0.35 and 0.59. The Nash–Sutcliffe coefficients of the RF model are all greater than 0.5, which cannot be achieved by other methods. The RF model is the most stable solution and significantly outperforms other methods. Using the random forest model to regionalize parameters can improve the accuracy of flood simulation in ungauged areas, which is of great significance for flash flood forecasting and early warning.
(3)
The applicable conditions and scope of parameter transplantation are relatively complex, and a single factor cannot be considered in isolation, and the attributes of adjacent catchments may also vary greatly. The result of the attribute similarity method is not very stable, and transplantation can fail when most of the attributes of two catchments are similar, but if the attributes are very different, sometimes good results will be achieved. According to the parameter importance analyzed by the RF model, the slope is not so important, while the cultivated land area is the key to decision making. This result goes against common sense and deserves further research.
There are many factors that affect the accuracy of parameter transplantation. In practice, continuous data collection is required to improve the quality of the underlying dataset. With the accumulation of data and the continuous improvement of the regionalization model, the accuracy of parameter transplantation can be improved.

Author Contributions

Conceptualization, W.W. and Y.T.; methodology, Q.M. and Y.T.; software, Y.Z.; validation, R.D.; project administration, C.L. All authors have read and agreed to the published version of the manuscript.

Funding

Special project for collaborative innovation of science and technology in 2021 (No: 202121206).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Gou, J.J.; Miao, C.Y.; Duan, Q.Y. Progress in parameter sensitivity analysis-optimization-regionalization methods for hydrological models. Prog. Geogr. 2022, 41, 1338–1348. [Google Scholar] [CrossRef]
  2. Sivapalan, M. Prediction in ungauged basins: A grand challenge for theoretical hydrology. Hydrol. Process. 2003, 17, 3163–3170. [Google Scholar] [CrossRef]
  3. Young, A.R. Stream flow simulation within UK ungauged catchments using a daily rainfall-runoff model. J. Hydrol. 2006, 320, 155–172. [Google Scholar] [CrossRef]
  4. Parajka, J.; Bloschl, G.; Merz, R. Regional calibration of catchment models: Potential for ungauged catchments. Water Resour. Res. 2007, 43. [Google Scholar] [CrossRef]
  5. Li, H.X.; Zhang, Y.Q.; Chiew, F.H.S.; Xu, S. Predicting runoff in ungauged catchments by using Xinanjiang model with MODIS leaf area index. J. Hydrol. 2009, 370, 155–162. [Google Scholar] [CrossRef]
  6. Yokoo, Y.; Kazama, S.; Sawamoto, M.; Nishimura, H. Regionalization of lumped water balance model parameters based on multiple regression. J. Hydrol. 2001, 246, 209–222. [Google Scholar] [CrossRef]
  7. Cheng, X.; Ma, X.X.; Wang, W.S.; Liu, X.X.; Wang, Q.L.; Xiao, Y. Applicability Research of HEC-HMS Model Parameter Regionalization in Small Basin of Henan Province. J. China Hydrol. 2022, 42, 40–46, 102. [Google Scholar] [CrossRef]
  8. Yao, C.; Qiu, Z.Y.; Li, Z.J.; Hu, W.D.; Xu, J. Parameter regionalization study and application of API model and Xin’anjiang model. J. China Inst. Water Resour. Hydropower Res. 2019, 47, 189–194. [Google Scholar] [CrossRef]
  9. Sun, Z.L.; Liu, Y.L.; Chen, X.; Shu, Z.K.; Wu, H.F.; Wang, J.; Bao, Z.X.; Wang, G.Q. Review of Hydrological Model Parameter Regionalization Method [OL]. J. China Hydrol. 2022, 1–9. [Google Scholar] [CrossRef]
  10. Yi, X.; Zhou, F.; Wang, X.Y.; Yang, Y.H.; Guo, H.C. Classification and runoff simulation of data-scarce basins based on self-organizing maps. Prog. Geogr. 2014, 33, 1109–1116. [Google Scholar]
  11. Ragettli, S.; Zhou, J.; Wang, H.; Liu, C.; Guo, L. Modeling flash floods in ungauged mountain catchments of China: A decision tree learning approach for parameter regionalization. J. Hydrol. 2017, 555, 330–346. [Google Scholar] [CrossRef]
  12. Liu, C.J.; Zhou, J.; Wen, L.; Ma, Q.; Guo, L.; Ding, L.Q.; Sun, D.Y. Research on spatio temporally-mixed runoff model and parameter regionalization for small and medium-sized catchments. J. China Inst. Water Resour. Hydropower Res. 2021, 19, 99–114. [Google Scholar] [CrossRef]
  13. Zhao, R.J. The Xinanjiang model applied in China. J. Hydrol. 1992, 135, 371–381. [Google Scholar] [CrossRef]
  14. Zhao, R.J.; Zhang, Y.L.; Fang, L.R. The Xinanjiang model. In Hydrological Forecasting, Proceeding Oxford Symposium, Oxford, UK, 15–18 April 1980; IASH: Washington, DC, USA, 1980; pp. 351–356. [Google Scholar]
  15. Wang, W.C.; Cheng, C.T.; Chau, K.W.; Xu, D.M. Calibration of Xinanjiang model parameters using hybrid genetic algorithm based fuzzy optimal model. J. Hydroinform. 2012, 14, 784–799. [Google Scholar] [CrossRef] [Green Version]
  16. Fairfield, J.; Leymarie, P. Drainage networks from grid digital elevation models. Water Resour. Res. 1991, 27, 709–717. [Google Scholar] [CrossRef]
  17. Rui, X.; Yu, M.; Liu, F.; Gong, X. Calculation of watershed flow concentration based on the grid drop concept. Water Sci. Eng. 2008, 1, 1–9. [Google Scholar] [CrossRef] [Green Version]
  18. Wang, W.-C.; Zhao, Y.-W.; Chau, K.-W.; Xu, D.-M.; Liu, C.-J. Improved flood forecasting using geomorphic unit hydrograph based on spatially distributed velocity field. J. Hydroinform. 2021, 23, 724–729. [Google Scholar] [CrossRef]
  19. Tewolde, M.; Smithers, J. Flood routing in ungauged catchments using Muskingum methods. Water SA 2006, 32, 379–388. [Google Scholar] [CrossRef] [Green Version]
  20. Todini, E. A mass conservative and water storage consistent variable parameter Muskingum-Cunge approach. Hydrol. Earth Syst. Sci. 2007, 11, 1645–1659. [Google Scholar] [CrossRef] [Green Version]
  21. Song, X.-M.; Kong, F.-Z.; Zhu, Z.-X. Application of Muskingum routing method with variable parameters in ungauged basin. Water Sci. Eng. 2011, 4, 1–12. [Google Scholar] [CrossRef]
  22. Nash, J.E.; Sutcliffe, J.V. River flow forecasting through conceptual models part I—A discussion of principles. J. Hydrol. 1970, 10, 282–290. [Google Scholar] [CrossRef]
  23. National Center of Hydrological Information. The National Criteria for Hydrological Forecasting; Hydroelectric Press: Beijing, China, 2000. [Google Scholar]
  24. Duan, Q.Y.; Gupta, V.K.; Sorooshian, S. Shuffled complex evolution approach for effective and efficient global minimization. J. Optim. Theory Appl. 1993, 76, 501–521. [Google Scholar] [CrossRef]
  25. Sorooshian, S.; Duan, Q.Y.; Gupta, V.K. Optimal use of the SCE-UA global optimization method for calibrating watershed models. J. Hydrol. 1994, 158, 265–284. [Google Scholar] [CrossRef]
  26. Okkan, U.; Serbes, Z.A. Rainfall-runoff modeling using least squares support vector machines. Environmetrics 2012, 23, 549–564. [Google Scholar] [CrossRef]
  27. Panahi, M.; Sadhasivam, N.; Pourghasemi, H.R.; Rezaie, F.; Lee, S. Spatial prediction of groundwater potential mapping based on convolutional neural network (CNN) and support vector regression (SVR). J. Hydrol. 2020, 588, 125033. [Google Scholar] [CrossRef]
  28. Wang, M.Q.; Yuan, W.W.; Zhang, J.Y. Overview of research on Generative Adversarial Network GAN. Comput. Eng. Des. 2021, 42, 3389–3395. [Google Scholar] [CrossRef]
  29. Li, X.N.; Zhang, Y.J.; She, Y.J.; Chen, L.W.; Chen, J.X. Estimation of impervious surface percentage of river network regions using an ensemble leaning of CART analysis. Remote Sens. Land Resour. 2013, 25, 174–179. [Google Scholar] [CrossRef]
  30. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar]
Figure 1. Distribution of hydrological stations.
Figure 1. Distribution of hydrological stations.
Water 15 00518 g001
Figure 2. Computation flow of the Xinanjiang model.
Figure 2. Computation flow of the Xinanjiang model.
Water 15 00518 g002
Figure 3. Flow direction and flow concentration routes.
Figure 3. Flow direction and flow concentration routes.
Water 15 00518 g003
Figure 4. Flow of the shuffled complex evolution (SCE-UA) method.
Figure 4. Flow of the shuffled complex evolution (SCE-UA) method.
Water 15 00518 g004
Figure 5. Flow of parameter regionalization method.
Figure 5. Flow of parameter regionalization method.
Water 15 00518 g005
Figure 6. Comparison of observed and simulated hydrograph of LianYuan station.
Figure 6. Comparison of observed and simulated hydrograph of LianYuan station.
Water 15 00518 g006
Figure 7. Cross validation results of parameter transplantation.
Figure 7. Cross validation results of parameter transplantation.
Water 15 00518 g007
Figure 8. Validation results of regionalization schemes.
Figure 8. Validation results of regionalization schemes.
Water 15 00518 g008
Figure 9. Importance of attributes in RF model.
Figure 9. Importance of attributes in RF model.
Water 15 00518 g009
Table 1. Information of study stations.
Table 1. Information of study stations.
Station NameArea
(km2)
Data YearsNumber of FloodsNumber of Rain StationsNumber of Sub BasinsType
NingXiang22502013–20201865174calibration
ShiBaZi5642013–2020181750calibration
FeiXian36592013–202028157257calibration
FenShi9232013–2020242769calibration
ZhaiQian3922015–202081131calibration
JingTouJiang1732014–20194914calibration
ShenShanTou29302014–2019571227calibration
CaoLongTan3502013–20156722calibration
HeTan4452014–202071334calibration
HengBanQiao402014–20201352calibration
HuaQiao812013–20202265calibration
MaXiPu3422012–202024425calibration
ShanXiQiao12112013–2020122482calibration
LianYuan1541979–2020391711calibration
LouDi15562014–20201758112calibration
HongYan7112014–201981755calibration
HuangQiao26892012–20191476211calibration
SheBu14342013–2020938109calibration
MaoPing21142014–2020954163calibration
HongYanXi1902012–202020411calibration
DaLuPu6352013–2020291847calibration
HaoFu4402013–2020261035calibration
LanShan3052013–2020322519calibration
JiaYi14752013–2020163296calibration
LiangShuiKou8652012–2020181765calibration
LuoLingQiao3402012–2020162130verification
JiaHe15012012–20203158103verification
CaoJiTan3872013–202013930verification
ShuangFeng15522014–20201036115verification
DongKou9282013–2020131866verification
JiShou7882012–2020263056verification
ZhuXiPo6992013–2020151853verification
ShuangFengTan4442013–2020122035verification
Table 2. Physical meanings and units of model parameters.
Table 2. Physical meanings and units of model parameters.
ParameterPhysical DescriptionUnitParam Range
1KRatio of potential evapotranspiration to pan evaporation[-]0.5–1.2
2UmAveraged soil moisture storage capacity of the upper layer[mm]10–40
3LmAveraged soil moisture storage capacity of the lower layer[mm]50–90
4DmAveraged soil moisture storage capacity of the deep layer[mm]10–80
5CCoefficient of the deep layer that depends on the proportion of the basin area covered by vegetation with deep roots[-]0.1–0.3
6BExponential parameter with a single parabolic curve, which represents the non-uniformity of the spatial distribution of the soil moisture storage capacity over the catchment[-]0.1–0.9
7ImPercentage of impervious and saturated areas in the catchment[-]0.0–1.0
8SmAreal mean free water capacity of the surface soil layer, which represents the maximum possible deficit of free water storage[mm]10–80
9ExExponent of the free water capacity curve influencing the development of the saturated area[-]0.1–2.0
10KgOutflow coefficients of the free water storage to groundwater relationships[-]0.1–0.5
11KiOutflow coefficients of the free water storage to interflow relationships[-]0.1–0.5
12CiRecession constants of the lower interflow storage[-]0.1–0.99
13CgRecession constants of the groundwater storage[-]0.5–0.999
Table 3. Abbreviation of parameter regionalization methods.
Table 3. Abbreviation of parameter regionalization methods.
AbbreviationMethod NameAbbreviationMethod Name
SDshortest distanceGANgenerative adversarial networks
ASattribute similarityCARTclassification and regression tree
SVRsupport vector regressionRFrandom forest
Table 4. Simulation results of calibration.
Table 4. Simulation results of calibration.
StationNSCEStationNSCEStationNSCE
NingXiang0.78HengBanQiao0.80MaoPing0.77
ShiBaZi0.76HuaQiao0.61SheBu0.83
FeiXian0.79MaXiPu0.72HongYanXi0.83
FenShi0.84ShanXiQiao0.79DaLuPu0.82
ZhaiQian0.77LianYuan0.86HaoFu0.80
JingTouJiang0.87LouDi0.78LanShan0.67
ShenShanTou0.83HongYan0.74JiaYi0.86
CaoLongTan0.85HuangQiao0.74LiangShuiKou0.87
HeTan0.79
Table 5. Information of typical watershed characteristics.
Table 5. Information of typical watershed characteristics.
Station NameCentroid CoordinatesBasic AttributesLand Use (%)Soil Type (%)
lonlatA (km2)PE (m)H (m)Lu1u2u3u4s1s2s3
NingXiang112.248728.097222500.160167.36145.360.20849.81.942.45.944.355.50.2
ShiBaZi112.362528.01535640.125113.5777.570.28945.21.449.63.842.357.70.0
FeiXian112.296325.663435590.213395.16261.160.24449.85.538.36.418.374.96.8
FenShi112.563625.27769230.241500.12287.120.38059.45.029.06.631.568.20.3
ZhaiQian113.935026.06673920.3451140.59428.590.62987.85.22.74.316.883.20.0
JingTouJiang112.079626.93991730.181227.96118.961.11357.71.738.91.764.135.90.0
ShenShanTou112.219627.091129300.176175.06134.060.13149.01.645.93.547.053.00.0
CaoLongTan110.417328.83833500.457533.01434.010.09395.30.23.60.931.968.10.0
HeTan109.129027.18544450.372616.32274.320.39184.44.010.11.590.19.90.0
HengBanQiao110.526727.3562310.323757.67274.670.42489.51.28.50.861.138.90.0
HuaQiao110.203827.6833810.279500.44314.442.24986.60.811.70.986.513.50.0
MaXiPu110.449228.24163420.383386.66297.660.11988.60.89.31.387.712.20.1
ShanXiQiao110.598227.535812110.342803.06651.060.17386.63.68.31.564.036.00.0
LianYuan111.601527.63351540.229248.27128.270.50854.12.736.86.459.041.00.0
LouDi111.762727.825715560.236312.35239.350.22052.64.635.37.561.936.02.1
HongYan110.366426.81927110.294634.08303.080.30278.71.118.51.766.433.60.0
HuangQiao110.563926.779526890.226515.43274.430.26858.31.437.13.230.569.30.2
SheBu111.625627.167721140.155322.56137.560.23338.13.951.36.742.957.10.0
MaoPing112.598227.449814340.164154.98122.980.29858.90.838.02.323.176.90.0
HongYanXi109.595429.34041900.367689.85322.850.39384.21.413.90.542.857.20.0
DaLuPu111.540824.87286350.232487.37269.370.22354.05.335.65.118.880.40.8
HaoFu111.405825.70954400.328565.82365.820.52877.51.719.11.758.441.60.0
LanShan112.147925.26403050.340675.70428.700.35282.71.212.04.154.845.20.0
JiaYi113.967428.777314750.274316.37248.370.27677.52.517.03.044.655.40.0
LiangShuiKou110.025529.69158650.471778.04496.040.33894.20.25.40.232.367.70.0
LuoLingQiao113.372728.53413400.170113.0777.070.65265.21.229.93.770.030.00.0
JiaHe112.265625.368515010.263511.87337.870.27666.22.726.15.036.854.38.9
CaoJiTan113.113726.38203870.197179.5090.500.35263.21.532.52.850.744.64.7
ShuangFeng112.050827.401615520.164175.11118.110.30140.42.951.65.120.579.50.0
DongKou110.449127.15769280.362756.60458.600.45493.50.84.90.856.543.50.0
JiShou109.549728.32037880.356621.50453.500.20881.53.014.01.558.341.70.0
ZhuXiPo111.694128.14906990.353422.86310.860.45081.32.114.42.290.08.31.7
ShuangFengTan110.595429.37424440.360591.95446.950.25787.00.611.21.266.132.51.4
Table 6. Transplant results of SD and AS methods.
Table 6. Transplant results of SD and AS methods.
Station NameSDAS
Transplant
Station
NSCETransplant
Station
NSCE
LuoLingQiaoJiaYi0.66LianYuan0.86
JiaHeLanShan0.69JiaYi0.31
CaoJiTanZhaiQian0.27LianYuan0.61
ShuangFengShenShanTou0.75MaoPing0.74
DongKouHengBanQiao0.29HaoFu0.72
JiShouMaXiPu0.71ShanXiQiao0.46
ZhuXiPoLouDi0.60HeTan0.72
ShuangFengTanCaoLongTan0.64HongYan0.69
Average Value 0.58 0.64
Table 7. Transplant results of machine learning methods.
Table 7. Transplant results of machine learning methods.
Station NameNSCE
SVRGANCARTRF
LuoLingQiao0.750.710.650.78
JiaHe0.550.690.510.65
CaoJiTan0.420.430.610.64
ShuangFeng0.250.570.350.64
DongKou0.800.780.800.65
JiShou0.750.710.710.75
ZhuXiPo0.520.640.570.59
ShuangFengTan0.790.780.660.72
Average Value0.600.660.610.68
Table 8. Information on basin attributes.
Table 8. Information on basin attributes.
Station NameCaoJiTanZhaiQianDongKouHengBanQiaoJiaHeJiaYiJiShouShanXiQiao
Basin AttributesArea (km2)38739292831150114757881211
Average Slope0.1970.3450.3620.3230.2630.2740.3560.342
Average Elevation (m)179.51140.59756.6757.67511.87316.4621.5803.1
Average Elevation Drop (m)90.5428.59458.6274.67337.87248.4453.5651.1
Shape Coefficient0.3520.6290.4540.4240.2760.2760.2080.173
Land Use (%)Forest63.287.893.589.566.277.581.586.6
Grass1.55.20.81.22.72.533.6
Cultivated Land32.52.74.98.526.117148.3
Other2.84.30.80.8531.51.5
Soil (%)Clay50.716.856.561.136.844.658.364
Silt44.683.243.538.954.355.441.736
Sand4.70008.9000
Table 9. Comparison of parameter regionalization schemes.
Table 9. Comparison of parameter regionalization schemes.
ItemsNSCE
SDARSVRGANCARTRF
Best0.750.860.80 0.780.800.78
Worst0.270.310.250.43 0.35 0.59
Average0.580.640.60 0.66 0.61 0.68
Table 10. NSCE statistical results.
Table 10. NSCE statistical results.
NSCESDARSVRGANCARTRF
≥0.9000 000
0.7–0.925%50%50%50%25% 37.5%
0.5–0.750%25%25% 37.5% 62.5%62.5%
<0.525%25%25% 12.5% 12.5%0
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Wang, W.; Zhao, Y.; Tu, Y.; Dong, R.; Ma, Q.; Liu, C. Research on Parameter Regionalization of Distributed Hydrological Model Based on Machine Learning. Water 2023, 15, 518. https://doi.org/10.3390/w15030518

AMA Style

Wang W, Zhao Y, Tu Y, Dong R, Ma Q, Liu C. Research on Parameter Regionalization of Distributed Hydrological Model Based on Machine Learning. Water. 2023; 15(3):518. https://doi.org/10.3390/w15030518

Chicago/Turabian Style

Wang, Wenchuan, Yanwei Zhao, Yong Tu, Rui Dong, Qiang Ma, and Changjun Liu. 2023. "Research on Parameter Regionalization of Distributed Hydrological Model Based on Machine Learning" Water 15, no. 3: 518. https://doi.org/10.3390/w15030518

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop