Next Article in Journal
An Enhanced Flume Testing Procedure for the Study of Rill Erosion
Previous Article in Journal
Assessment of Human Stability in Sewer Systems during Dry Weather Conditions
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

Hierarchical Clustering for Paired Watershed Experiments: Case Study in Southeastern Arizona, U.S.A.

1
U.S. Geological Survey, Western Geographic Science Center, Tucson, AZ 85719, USA
2
Borderlands Restoration Network, Patagonia, AZ 85624, USA
3
Biophilia Foundation, Chester, MD 21619, USA
*
Author to whom correspondence should be addressed.
Water 2021, 13(21), 2955; https://doi.org/10.3390/w13212955
Submission received: 20 September 2021 / Revised: 8 October 2021 / Accepted: 11 October 2021 / Published: 20 October 2021
(This article belongs to the Section Ecohydrology)

Abstract

:
Watershed studies are often onerous due to a lack of data available to portray baseline conditions with which to compare results of monitoring environmental effects. A paired-watershed approach is often adopted to simulate baseline conditions in an adjacent watershed that can be comparable but assumes there is a quantifiable relationship between the control and treated watersheds. Finding suitably matched pairs that can most accurately depict similar responses is challenging and attributes are rarely quantified. In southeastern Arizona, United States, researchers are investigating the effectiveness of watershed restoration techniques employed by land managers. We selected Smith Canyon to develop a rigorous and quantitatively defensible paired-watershed experimental design. The Smith Canyon watershed consists of 91 structurally similar sub-basins that have a defined basin-like structure and flow channel, allowing for consideration as replicate units. We developed a statistical approach to group sub-basins based on similar structural, biophysical, and hydrologic traits. Our geospatial database consisted of 35 environmental variables, which we reduced to 12 through a correlation analysis. We identified three primary collections of paired sub-basins within the larger watershed. These clusters are being used to inform studies actively being employed in the watershed. Overall, we propose a hierarchical clustering protocol for justification of watershed pairing experiments.

1. Introduction

Paired watershed studies are implemented globally to portray the effects of land management practices on the hydrologic budget, water quality, and the fate and transport of nonpoint source pollutants [1,2,3,4,5,6]. The basic concept uses two neighboring watersheds, one as a “control” and another as a “treatment,” to monitor and analyze changes associated with some management action. Ideally the paired set are concurrently monitored during a pre-treatment phase so that a statistically significant relationship between them can be established (a ‘calibration period’). Then, some ‘treatment’ is applied (i.e., denuding the landscape of one watershed), such that any resulting changes detected in the established relationship can be attributed to the effects of the ‘treatment’ [7]. The integration of paired catchments with a before-after-control-impact (BACI) design elucidates more accurate results [8].
The paired watershed approach assumes a consistent and quantifiable relationship between the watersheds’ response variables. The absence of perfect natural replica complicates the direct comparison of watersheds due to potential differences in size, elevation, soil type, geology, vegetation, rainfall, topography, geomorphology, etc., and analysis between the pairs assumes that the two watersheds will respond in a predictable manner [2,9]. Even when paired watersheds are nearly a perfect match, the non-stationarity of climate and hydrology can complicate analyses, especially when the calibration periods are lacking [7,10,11,12,13].
However, the influence of land management on water supplies has been long documented using the paired-watershed approach and findings have created a foundation to our understanding of hydrologic processes and the effects of management [5,8,9,14,15,16,17,18,19,20]. Paired watershed analyses have also illuminated the impacts of natural disturbances (i.e., wildfire, insect outbreaks) [8]. An alternative method to detect changes, to simulate management action and/or extend field observations in paired watershed studies, is through the application of hydrologic models that analyze observed and simulated flows [21,22,23].
Many variables can impact the hydrologic cycle and overall function of a watershed, including a combination of soil properties, vegetation composition and condition, and structural components (i.e., area, size, slope, aspect) [9,24,25]. We posit that a robust understanding of all these variables or traits is fundamental to ensure similarities in the function of paired watersheds.
Clustering techniques, in general, have long been used as a method to identify structural groupings of objects within datasets [26]; and have been used for the “regionalization of watersheds” to identify watersheds that have similar traits [27], including in Arizona [28], New Zealand [29], and more broadly across the southeastern United States [30]. Hierarchical clustering is a clustering technique that is used to cluster objects into a dendrogram, or data tree, either through an agglomerative (i.e., bottom-up) or partitional (top-down) approach [26,31].
Our overarching goal is to develop and apply a replicable and statistically based methodological approach for the pairing of watersheds based on a spatial variable profile. In this paper, we discuss the following objectives: (i) developing a robust and comprehensive spatial database consisting of multiple layers portraying various aspects of the landscape, (ii) using a mixed quantitative and qualitative approach to identify spatial variables within the spatial database that have an independent influence on surface water dynamics and represent overarching landscape themes, and (iii) pairing similarly structured watersheds that can be used to monitor the effects of land management practices. Overall, this statistically based paired-watershed approach can help identify similarities in watershed function and can inform semi-arid watershed restoration.

2. Materials and Methods

2.1. Study Area

We selected a watershed that consists of multiple sub-basins to allow for the development of this experimental design. Smith Canyon watershed (SCW), located southeast of Tucson, Arizona, U.S.A., is a feather-like landscape that consists of 225 total sub-basins with water flow moving north to south (Figure 1). Three separate forks are present within the upper portion of the watershed, which merge into a single primary flowline. We selected 91 of the sub-basins for potential analysis because of their superficially structurally similar appearance and presence of a defined central flow channel.
Located within the Nogales Ranger District of the Coronado National Forest (CNF), Smith Canyon is a tributary of Sonoita Creek, within the Sonoita Creek 5th Hydrologic Unit Code (HUC), which flows southwest through the town of Patagonia and into the Santa Cruz River. Located within the Madrean Archipelago Level III Ecoregion [32], this watershed experiences a hot and dry climate and receives precipitation during the summer months, with more extreme monsoonal rainfall events, and winter storms from the Pacific Ocean, which generally provide longer but less intense storms [33,34]. Data from Patagonia Lake State Park, in the vicinity of SCW, show that the region averages ~480 mm of precipitation per year, with ~56% occurring during the summer monsoon season [35]. High and low temperatures average between 33.9 °C and 19.3 °C during the summer (June through September), and between 15.2 °C and −1.4 °C during the winter (November through February), respectively [35]. As a geographically closed watershed ranging in elevation from 1287 m to 1577 m, ephemeral water flows within SCW are dependent on rainfall events and occasional low-elevation winter snowfall events.
The vegetation composition of SCW is primarily a mixed mesquite (Prosopis spp.) shrubland and grassland in the upper portions of the sub-canyons with riparian woodland vegetation, such as Desert Willow (Chilopsis linearis), Soapberry (Sapindus saponaria), and Arizona Sycamore (Platanus wrightii), present along the primary flowlines and a limited number of the sub-basins. Cattle have actively grazed within portions of the watershed, increasing soil compaction and reducing the amount of herbaceous vegetation within the watershed. Many of the sub-basins have well defined gullies along the central portion of the basin as well as distinct head-cuts at the upper end of the gullies.

2.2. Spatial Database Development

We constructed a comprehensive set of spatial layers portraying various aspects of the watershed. Geographic boundaries and hydrologic metrics were developed using the Soil & Water Assessment Tool (SWAT). SWAT is a physically based, continuous time, watershed modeling program developed by the Agricultural Research Service of the U.S. Department of Agriculture to predict yields of water, sediment, nutrients, and agricultural chemicals in large watersheds with diverse soils and land management practices [36,37,38,39]. We included the three following inputs in SWAT: (i) a land use/land cover classification, (ii) a soil profile, and (iii) an elevation layer; though, numerous other biophysical and structural variables can be used. The land use/land cover (LULC) classification was developed to provide a localized and simplified classification structure, similar in classification structure to the National Land Cover Database (NLDC) [40] and to be useable within SWAT and other watershed models [41]. For the soil input, we used the Soil Survey Geographic (SSURGO) soil database for southeast Arizona was developed by the U.S. Department of Agriculture [42]. Finally, we included a sub-3 m Digital Elevation Model (DEM) derived from airborne LiDAR (Tyson Swetnam—University of Arizona, oral communication, 25 January 2018).
Geographic boundaries for the watershed and sub-basins (n = 225) are derived in SWAT using the DEM layer. We selected 91 of the sub-basins for analysis that consisted of the following characteristics: (i) a visible canyon-like structure and (ii) a derived flowline extending from within the sub-basin to one of the primary flowlines along the base of Smith Canyon. All analyses completed in this study use data from only the 91 selected sub-basins.
In addition, SWAT produces a collection of modeled hydrologic metrics for each defined sub-basin using the full suite of inputs, defined in Table 1 [43]. Each metric provides a relative measurement, by sub-basin, of a specific hydrologic function per unit time and can be used to represent hydrologic traits within the hierarchical clustering analysis. Additional derivative LULC and elevation byproducts from SWAT were adapted as structural and biophysical variables and are described below.
We also collected a combination of spatially explicit structural (n = 16) and biophysical (n = 12) variables for Smith Canyon watershed (Table 2). The structural variables were based on the following themes: (i) elevation, (ii) aspect, (iii) slope, and (iv) general structural traits; all the structural variables use the sub-3 m DEM as source data. Elevation statistics, including mean, minimum, and maximum elevation, were extracted from the DEM for each sub-basin using the “Zonal Statistics as Table” tool in ArcGIS 10.5. Aspect was developed using the DEM within the “Aspect” tool in ArcGIS. The resulting aspect image was then resampled by converting degrees values into four distinct classes: (i) north facing (316° to 360°; 0° to 45°), (ii) east facing (46° to 135°), (iii) south facing (136° to 225°), and (iv) west facing (226° to 315°). Area for each aspect class is converted into a percentage value for each sub-basin.
For slope, SWAT calculated an area for three different slope classes as a byproduct. Using the DEM, we determined the percentage value ranges for the slope classes to be: (i) low slope—0% to 18.44% or 0° to 10.45°, (ii) moderate slope—18.44% to 45.58% or 10.45° to 24.5°, and (iii) steep slope—greater than 45.58% or greater than 24.5°. The slope division designations are relative. Similar to the aspect classes, the area is converted into a percentage of area value for each sub-basin. General structural traits for each sub-basin, including: (i) sub-basin slope, (ii) width (width of the stream), (iii) length (length of the longest reach from the upper to lower portion of the sub-basin), (iv) reach length (length of the derived reach), (v) perimeter, and (vi) area, were all byproducts of SWAT [38].
To comprise the set of biophysical variables, we included: (i) different LULC classifications for the region—each intended to map certain aspects of the landscape, (ii) the SSURGO soil classification, and (iii) satellite measured vegetation conditions. The source data used for the LULC classifications and satellite measured vegetation conditions have a spatial resolution of 30 m, while the SSURGO soil classification is a high-resolution vector layer. Vegetation composition is directly linked to aspects of watershed function including groundwater recharge and evapotranspiration, discharge and water flows, as well as geologic and geomorphic properties [44,45,46]. We included two LULC products to account for the role of vegetation within the sub-basins of Smith Canyon. First, the translated land use derivative byproduct developed in SWAT [47], based on Villarreal et al. (2011) [41], may be useful for clustering purposes. Four classes are present within SCW, however, only two have enough coverage within the sub-basins to be considered within the hierarchical clustering analysis. The second LULC product was an existing vegetation designation classification for the Santa Cruz River watershed [48]. Eleven classes of this classification were found within the boundary of the SCW, though only four had notable coverage within the watershed. For these classifications, we quantified the percent coverage for all sub-basins and extracted the data using ArcGIS.
Soil composition can have a direct impact on surface water flow dynamics [24,49], therefore, it was necessary to include a soil layer. Three SSURGO soil classes were present within the SCW: (i) Class 1421630—Caralampi gravelly sandy loam (10 to 40% slopes), (ii) Class 1421631—Caralampi gravelly sandy loam (10 to 60% slopes, eroded), and (iii) Class 801772—Comoro soils (0 to 5% slopes) [42,50]. However, there was limited coverage of the Comoro soil class (801772) within the selected sub-basins, resulting in this class being excluded from the clustering analysis. We converted area of each soil class for each sub-basin into a percentage value.
Remotely sensed vegetation indices and greenness measurements have been shown to be representative of watershed condition and function within the Madrean Ecoregion [51,52,53,54] and can help identify similarly vegetated sub-basins. Therefore, we included an assessment of vegetation greenness using the Normalized Difference Vegetation Index (NDVI) [55] over time. The NDVI is a ratio of the Red and Near Infrared (NIR) bands to provide a quantification of relative vegetation greenness based on satellite measured reflection of electromagnetic radiation. The NDVI Equation (1) follows:
NDVI = NIR   band Red   band NIR   band + Red   band
where NDVI is the Normalized Difference Vegetation Index and NIR is the Near Infrared band. NDVI ranges from −1 to 1, where values closer to 1 represent more dense green vegetation and values closer to 0 indicate less greenness, or barren soil and rock; values between −1 and 0 generally correspond with water because of its reflective properties. In Google Earth Engine (GEE) [56], we selected Landsat 8 Operational Land Imager (OLI) images from two overlapping scenes (Path 35 and 36; Row 38) for the pre-monsoon period of stressed vegetation (mid-June) from 2014 through 2017, equating to one image per year. This period provides a baseline of vegetation health and is generally more consistent from year-to-year because of the dry conditions. We stacked the NDVI images from each year and produced a four-year mean (2014 to 2017) NDVI image, where each pixel is averaged across the time-period. Using the resulting mean NDVI image, we quantified and extracted various statistics for each sub-basin, including minimum, maximum, mean, and standard deviation, for each sub-basin using the “Zonal Statistics as Table” tool in ArcGIS.

2.3. Input/Output Correlations

Highly correlated variables or variable derivatives can artificially increase the weight of the respective variable within a hierarchical clustering analysis—an effect referred to as “double bumping”. Because the hydrologic metrics derived in SWAT are based on biophysical and structural variables included in the comprehensive spatial database, this effect may be present.
Using the “cor” function in R, we calculated a Pearson correlation coefficient to test the relationship between the modeled hydrologic metrics with structural and biophysical variables within the spatial database that are representative of the input dataset for each sub-basin. For instance, elevation (i.e., mean, minimum, maximum) and slope (i.e., percent low, moderate, steep) measurements for each sub-basin are representative of the DEM and were considered as structural variables. Similarly, the LULC derivatives (i.e., percent RNGB, RNGE) and the SSURGO soil layer classes (i.e., percent 1421630, 1421631) were both inputs in SWAT and were used as biophysical variables. Strong and significant correlations, positive or negative, between the hydrologic metrics and representative structural and biophysical variables would suggest possible effects of double bumping.

2.4. Environmental Variable Analysis

We completed a statistically based environmental variable analysis in R version 4.0.2 [57] to reduce the dimensionality of the spatial database used to develop the clustering analysis. Although a statistical quantification of the overall interaction between the variables can identify relationships, the final variable selection was also qualitative and considered the real-world effect each variable may have in impacting the dynamics of surface water flows. Additionally, the value of each of the broad themes within the spatial database development (i.e., elevation, aspect, slope, LULC, soils, vegetation condition, hydrology) was considered.
We calculated a Pearson correlation coefficient between each of the variables using the “cor” function in R to determine the relative strength and direction of the relationship. Ideally, highly correlated variables that explain similar processes would be removed to reduce the dimensionality of the spatial database. Spatial variables within the same thematic class (i.e., slope, aspect, elevation, etc.) were initially considered, before comparing to the full suite of variables. We considered and reduced the amount of structural, biophysical, and hydrologic variables separately, before combining the selected variables and completing an additional correlation analysis. Only significant (p-value < 0.05) correlations were considered in this analysis.

2.5. Hierarchical Clustering Analysis

We completed the hierarchical clustering analysis and visualization of results using the factoextra [58] and ggplot2 [59] packages in R. The elbow and silhouette methods (“fviz nbclust” function) are often used to determine the optimal number of clusters for the sub-basins. The elbow method is regularly applied to determine the number of clusters based on variance in the spatial variables, where each additional cluster explains less variance and the optimal number of clusters is the point at which variance begins to diminish [60]. The silhouette method tests for general closeness and similarity between objects within a cluster [61]. We also tested for clustering tendency of the various datasets using the Hopkins statistic (“get_clust_tendency” function); which ranges from a value of 0 to 1, where 0 to 0.3 suggests a uniformly distributed dataset, 0.5 indicates a randomly distributed dataset, and 0.7 to 1 implies the data are clustered [62]. Using the determined number of clusters as a constraint, we used the “eclust” function to produce two hierarchical clusters based on combinations of the selected structural, biophysical, and hydrologic variables identified in the environmental variable analysis [63]. This approach is an agglomerative clustering technique that develops single clusters and pairs them with similar clusters to develop a larger dendrogram.
Lastly, we aim to classify distinct geographic areas that have been similarly clustered by both clusters and may have associated structural, biophysical, and hydrologic traits. To identify geographically paired areas, we use ArcGIS to group the two hierarchical clusters and visually assess their overlap. We then calculate the mean value of the selected spatial variables for each geographic pairing to determine if they have identifiable spatial characteristics.
The scripts used to apply this methodological approach as well as a vector shapefile of the selected sub-basins and their respective spatial variable attributes are published within a separate data release in the U.S. Geological Survey Science Base data repository [64].

3. Results

3.1. Correlations between SWAT Input and Output Hydrologic Metrics

Overall, 44 of the 70 (63%) correlations between the variables representing inputs for SWAT and the modeled hydrologic metrics for the selected sub-basins were significant (p < 0.05), suggesting a relatively direct relationship between the input variables and output products (Table 3). For the structural variables (Table 3a), the elevation traits (i.e., minimum, maximum, mean) were significantly correlated with all the hydrologic metrics, except for surface runoff, suggesting that elevation is directly linked to the hydrologic metrics. Similarly, percent low slope was significantly correlated with all the hydrologic metrics. Percent moderate slope was significantly correlated with only two of the six metrics (i.e., potential evapotranspiration, percolation), qualifying as the least correlated structural variable. For the biophysical variables (Table 3b), fewer significant correlations were documented. However, percent soil class 1421630 was significantly correlated with all hydrologic metrics, except for surface runoff, while percent soil class 1421631 was only significantly correlated to potential evapotranspiration and soil water content. Finally, the LULC derivatives (i.e., percent RNGB and RNGE) were only significantly correlated to potential evapotranspiration, soil water content, and surface runoff.
In general, these results suggest that the modeled hydrologic metrics may be more tied to structural traits, though significant correlations with the biophysical variables were still present. Careful consideration is necessary in selecting hydrologic metrics for the hierarchical clustering analysis to reduce effects of double bumping.

3.2. Variable Correlations and Selection

3.2.1. Structural Variables

Several strong positive and negative correlations were present within the structural variables (Figure 2). For instance, significant and strong positive correlations—all greater than or equal to 0.89—were measured between the general structural trait derivatives from SWAT (i.e., length, width, area, perimeter, reach length). Additionally, the elevation traits (i.e., minimum, maximum, mean elevation) were strongly and positively correlated, with all values greater than or equal to 0.96. For the percent aspect traits (i.e., percent north, east, south, west facing), strong positive and negative correlations were also measured. Finally, strong negative correlations were observed in the percent of slope traits (i.e., low, moderate, steep slope), though the correlation between low and moderate slopes was not strong.
Based on these relationships and a qualitative review considering the real-world effects of these variables on surface water dynamics, we reduced the final structural variables to include maximum elevation, area, length, sub-basin slope, percent east facing, percent north facing, percent steep slopes, and percent low slopes. Area and length define the general shape and structure of each sub-basin. North and east facing slopes were positively correlated, as were south and west facing slopes. However, north facing slopes are generally more densely vegetated, which can affect surface water flows. Percent steep and low slopes impact surface water flows in contrasting ways, while sub-basin slope provides a general representation of slope for each sub-basin. Finally, maximum elevation categorizes the sub-basins within the upper and lower portions of the watershed.

3.2.2. Biophysical Variables

Similarly, results showed mixed positive and negative correlations between the biophysical variables, though overall correlation values were generally weaker compared to the structural variables (Figure 3). The NDVI traits showing properties of the satellite-measured vegetation condition were relatively intercorrelated; the only negative correlation within the NDVI variables was between standard deviation and minimum NDVI. The land use derivative traits (i.e., percent RNGB and RNGE) were also highly correlated to each other, while the RNGB class was correlated to the two soil classes. Finally, the soil classes (i.e., percent 1426130 and 1426131) were more strongly correlated to each other as well as many of the other LULC class, particularly percent 45v and 52v.
Based on these correlations and considering the spatial spread of these variables across SCW and the overall impact of these variables on surface water dynamics, we reduced the biophysical variables to include seven variables. First, we included standard deviation and minimum NDVI to address overall vegetation variability and baseline conditions, respectively. We also included two vegetation classes, percent 45v and 52v. These classes were selected because they are found within a greater portion of the sub-basins and 56v is highly correlated to the NDVI traits, though not significantly with standard deviation NDVI. We selected the RNGB land use derivative classification because of the greater presence of this class within the sub-basins and because it was not significantly correlated with the other selected vegetation variables (i.e., percent 45v and 52v). Finally, we included both soil classes because of the geographic distribution of soil presence within the watershed and overall importance of soil composition on watershed function.

3.2.3. Hydrologic Metrics

The hydrologic metrics were strongly correlated overall (Figure 4). For instance, the correlation values between evapotranspiration, percolation, water yield, and lateral flow were all greater than ±0.86, suggesting strong direct relationships. Additionally, potential evapotranspiration and surface runoff were significantly correlated to nearly all the hydrologic metrics, though not to each other. In contrast, soil water content was only significantly correlated with surface runoff and potential evapotranspiration.
To reduce the dimensionality of the hydrologic metrics, we selected the three variables that were less correlated and may better depict aspects of surface water flow. First, soil water content was the least correlated metric and represents the amount of water within the soil of each sub-basin. Second, surface runoff is a direct portrayal of the amount of surface water flow and contribution to overall streamflow. Though surface runoff was significantly correlated to five of the six metrics, the correlation values were generally the weakest, suggesting a less direct relationship. Finally, water yield, which signifies the net water loss that contributes to streamflow, was highly correlated to lateral flow and evapotranspiration, though not soil water content, and had a weak negative correlation with surface runoff.

3.2.4. Combined Variables

Additional strong and significant correlations were observed between the selected structural, biophysical, and hydrologic variables (Figure 5a), thus necessitating a further reduction in the dimensionality. Maximum elevation was highly and significantly correlated to many of the biophysical variables, suggesting a strong relationship between the location of the sub-basin within SCW and the biophysical traits, particularly the soil and vegetation classes (i.e., 1421630, 1421631, 45v, 52v). Therefore, we removed maximum elevation. Similarly, we removed percent low slope because of significant correlations with the biophysical variables (i.e., NDVI minimum, 45v, 1421630) as well as sub-basin and percent steep slopes. Significant correlations were also present between low slope and the three selected hydrologic metrics. We also removed percent east slope and the vegetation classes (i.e., 45v, 42v). Though length and area were significantly correlated to many of the other variables, they provide important characterizations of the structure and primary flowline of the sub-basin and correlations were generally weak. They are also not significantly correlated to the three hydrologic metrics. We choose to include both length and area because—although effectively directly correlated to each other—variability in length of the sub-basin will better identify the distance water flows within a channel while variability in area may better suggest the amount of water collected within each sub-basin following a precipitation event—both of which can influence applications of restoration and on-the-ground restoration decisions. Surface runoff was the least correlated hydrologic metric, overall, though some significant correlations were measured. Soil water content was generally significantly correlated with the biophysical variables.
In full, we selected ten combined biophysical and structural variables and two hydrologic metrics based on these results (Figure 5b). Each selected biophysical and structural variable likely has a distinctive influence on water flows within the watershed and has a more widespread geographic distribution across the larger watershed. Similarly, the selected hydrologic metrics provide a portrayal of above- and sub-surface water dynamics. Correlations show that effects from double bumping were likely minimal when including the selected hydrologic metrics.

3.3. Variable Selection

Through the mixed qualitative and quantitative review, we determined that a more holistic approach with the full suite of ten biophysical and structural variables and two hydrologic metrics would produce effective and representative hierarchical clusters within Smith Canyon (Figure 6). Additionally, the full suite of variables represents the larger themes collected within the development of the spatial database, including slope, aspect, LULC, vegetation conditions, structural profile, soil, and hydrology.
Each of the selected variables have discernable geographic profiles (Figure 6, a through l). Percent steep slope (6a) and sub-basin slope (6b) have similar profiles—which was expected because of their high correlation (Figure 2)—where steeper slopes were present largely in the middle of the watershed. Percent north facing slope (6c) had higher percentage values at the sub-basins on the southwest side of the main channel. Area and length (6e–f) identified the longest and largest sub-basins along the northeast side of the main channel with the smaller sub-basins with shorter flowlines located generally along the north, middle, and south forks as well as along the southwest side of the watershed. The percent soil class 1421630 and 1421631 profiles (6g–h) showed contrasting coverages within the sub-basins, confirmed by a strong negative correlation. Higher minimum NDVI values (6i) were present in the southern portion of the watershed, while the largest sub-basins generally had lower minimum NDVI values. Greater NDVI standard deviation (6j) was present within the larger sub-basins as well as sub-basins located along the southwest side of the watershed, contrasting with the minimum NDVI profile. Finally, surface runoff (6k) identified isolated and specific sub-basins with greater values, while water yield (6l) had relatively high values throughout the watershed.

3.4. Primary and Secondary Clusters

We produced two hierarchical clusters: (i) a primary cluster using only the selected biophysical and structural variables and (ii) a secondary cluster using the selected hydrologic metrics in addition to the selected biophysical and structural variables. This decision was made to isolate the effect of the hydrologic metrics on watershed pairing. The Hopkins statistic had a value of 0.73 for the primary cluster and 0.72 for the secondary cluster, suggesting a similar tendency for clustering for both collections of spatial variables. For both clusters, results from the elbow and silhouette methods were inconclusive and identified a different number of clusters (i.e., elbow = 5; silhouette = 2). When considering these results and to allow for an appropriate number of sub-basin pairs for on-the-ground applications, we determined that both clusters would consist of three main clades—visualized using dendrograms (Figure 7). Hereafter, we refer to the dendrogram as a cluster and the extending main branches as clades. The paired watersheds, on various sub-hierarchies, are identified within each clade by their sub-basin identification number.
For the primary cluster (Figure 7a), Clade 1 (n = 28) is primarily located within the lower half of SCW along the main channel (Figure 8a). Clade 2 (n = 34) is primarily found along the southern side of the main channel as well as along the lower portion of north, middle, and south forks. Clade 3 (n = 29) is located at the upper portion of the SCW along the north, middle, and south forks. For the secondary cluster (7b), Clade 1 is the largest selection of paired watersheds (n = 42) and is located primarily throughout the middle of the watershed (8b). Clade 2 (n = 31) and Clade 3 (n = 18) are located generally within the northern and southern sections of SCW, respectively.

3.5. Geographically Paired Clusters

Four separate geographic areas within SCW were identified based on their geographic location and their classification within the primary and secondary clusters (Figure 8c); these areas have individual structural, biophysical, and hydrologic traits. First, the northern sub-basins (n = 29), located largely along the middle and south forks, were similarly clustered geographically despite being listed as different clades (i.e., Clade 3—primary cluster, Clade 2—secondary cluster). On average, these sub-basins have a have highest percentage of soil class 1421630, though have moderate values for the other spatial layers (Table 4). Second, two general areas along the east and west sides of the main channel were similarly identified as being clustered. The east-central sub-basins (n = 13) were identified as Clade 1 within both the primary and secondary clusters. These sub-basins are the largest, identified by the highest mean length and area values, but have the lowest percentage of steep slopes and lowest mean minimum NDVI value. Similarly, sub-basins along the western side (n = 29) of the main channel were also geographically paired (i.e., Clade 2—primary cluster, Clade 1—secondary cluster). However, these sub-basins represent contrasting clades within the smaller clusters and are defined having the smallest area, the largest percentage of steep and north facing slopes, the highest average slope, and lowest surface runoff. Finally, the southern sub-basins (n = 15), identified as Clade 2 in the secondary cluster and Clade 1 in the primary cluster, are located largely at the lower end of SCW along the main channel. These sub-basins have the lowest average slopes and presence of steep slopes as well as the highest NDVI values on average. Additionally, they average the least amount of water yield and have the highest runoff—which are only considered in the secondary cluster likely resulting in their identification. The remaining sub-basins (n = 5) were not geographically clustered and mean values were not presented.

4. Discussion

4.1. Development of the Spatial Database

In short, we developed and applied a novel and relatively simple methodological approach that allows for replicability within varying locations and can be used to support paired watershed applications based on statistical results.
Though this application pairs watersheds effectively based on a robust collection of structural, biophysical, and hydrologic traits, it was not constrained by the geographic location of the sub-basins. If necessary, geographic coordinates can be considered to more appropriately cluster and pair non-adjacent watersheds. Additionally, this approach does not define the extent to which the watersheds are paired, but rather identifies the most similar watersheds within a set and would be more appropriately applied at a study site that has multiple replicate pairs.
Development of a robust spatial database allows for a broader thematic characterization of the watersheds. Other variables may warrant potential inclusion within the database, whether discrete or continuous. For example, livestock grazing can have extensive impacts on watershed function through the compaction of soil and removal of ground-level herbaceous vegetation, which can increase erosion potential and impact water quality [65]. Characterizing watersheds based on their level of degradation resulting from livestock grazing, or other forms of disturbance such as wildfire effects, could affect management applications [66]. Active watershed restoration activities may be documented within a specified watershed, including the amount or type of existing erosion control structures, which can determine the scale at which restoration management is utilized [9]. Additionally, the location and amount of precipitation varies significantly from year-to-year during both the North American monsoon and winter rainfall seasons, possibly resulting in inconsistent surface water flows spatially [67,68]. Therefore, depending on the size or spatial proximity of the study watersheds, it may be appropriate to include climate variables such as precipitation and temperature. Because of the relatively small size of Smith Canyon watershed (421 ha), we assume that climate factors such as precipitation and temperature are similar throughout, though site-specific conditions may vary.
Hydrologic metrics are included as variables within many watershed regionalization studies [28,29,30]. We chose to include modeled hydrologic metrics, developed for each sub-basin in SWAT. However, our modeled metrics were developed using structural and biophysical inputs and were shown to be relatively correlated to the model input variables. Direct hydrologic measurements may provide more variability of flow dynamics between sub-basins.
Despite the complexity of the database, it is important to emphasize that each spatial variable is a representation of landscape conditions. Within the selected watershed pairings, a detailed review of on-the-ground conditions may be needed to determine the appropriateness of the identified pairing.

4.2. Structural Variables

We applied a mixed qualitative and quantitative approach using variable correlation and thematic data to reduce the dimensionality of our spatial database. This approach was effective in removing highly correlated variables and identified variables that generally characterize unique properties of the sub-basins. However, altering the variables that are used within the hierarchical clustering analysis can result in substantially different clusters. Additionally, although a semi-qualitative analysis allows for the user to select variables that may have an environmental impact or variables that are important to the user related to on-the-ground applications, it also could introduce unintentional bias into the respective application.
A more robust and less qualitative exploratory factor analysis can be applied to reduce the dimensionality of the spatial database. For example, an exploratory factor analysis (EFA) is a widely used approach to identify collections of variables that have underlying relationships within identified factors and that best represent the original data [69,70], such as maximum likelihood [71]. A maximum likelihood EFA identifies multiple factors within the dataset identifying variable combinations that best represent the full dataset. Exploratory factor analyses can be used either as a replacement of the correlation analysis or in addition to it.
Additionally, maximum likelihood EFA performs better with normally distributed (i.e., Gaussian) data—and not all structural, biophysical, and hydrologic variables were normally distributed. In this application, we assume that a non-Gaussian distribution would not greatly affect the results of the study. Methods can be applied to normalize the distribution of the data; however, this approach can substantially increase the complexity and difficulty of this analysis and consideration of this impact should be weighed. For this application, we chose to continue with the natural distribution for the selected variables.

5. Conclusions

The overarching effects and benefits of land management decisions, such as watershed restoration, are often not fully understood due to a lacking control within an experimental design. Challenges related to a lacking experimental design can be addressed through the application of a paired watershed approach, allowing for comparison between treatment and control watersheds. We developed and applied a novel statistically based hierarchical clustering analysis for watershed pairing within an experimental landscape consisting of numerous superficially structurally similar sub-basins to address this concern.
The development of a robust spatial database consisting of structural, biophysical, and hydrologic variables that can provide a holistic and spatially explicit environmental representation of each modeled sub-basin within a given watershed is necessary for this approach to be effective. Overall, we included 35 variables that addressed the watershed’s collective structural, vegetation, geographic, soil, and modeled hydrologic properties. We reduced the input variables to 12 through a mixed qualitative and quantitative statistical analysis, ultimately identifying specific variables that were less correlated, known to affect watershed function, and have widespread geographic presence within the watershed. Using the selected variables within a hierarchical clustering analysis, we produced two clusters that identified paired sub-basins—(i) a primary cluster using only the structural and biophysical traits and (ii) a secondary cluster including the hydrologic metrics. Both clusters identified three unique clades of sub-basins within the watershed and were generally effective in separating the watershed into three primary areas between the upper, middle, and lower portions. In addition, four overlapping geographic areas could be identified between the clusters, each with distinct spatial characteristics.
Overall, this research approach identified environmentally similar watersheds, which allow for the ability to quantify management effects on watershed function. Having a justification for selecting paired watersheds within an experimental design can benefit studies of land management applications and may even produce more accurate results.

Author Contributions

R.E.P. served as the primary writer, performed all analyses, and was involved in the conceptual framework. L.M.N. developed the conceptual framework, created data inputs, analyzed results, provided supervision, helped write, and served as the primary editor. K.V. assisted as a writer, an editor, was involved in the conceptual framework, and provided supervision of this analysis. R.P. was involved in the conceptual framework and oversight. C.W. provided expertise regarding the study area and implemented results. A.R. helped write, provided expertise regarding the study area, and implemented results. H.R.P. assisted with the statistical analysis and was involved in the conceptual framework. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Biophilia Foundation and supported by Borderlands Restoration Network and the U.S. Geological Survey (USGS) Land Change Science Program.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data products for this research can be found on the U.S. Geological Survey Science Base public data repository [64], including (i) a vector shapefile of the selected sub-basins and their respective spatial variable attributes, (ii) the Google Earth Engine script used to calculate the mean Normalized Difference Vegetation Index (NDVI) image, and (iii) the R software script used to complete the correlation and hierarchical clustering analyses.

Acknowledgments

Thank you to the reviewers for completing constructive reviews of the manuscript. Thank you to the Coronado National Forest for permitting access to U.S. Forest Service lands. The use of trade, product, or firm names is for descriptive purposes only and does not constitute endorsement by the U.S. Government.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Clausen, J.C.; Jokela, W.E.; Potter, F.I.; Williams, J.W. Paired Watershed Comparison of Tillage Effects on Runoff, Sediment, and Pesticide Losses. J. Environ. Qual. 1996, 25, 1000–1007. [Google Scholar] [CrossRef]
  2. Clausen, J.C.; Spooner, J. Paired Watershed Study Design; Environmental Protection Agency Office of Wetlands, Oceans, and Watersheds. U.S.A.: Washington, DC, USA, 1993. [Google Scholar]
  3. Genereux, D.P.; Jordan, M.T.; Carbonell, D. A Paired-Watershed Budget Study to Quantify Interbasin Groundwater Flow in a Lowland Rain Forest, Costa Rica. Water Resour. Res. 2005, 41. [Google Scholar] [CrossRef]
  4. King, K.W.; Smiley, P.C.; Baker, B.J.; Fausey, N.R. Validation of Paired Watersheds for Assessing Conservation Practices in the Upper Big Walnut Creek Watershed, Ohio. J. Soil Water Conserv. 2008, 63, 380–395. [Google Scholar] [CrossRef] [Green Version]
  5. Veum, K.S.; Goyne, K.W.; Motavalli, P.P.; Udawatta, R.P. Runoff and Dissolved Organic Carbon Loss from a Paired-Watershed Study of Three Adjacent Agricultural Watersheds. Agric. Ecosyst. Environ. 2009, 130, 115–122. [Google Scholar] [CrossRef]
  6. Worqlul, A.W.; Ayana, E.K.; Yen, H.; Jeong, J.; MacAlister, C.; Taylor, R.; Gerik, T.J.; Steenhuis, T.S. Evaluating Hydrologic Responses to Soil Characteristics Using SWAT Model in a Paired-Watersheds in the Upper Blue Nile Basin. Catena 2018, 163, 332–341. [Google Scholar] [CrossRef]
  7. Ssegane, H.; Amatya, D.M.; Chescheir, G.M.; Skaggs, W.R.; Tollner, E.W.; Nettles, J.E. Consistency of Hydrologic Relationships of a Paired Watershed Approach. Am. J. Clim. Chang. 2013, 2, 147–164. [Google Scholar] [CrossRef] [Green Version]
  8. Neary, D.G. Long-Term Forest Paired Catchment Studies: What Do They Tell Us That Landscape-Level Monitoring Does Not? Forests 2016, 7, 164. [Google Scholar] [CrossRef] [Green Version]
  9. Norman, L.M.; Brinkerhoff, F.; Gwilliam, E.; Guertin, D.P.; Callegary, J.; Goodrich, D.C.; Nagler, P.L.; Gray, F. Hydrologic Response of Streams Restored with Check Dams in the Chiricahua Mountains, Arizona. River Res. Appl. 2015, 32, 519–527. [Google Scholar] [CrossRef] [Green Version]
  10. Hewlett, J.D. Comments on the Catchment Experiment to Determine Vegetal Effects on Water Yield. J. Am. Water Resour. Assoc. 1971, 7, 376–381. [Google Scholar] [CrossRef]
  11. Hornbeck, J.W. The Problem of Extreme Events in Paired-Watershed Studies. Res. Note NE-175. Up. DarbyPA U.S. Dep. Agric. For. Serv. Northeast. For. Exp. Station. 1973, 175, 1–4. [Google Scholar]
  12. Wilm, H.G. How Long Should Experimental Watersheds Be Calibrated? Trans. Am. Geophys. Union 1949, 30, 272–278. [Google Scholar] [CrossRef]
  13. Zégre, N.P. Local and Downstream Effects of Contemporary Forest Harvesting on Streamflow and Sediment Yield. Ph.D. Thesis, Oregon State University, Corvallis, OR, USA, 2008. [Google Scholar]
  14. Bates, C.G. First Results in the Streamflow Experiment, Wagon Wheel Gap, Colorado. J. For. 1921, 19, 402–408. [Google Scholar]
  15. Bates, C.G.; Henry, A.J. Second Phase of Streamflow Experiment at Wagon Wheel Gap, Colo. Mon. Weather Rev. 1928, 56, 79–80. [Google Scholar] [CrossRef]
  16. Beschta, R.L.; Pyles, M.R.; Skaugset, A.E.; Surfleet, C.G. Peakflow Responses to Forest Practices in the Western Cascades of Oregon, USA. J. Hydrol. 2000, 233, 102–120. [Google Scholar] [CrossRef] [Green Version]
  17. Bosch, J.M.; Hewlett, J.D. A Review of Catchment Experiments to Determine the Effect of Vegetation Changes on Water Yield and Evapotranspiration. J. Hydrol. 1982, 55, 3–23. [Google Scholar] [CrossRef]
  18. Huang, M.; Zhang, L.; Gallichand, J. Runoff Responses to Afforestation in a Watershed of the Loess Plateau, China. Hydrol. Process. 2003, 17, 2599–2609. [Google Scholar] [CrossRef]
  19. Kincaid, D.R.; Osborn, H.B.; Gardner, J.L. Use of Unit-Source Watersheds for Hydrologic Investigations in the Semiarid Southwest. Water Resour. Res. 1966, 2, 381–392. [Google Scholar] [CrossRef]
  20. Ziemer, R.R.; Ryan, D.F. Current Status of Experimental Paired-Watershed Research in the USDA Forest Service. EOSTrans. Am. Geophys. Union 2000, 81, F380. [Google Scholar]
  21. Brandt, M.; Bergstroem, S.; Gardelin, M. Modelling the Effects of Clearcutting on Runoff-Examples from Central Sweden. Ambio Swed. 1988, 17, 307–313. [Google Scholar]
  22. Norman, L.M.; Niraula, R. Model Analysis of Check Dam Impacts on Long-Term Sediment and Water Budgets in Southeast Arizona, USA. Ecohydrol. Hydrobiol. 2016, 16, 125–137. [Google Scholar] [CrossRef] [Green Version]
  23. Zégre, N.P.; Skaugset, A.E.; Som, N.A.; McDonnell, J.J.; Ganio, L.M. In Lieu of the Paired Catchment Approach: Hydrologic Model Change Detection at the Catchment Scale. Water Resour. Res. 2010, 46, 1–20. [Google Scholar] [CrossRef] [Green Version]
  24. Yair, A.; Kossovsky, A. Climate and Surface Properties: Hydrological Response of Small Arid and Semi-Arid Watersheds. Geomorphology 2002, 42, 43–57. [Google Scholar] [CrossRef]
  25. Yetemen, O.; Istanbulluoglu, E.; Vivoni, E.R. The Implications of Geology, Soils, and Vegetation on Landscape Morphology: Inferences from Semi-Arid Basins with Complex Vegetation Patterns in Central New Mexico, USA. Geomorphology 2010, 116, 246–263. [Google Scholar] [CrossRef]
  26. Johnson, S.C. Hierarchical Clustering Schemes. Psychometrika 1967, 32, 241–254. [Google Scholar] [CrossRef] [PubMed]
  27. Rao, A.R.; Srinivas, V.V. Regionalization of Watersheds—An Approach Based on Clustering Analysis; Springer: Berlin/Heidelberg, Germany, 2008. [Google Scholar]
  28. Tasker, G.D. Comparing Methods of Hydrologic Regionalization. Water Resour. Bull. 1982, 18, 965–970. [Google Scholar] [CrossRef]
  29. Mosley, M.P. Delimitation of New Zealand Hydrlogic Regions. J. Hydrol. 1981, 49, 173–192. [Google Scholar] [CrossRef]
  30. Chiang, S.M.; Tsay, T.K.; Nix, S.J. Hydrologic Regionalization of Watersheds. I: Methodology Development. J. Water Resour. Plan. Manag. 2002, 128, 3–11. [Google Scholar] [CrossRef] [Green Version]
  31. Zhao, Y.; Karypis, G. Hierarchical Clustering Algorithms for Document Datasets. Data Min. Knowl. Discov. 2005, 10, 141–168. [Google Scholar] [CrossRef]
  32. Omernik, J.M. Ecoregions of the Conterminous United States. Ann. Assoc. Am. Geogr. 1987, 77, 118–125. [Google Scholar] [CrossRef]
  33. Adams, D.K.; Comrie, A.C. The North American Monsoon. Bull. Am. Meteorol. Soc. 1997, 78, 2197–2213. [Google Scholar] [CrossRef] [Green Version]
  34. Villarreal, M.L.; Haire, S.L.; Iniguez, J.M.; Cortés Montaño, C.; Poitras, T.B. Distant Neighbors: Recent Wildfire Patterns of the Madrean Sky Islands of Southwestern United States and Northwestern Mexico. Fire Ecol. 2019, 15, 2. [Google Scholar] [CrossRef]
  35. Arizona State Parks Patagonia Lake State Park Annual Weather. Available online: https://azstateparks.com/patagonia-lake/explore/weather (accessed on 2 October 2021).
  36. Arnold, J.G.; Moriasi, D.N.; Gassman, P.W.; Abbaspour, K.C.; White, M.J.; Srinivasan, R.; Santhi, C.; Harmel, R.D.; van Griensven, A.; Van Liew, M.W.; et al. SWAT: Model Use, Calibration, and Validation. Am. Soc. Agric. Biol. Eng. 2012, 55, 1491–1508. [Google Scholar] [CrossRef]
  37. Arnold, J.G.; Srinivasan, R.; Muttiah, R.S.; Williams, J.R. Large Area Hydrologic Modeling and Assessment Part I: Model Development. J. Am. Water Resour. Assoc. 1998, 34, 73–89. [Google Scholar] [CrossRef]
  38. Gassman, P.W.; Reyes, M.R.; Green, C.H.; Arnold, J.G. The Soil and Water Assessment Tool: Historical Development, Applications, and Future Research Directions. Trans. ASABE 2007, 50, 1211–1250. [Google Scholar] [CrossRef] [Green Version]
  39. SWAT Model. SWAT: Soil & Water Assessment Tool Model. 2018. Available online: https://swat.tamu.edu/ (accessed on 2 October 2021).
  40. NLCD. Multi-Resolution Land Characteristics (MRLC) Consortium National Land Cover Database. 2016. Available online: https://www.mrlc.gov/ (accessed on 2 October 2021).
  41. Villarreal, M.L.; Norman, L.M.; Wallace, C.S.A.; van Riper, C.I. A Multitemporal (1979–2009) Land-Use/Land-Cover Dataset of the Binational Santa Cruz Watershed. Open-File Report 2011–1131; U.S. Geological Survey: Reston, VA, USA, 2011. [Google Scholar]
  42. USDA. Web Soil Survey. 2019. Available online: http://websoilsurvey.nrcs.usda.gov/ (accessed on 2 October 2021).
  43. SWAT Output. SWAT Output Data: Primary Output Files. 2012. Available online: https://swat.tamu.edu/media/69395/ch32_output.pdf (accessed on 2 October 2021).
  44. Engelhardt, B.M.; Weisberg, P.J.; Chambers, J.C. Influences of Watershed Geomorphology on Extent and Composition of Riparian Vegetation. J. Veg. Sci. 2012, 23, 127–139. [Google Scholar] [CrossRef]
  45. Orellana, F.; Verma, P.; Loheide, S.P., II; Daly, E. Monitoring and Modeling Water-Vegetation Interactions in Groundwater-Dependent Ecosystems. Rev. Geophys. 2012, 50, 1–24. [Google Scholar] [CrossRef]
  46. Scott, R.L.; Cable, W.L.; Huxman, T.E.; Nagler, P.L.; Hernandez, M.; Goodrich, D.C. Multiyear Riparian Evapotranspiration and Groundwater Use for a Semiarid Watershed. J. Arid Environ. 2008, 72, 1232–1246. [Google Scholar] [CrossRef]
  47. SWAT Landuse. SWAT: Soil & Water Assessment Tool Landuse. 2018. Available online: https://oldgeni.isnew.info/landuse.html#_SWAT_landuse_classification (accessed on 2 October 2021).
  48. Wallace, C.S.A.; Villarreal, M.L.; Norman, L.M. Development of a High-Resolution Binational Vegetation Map of the Santa Cruz River Riparian Corridor and Surrounding Watershed, Southern Arizona and Northern Sonora, Mexico; Open-File Report 2011–1143; U.S. Geological Survey: Reston, VA, USA, 2011; pp. 1–22.
  49. Rawls, W.J.; Brakensiek, C.L.; Saxton, K.E. Estimation of Soil Water Properties. Trans. -Am. Soc. Agric. Eng. 1982, 25. [Google Scholar] [CrossRef]
  50. California Soil Resource Lab. Soil Data Explorer. 2011. Available online: https://casoilresource.lawr.ucdavis.edu/sde/ (accessed on 2 October 2021).
  51. Jones, K.B.; Edmonds, C.E.; Slonecker, E.T.; Wickham, J.D.; Neale, A.C.; Wade, T.G.; Riitters, K.H.; Kepner, W.G. Detecting Changes in Riparian Habitat Conditions Based on Patterns of Greenness Change: A Case Study from the Upper San Pedro River Basin, USA. Ecol. Indic. 2008, 8, 89–99. [Google Scholar] [CrossRef]
  52. Norman, L.M.; Villarreal, M.; Pulliam, H.R.; Minckley, R.; Gass, L.; Tolle, C.; Coe, M. Remote Sensing Analysis of Riparian Vegetation Response to Desert Marsh Restoration in the Mexican Highlands. Ecol. Eng. 2014, 70, 241–254. [Google Scholar] [CrossRef] [Green Version]
  53. Wilson, N.R.; Norman, L.M. Analysis of Vegetation Recovery Surrounding a Restored Wetland Using the Normalized Difference Infrared Index (NDII) and Normalized Difference Vegetation Index (NDVI). Int. J. Remote Sens. 2018, 39, 3243–3274. [Google Scholar] [CrossRef] [Green Version]
  54. Wilson, N.R.; Norman, L.M.; Villarreal, M.; Gass, L.; Tiller, R.; Salywon, A. Comparison of Remote Sensing Indices for Monitoring of Desert Cienegas. Arid Land Res. Manag. 2016, 30, 460–478. [Google Scholar] [CrossRef] [Green Version]
  55. Tucker, C.J. Red and Photographic Infrared Linear Combinations for Monitoring Vegetation. Remote Sens. Environ. 1979, 8, 127–150. [Google Scholar] [CrossRef] [Green Version]
  56. Gorelick, N.; Hancher, M.; Dixon, M.; Ilyushchenko, S.; Thau, D.; Moore, R. Google Earth Engine: Planetary-Scale Geospatial Analysis for Everyone. Remote Sens. Environ. 2017, 202, 18–27. [Google Scholar] [CrossRef]
  57. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2020. [Google Scholar]
  58. Kassambara, A.; Mundt, F. R Package Factoextra. 2020. Available online: https://cran.r-project.org/web/packages/factoextra/index.html (accessed on 2 October 2021).
  59. Wickham, H.; Chang, W.; Henry, L.; Pedersen, T.L.; Takahashi, K.; Wilke, C.; Woo, K.; Yotani, H.; Dunnington, D. R Package ggplot2. 2020. Available online: https://cran.r-project.org/web/packages/ggplot2/ggplot2.pdf (accessed on 2 October 2021).
  60. Zambelli, A.E. A Data-Driven Approach to Estimating the Number of Clusters in Hierarchical Clustering. F1000Research 2016, 5, 1–14. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  61. Rousseeuw, P.J. Silhouettes: A Graphical Aid to the Interpretation and Validation of Cluster Analysis. J. Comput. Appl. Math. 1987, 20, 53–65. [Google Scholar] [CrossRef] [Green Version]
  62. Banerjee, A.; Dave, R.N. Validating Clusters Using the Hopkins Statistic. IEEE Int. Conf. Fuzzy Syst. 2004. [Google Scholar] [CrossRef]
  63. Murtagh, F.; Contreras, P. Methods of Hierarchical Clustering Computer Science. Mathematics 2011, 1–21. Available online: https://arxiv.org/abs/1105.0121 (accessed on 2 October 2021).
  64. Petrakis, R.E.; Norman, L.M.; Vaughn, K.; Pritzlaff, R.; Weaver, C.; Audrey, R.H.; Ronald, P. Watershed Pairing of Sub-Basins within Smith Canyon Watershed Using a Hierarchical Clustering Approach: U.S. Geological Survey Data Release; U.S. Geological Survey: Reston, VA, USA, 2021. [CrossRef]
  65. Fleischner, T.L. Ecological Costs of Livestock Grazing in Western North America. JSTOR 1994, 8, 629–644. [Google Scholar] [CrossRef] [Green Version]
  66. Fesenmyer, K.A.; Dauwalter, D.C.; Evans, C.; Allai, T. Livestock Management, Beaver, and Climate Influences on Riparian Vegetation in a Semi- Arid Landscape. PLoS ONE 2018, 13, e0208928. [Google Scholar] [CrossRef]
  67. Gebremichael, M.; Vivoni, E.R.; Watts, C.J.; Rodriguez, J.C. Submesoscale Spatiotemporal Variability of North American Monsoon Rainfall over Complex Terrain. Am. Meteorol. Soc. 2007, 20, 1751–1773. [Google Scholar] [CrossRef] [Green Version]
  68. Griffin, D.; Woodhouse, C.A.; Meko, D.M.; Stahle, D.W.; Faulstich, H.L.; Carrillo, C.; Touchan, R.; Castro, C.L.; Leavitt, S.W. North American Monsoon Precipitation Reconstructed from Tree-Ring Latewood. Geophys. Res. Lett. 2013, 40, 954–958. [Google Scholar] [CrossRef]
  69. Auerswald, M.; Moshagen, M. How to Determine the Number of Factors to Retain in Exploratory Factor Analysis: A Comparison of Extraction Methods under Realistic Conditions. Am. Psychol. Assoc. 2019, 24, 468–491. [Google Scholar] [CrossRef] [PubMed]
  70. Gorsuch, R.L. Exploratory Factor Analysis: Its Role in Item Analysis. J. Personal. Assess. 1997, 3891. [Google Scholar] [CrossRef]
  71. Yong, A.G.; Pearce, S. A Beginner’s Guide to Factor Analysis: Focusing on Exploratory Factor Analysis. Tutor. Quant. Methods Psychol. 2013, 9, 79–94. [Google Scholar] [CrossRef]
Figure 1. Locator map showing the location of Smith Canyon divided into the 225 total sub-basins (black). A total of 91 were selected for analysis (light gray). Much of Smith Canyon is located within the boundary of the Coronado National Forest.
Figure 1. Locator map showing the location of Smith Canyon divided into the 225 total sub-basins (black). A total of 91 were selected for analysis (light gray). Much of Smith Canyon is located within the boundary of the Coronado National Forest.
Water 13 02955 g001
Figure 2. Correlations between the full list of structural variables from −1 (blue) to 1 (red). Non-significant correlations (p-value > 0.05) were removed.
Figure 2. Correlations between the full list of structural variables from −1 (blue) to 1 (red). Non-significant correlations (p-value > 0.05) were removed.
Water 13 02955 g002
Figure 3. Correlations between the full list of biophysical variables from −1 (blue) to 1 (red). Non-significant correlations (p-value > 0.05) were removed.
Figure 3. Correlations between the full list of biophysical variables from −1 (blue) to 1 (red). Non-significant correlations (p-value > 0.05) were removed.
Water 13 02955 g003
Figure 4. Correlations between the full list of modeled hydrologic metrics from the soil a water assessment tool (SWAT) ranging from −1 (blue) to 1 (red). Non-significant correlations (p-value > 0.05) were removed.
Figure 4. Correlations between the full list of modeled hydrologic metrics from the soil a water assessment tool (SWAT) ranging from −1 (blue) to 1 (red). Non-significant correlations (p-value > 0.05) were removed.
Water 13 02955 g004
Figure 5. Correlations from −1 (blue) to 1 (red) for (a) the full list of combined variables and (b) the final selected variables for clustering. Non-significant correlations (p-value > 0.05) were removed. For both sets of variable correlations, the hydrologic metrics were separated using black lines.
Figure 5. Correlations from −1 (blue) to 1 (red) for (a) the full list of combined variables and (b) the final selected variables for clustering. Non-significant correlations (p-value > 0.05) were removed. For both sets of variable correlations, the hydrologic metrics were separated using black lines.
Water 13 02955 g005
Figure 6. Watershed maps depicting the geographic distribution of values for each of the twelve selected spatial, biophysical, and hydrologic spatial variables (al) across each of the 91 selected sub-basins.
Figure 6. Watershed maps depicting the geographic distribution of values for each of the twelve selected spatial, biophysical, and hydrologic spatial variables (al) across each of the 91 selected sub-basins.
Water 13 02955 g006
Figure 7. The two cluster dendrograms showing the division of the three clades and their respective sub-basins, for (a) the primary cluster developed using only the structural and biophysical variables, and (b) the secondary cluster developed using a combination of the selected structural and biophysical variables and selected hydrologic metrics.
Figure 7. The two cluster dendrograms showing the division of the three clades and their respective sub-basins, for (a) the primary cluster developed using only the structural and biophysical variables, and (b) the secondary cluster developed using a combination of the selected structural and biophysical variables and selected hydrologic metrics.
Water 13 02955 g007
Figure 8. Maps showing the geographic distribution of the three clades derived in the hierarchical clustering analysis for (a) the primary cluster, and (b) the secondary cluster, as well as (c) areas of overlapping clusters. Subsets (i and ii) show selected paired watersheds within specific clades. An aerial orthophoto from the National Agricultural Imagery Program (NAIP) was used as background imagery.
Figure 8. Maps showing the geographic distribution of the three clades derived in the hierarchical clustering analysis for (a) the primary cluster, and (b) the secondary cluster, as well as (c) areas of overlapping clusters. Subsets (i and ii) show selected paired watersheds within specific clades. An aerial orthophoto from the National Agricultural Imagery Program (NAIP) was used as background imagery.
Water 13 02955 g008
Table 1. Listing the modeled hydrologic metrics (n = 7) for each sub-basin derived using the Soil & Water Assessment Tool (SWAT). Detailed descriptions of the hydrologic metrics are provided by the SWAT input/output file documentation [43] and listed here in italics.
Table 1. Listing the modeled hydrologic metrics (n = 7) for each sub-basin derived using the Soil & Water Assessment Tool (SWAT). Detailed descriptions of the hydrologic metrics are provided by the SWAT input/output file documentation [43] and listed here in italics.
Hydrologic Metric (Unit)Short NameDescription
Potential Evapotranspiration (mm)Potential ETPotential evapotranspiration from the subbasin during the time step
Evapotranspiration (mm)ETActual evapotranspiration from the subbasin during the time step
Soil Water Content (mm)Soil Water Cont.Amount of water in the soil profile at the end of the time step
Water Percolation (mm)PercolationWater that percolates past the root zone during the time step
Surface Runoff (mm)Surf. RunoffSurface runoff contribution to streamflow during the time step
Water Yield (mm)Water YieldNet amount of water that leaves the subbasin and contributes to streamflow in the reach during the time step
Lateral Flow (mm)Lat. FlowAverage annual lateral flow in subbasin during the time step
Table 2. Listing the full suite of (a) structural (n = 16) and (b) biophysical (n = 12) variables that were collected for each sub-basin. This list includes the unit, the short name used in the database, and the source of the variable. Per. = “percent.” Source abbreviations follow: Digital Elevation Model (DEM), Soil & Water Assessment Tool (SWAT), Soil Survey Geographic (SSURGO), and Normalized Difference Vegetation Index (NDVI).
Table 2. Listing the full suite of (a) structural (n = 16) and (b) biophysical (n = 12) variables that were collected for each sub-basin. This list includes the unit, the short name used in the database, and the source of the variable. Per. = “percent.” Source abbreviations follow: Digital Elevation Model (DEM), Soil & Water Assessment Tool (SWAT), Soil Survey Geographic (SSURGO), and Normalized Difference Vegetation Index (NDVI).
(a) Structural Variables
Variable (Unit)Short NamePrimary SourceSource Spatial Resolution
Mean Elevation (ft)Mean ElevationDEMSub-3 m
Minimum Elevation (ft)Min. ElevationDEMSub-3 m
Maximum Elevation (ft)Max. ElevationDEMSub-3 m
Percentage Low Slope (%)Per. LowSWAT derivative; DEMSub-3 m
Percentage Moderate Slope (%)Per. ModerateSWAT derivative; DEMSub-3 m
Percentage Steep Slope (%)Per. SteepSWAT derivative; DEMSub-3 m
Sub-basin Slope (%)Sub. SlopeSWAT derivative; DEMSub-3 m
Width (m)WidthSWAT derivative; DEMSub-3 m
Length (m)LengthSWAT derivative; DEMSub-3 m
Area (ha)AreaSWAT derivative; DEMSub-3 m
Perimeter (m)PerimeterSWAT derivative; DEMSub-3 m
Reach Length (m)Reach LengthSWAT derivative; DEMSub-3 m
Percent North Facing Aspect (%)Per. NorthDEMSub-3 m
Percent East Facing Aspect (%)Per. EastDEMSub-3 m
Percent South Facing Aspect (%)Per. SouthDEMSub-3 m
Percent West Facing Aspect (%)Per. WestDEMSub-3 m
(b) Biophysical Variables
Variable (Unit)Short NamePrimary SourceSource Spatial Resolution
Percent Range—Brush (%)Per. RNGBSWAT derivative; Villarreal et al. (2011)30 m
Percent Range—Grasses (%)Per. RNGESWAT derivative; Villarreal et al. (2011)30 m
Percent Madrean Encinal—Class 45 (%)Per. 45vWallace et al. (2011)30 m
Percent Apacherian-Chihuahuan Mesquite Upland Scrub—Class 52 (%)Per. 52vWallace et al. (2011)30 m
Percent Chihuahuan Creosotebush, Mixed Desert and Thorn Scrub—Class 56 (%)Per. 56vWallace et al. (2011)30 m
Percent Apacherian-Chihuahuan Piedmont Semi-Desert Grassland—Class 65 (%)Per. 65vWallace et al. (2011)30 m
Percent Caralampi Gravelly Sandy Loam—10 to 40% Slopes (%)Per. 1421630SWAT derivative; SSURGOHigh-Resolution Vector
Percent Caralampi Gravelly Sandy Loam—0 to 60% Slopes, Eroded (%)Per. 1421631SWAT derivative; SSURGOHigh-Resolution Vector
Minimum NDVINDVI Min.Mean NDVI Image; Landsat30 m
Maximum NDVINDVI Max.Mean NDVI Image; Landsat30 m
Mean NDVINDVI MeanMean NDVI Image; Landsat30 m
NDVI Standard DeviationNDVI Std. Dev.Mean NDVI Image; Landsat30 m
Table 3. Showing correlations between the modeled hydrologic metrics derived using the Soil & Water Assessment Tool (SWAT) and variables that are representative of the (a) structural and (b) biophysical layers used as inputs in SWAT. Significant correlations (p-value < 0.05) are identified with “*”.
Table 3. Showing correlations between the modeled hydrologic metrics derived using the Soil & Water Assessment Tool (SWAT) and variables that are representative of the (a) structural and (b) biophysical layers used as inputs in SWAT. Significant correlations (p-value < 0.05) are identified with “*”.
(a) Structural Variables
Hydrologic
Metric
Per.
Low
Per.
Moderate
Per.
Steep
Min.
Elevation
Max.
Elevation
Mean
Elevation
Potential ET0.69 *−0.34 *−0.41 *−0.99 *−0.99 *−0.99 *
ET0.93 *−0.15−0.76 *−0.53 *−0.59 *−0.59 *
Soil Water Cont.0.24 *−0.19−0.09−0.42 *−0.44 *−0.42 *
Percolation0.77 *−0.24 *−0.55 *−0.34 *−0.39 *−0.40 *
Surf. Runoff0.34 *−0.20−0.18−0.12−0.11−0.14
Water Yield−0.93 *0.160.75 *0.50 *0.57 *0.57 *
Lat. Flow−0.92 *0.170.73 *0.49 *0.550.56 *
(b) Biophysical Variables
Hydrologic
Metric
Per.
RNGB
Per.
RNGE
Per.
1421630
Min.
1421631
Potential ET0.23 *−0.05−0.78 *0.67 *
ET0.12−0.08−0.35 *0.08
Soil Water Cont.0.58 *−0.41 *−0.32 *0.42 *
Percolation0.040.03−0.28 *−0.06
Surf. Runoff−0.70 *0.80 *−0.11−0.08
Water Yield−0.140.100.34 *−0.05
Lat. Flow−0.070.030.33 *−0.04
Table 4. Showing mean values of the selected, (a) structural, (b) biophysical, and (c) hydrologic variables for each of the geographic cluster pairings. Highest values are bolded while the lowest values are italicized.
Table 4. Showing mean values of the selected, (a) structural, (b) biophysical, and (c) hydrologic variables for each of the geographic cluster pairings. Highest values are bolded while the lowest values are italicized.
(a) Structural Variables
Geographic
Pairing
Per. Steep
(%)
Sub. Slope
(%)
Per.
North (%)
Area
(ha)
Length
(m)
Northern46.8141.922.22.3294.1
Southern21.190.914.12.3371.5
East-Central48.6135.013.15.2570.5
West-Central58.8150.433.61.7283.2
(b) Biophysical Variables
Geographic
Pairing
Per. RNGB
(%)
Per. 1,421,630
(%)
Per. 1,421,631
(%)
NDVI
Min.
NDVI
Std. Dev.
Northern75.1100.00.00.1860.033
Southern90.90.075.30.2100.034
East-Central77.50.296.10.1820.036
West-Central87.73.892.70.1890.034
(c) Hydrologic Variables
Geographic
Pairing
Surf. Runoff
(mm)
Water Yield
(mm)
Northern0.758.8
Southern1.141.2
East-Central0.957.2
West-Central0.658.7
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Petrakis, R.E.; Norman, L.M.; Vaughn, K.; Pritzlaff, R.; Weaver, C.; Rader, A.; Pulliam, H.R. Hierarchical Clustering for Paired Watershed Experiments: Case Study in Southeastern Arizona, U.S.A. Water 2021, 13, 2955. https://doi.org/10.3390/w13212955

AMA Style

Petrakis RE, Norman LM, Vaughn K, Pritzlaff R, Weaver C, Rader A, Pulliam HR. Hierarchical Clustering for Paired Watershed Experiments: Case Study in Southeastern Arizona, U.S.A. Water. 2021; 13(21):2955. https://doi.org/10.3390/w13212955

Chicago/Turabian Style

Petrakis, Roy E., Laura M. Norman, Kurt Vaughn, Richard Pritzlaff, Caleb Weaver, Audrey Rader, and H. Ronald Pulliam. 2021. "Hierarchical Clustering for Paired Watershed Experiments: Case Study in Southeastern Arizona, U.S.A." Water 13, no. 21: 2955. https://doi.org/10.3390/w13212955

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop