Explainable Machine Learning for Spatio-Temporal Groundwater Vulnerability Mapping
A Random Forest–DRASTIC–Transit Time framework for Western Thessaly, Greece. The study integrates classical hydrogeological vulnerability mapping with nonlinear machine learning, spatial validation, class balancing, explainable AI, and uncertainty-aware decision-support maps.
Why Transit Time matters.
Travel time, residence time, and groundwater age describe how long water or contaminants spend moving through the saturated zone. This timing strongly affects degradation, attenuation, persistence, and risk.
Residence and travel time
Travel time can be interpreted as the advective time required for a parcel of groundwater to move from recharge to a sampling or discharge point. It depends directly on groundwater velocity and flow-path geometry.
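The dependence on velocity and flow-path geometry can be made concrete with a minimal sketch. It assumes a single straight flow path through homogeneous material and Darcy's law, which is a simplification and not necessarily how the study derives Transit Time. The function name and all parameter values are hypothetical.

```python
def advective_travel_time(path_length_m, conductivity_m_per_day,
                          hydraulic_gradient, effective_porosity):
    """Return the advective travel time in days along a straight flow path.

    Seepage velocity: v = K * i / n_e (Darcy flux divided by effective porosity).
    Travel time:      t = L / v.
    """
    if not 0 < effective_porosity <= 1:
        raise ValueError("effective porosity must lie in (0, 1]")
    seepage_velocity = conductivity_m_per_day * hydraulic_gradient / effective_porosity
    if seepage_velocity <= 0:
        raise ValueError("seepage velocity must be positive")
    return path_length_m / seepage_velocity


# Hypothetical values: 500 m path, K = 10 m/day, gradient 0.005, porosity 0.25.
t_days = advective_travel_time(500.0, 10.0, 0.005, 0.25)
print(f"Travel time: {t_days:.0f} days")
```

Halving the path length or doubling the conductivity halves the travel time in this simplified picture, which is why shallow, permeable settings tend to respond quickly.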
Contaminant transformation
Reactive pollutants may degrade during groundwater residence. Longer residence times can promote attenuation, while faster pathways may allow pollutants to reach receptors before sufficient degradation occurs.
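As an illustration of why residence time matters for reactive pollutants, the sketch below assumes simple first-order decay, C/C0 = exp(-λt). Both the kinetic model and the 200-day half-life are assumptions chosen for illustration, not values from the study.

```python
import math


def remaining_fraction(travel_time_days, half_life_days):
    """Fraction C/C0 of a pollutant left after first-order decay."""
    decay_constant = math.log(2) / half_life_days  # lambda, in 1/day
    return math.exp(-decay_constant * travel_time_days)


# Hypothetical half-life of 200 days, compared on a fast and a slow pathway.
fast_path = remaining_fraction(travel_time_days=100, half_life_days=200)
slow_path = remaining_fraction(travel_time_days=1000, half_life_days=200)
print(f"fast path: {fast_path:.3f}, slow path: {slow_path:.3f}")
```

Under these assumptions about 71% of the pollutant survives the fast pathway and about 3% survives the slow one. A non-reactive contaminant would show no such reduction.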
Heterogeneous aquifers
In interbedded sands, gravels, and clays, contaminants may move rapidly through permeable layers while diffusing into lower-permeability materials, creating storage, delayed release, and extended plume persistence.
Hydrogeological interpretation
Transit Time is introduced as an eighth vulnerability factor because it represents the delay between surface contamination and aquifer impact. Shorter travel times may indicate faster contaminant arrival and less opportunity for attenuation. Longer travel times may reduce the immediate risk from reactive pollutants, although persistent or non-reactive contaminants can remain problematic for long periods.
The DRASTIC model.
DRASTIC is a standardized groundwater vulnerability index based on seven hydrogeological parameters, each represented by a rating and a weight.
DRASTIC index equation
DRASTIC Index = (Dr × Dw) + (Rr × Rw) + (Ar × Aw) + (Sr × Sw) + (Tr × Tw) + (Ir × Iw) + (Cr × Cw)
Ratings usually range from 1 to 10 and weights from 1 to 5. Higher DRASTIC index values indicate greater intrinsic vulnerability to groundwater contamination.
Where the subscripts r and w denote each parameter's rating and weight, and:
- D = Depth to groundwater
- R = Net recharge
- A = Aquifer media
- S = Soil media
- T = Topography
- I = Impact of the vadose zone
- C = Hydraulic conductivity
Parameter weights
| DRASTIC parameter | Weight |
|---|---|
| Depth to groundwater (D) | 5 |
| Net recharge (R) | 4 |
| Aquifer media (A) | 3 |
| Soil media (S) | 2 |
| Topography (T) | 1 |
| Impact of the vadose zone (I) | 5 |
| Hydraulic conductivity (C) | 3 |
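The weighted sum can be sketched for a single raster cell as follows. The weights are those in the table above; the ratings are hypothetical, and the helper name `drastic_index` is illustrative. Transit Time is left out because no weight for it is stated here.

```python
# Weights taken from the parameter table above.
DRASTIC_WEIGHTS = {"D": 5, "R": 4, "A": 3, "S": 2, "T": 1, "I": 5, "C": 3}


def drastic_index(ratings):
    """Return the sum of rating * weight over the seven DRASTIC parameters."""
    missing = set(DRASTIC_WEIGHTS) - set(ratings)
    if missing:
        raise KeyError(f"missing ratings for: {sorted(missing)}")
    for name in DRASTIC_WEIGHTS:
        if not 1 <= ratings[name] <= 10:
            raise ValueError(f"rating for {name} must be between 1 and 10")
    return sum(ratings[name] * weight for name, weight in DRASTIC_WEIGHTS.items())


# Hypothetical ratings for one cell (not from the study).
example_ratings = {"D": 7, "R": 6, "A": 8, "S": 5, "T": 10, "I": 6, "C": 4}
index_value = drastic_index(example_ratings)
print(index_value)
```

With ratings between 1 and 10 and these weights, the index ranges from 23 to 230; the example cell scores 145.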
Depth to groundwater
Shallow groundwater generally increases vulnerability because contaminants have a shorter distance to travel before reaching the aquifer.
Recharge
Higher recharge can increase contaminant transport by carrying pollutants downward through soil and vadose-zone materials.
Aquifer and vadose media
Permeable media such as sand, gravel, or karstic formations can increase vulnerability, while clay-rich materials may provide stronger attenuation.
Developed methodological approach.
The workflow combines GIS thematic layers, nitrate monitoring data, DRASTIC vulnerability modelling, Random Forest classification, validation metrics, explainability, and uncertainty maps.
Stage A · Data preparation
Geological maps, DEM data, road and settlement data, land-use data, aquifer measurements, pollution sources, and nitrate monitoring wells are compiled for Western Thessaly.
GIS raster layers are generated for the seven DRASTIC variables (depth to groundwater, net recharge, aquifer media, soil media, topography, vadose zone impact, and hydraulic conductivity) and for Transit Time.
Raster values are extracted at monitoring well locations, exported to CSV, and prepared for Python-based machine learning.
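The extraction step can be sketched without any GIS library, assuming a north-up raster stored as a list of rows whose top-left corner and cell size are known. In practice a GIS package would handle this; the grid, well names, and coordinates below are hypothetical.

```python
import csv
import io


def sample_raster(grid, x_min, y_max, cell_size, x, y):
    """Return the value of the cell containing (x, y), or None if outside.

    Assumes a north-up raster whose top-left corner is at (x_min, y_max).
    """
    col = int((x - x_min) // cell_size)
    row = int((y_max - y) // cell_size)
    if 0 <= row < len(grid) and 0 <= col < len(grid[0]):
        return grid[row][col]
    return None


# Hypothetical 3 x 3 rating raster with 100 m cells and three wells.
depth_rating = [[9, 7, 5],
                [7, 5, 3],
                [5, 3, 1]]
wells = [("W1", 150.0, 250.0), ("W2", 290.0, 10.0), ("W3", 400.0, 100.0)]

buffer = io.StringIO()  # stands in for a CSV file on disk
writer = csv.writer(buffer)
writer.writerow(["well_id", "x", "y", "D"])
for well_id, x, y in wells:
    value = sample_raster(depth_rating, 0.0, 300.0, 100.0, x, y)
    if value is not None:  # skip wells that fall outside the raster
        writer.writerow([well_id, x, y, value])

print(buffer.getvalue())
```

Repeating the lookup for each thematic layer gives one row per well and one column per factor, which is the tabular form a classifier expects.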
Stage B · Model application and validation
Random Forest learns nonlinear relationships between vulnerability factors and nitrate-based contamination classes.
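A minimal sketch of this training step, assuming scikit-learn and entirely synthetic data. The hyperparameters, the use of `class_weight="balanced"` for class balancing, and the random train/test split are assumptions for illustration; the study's actual configuration and spatial validation scheme are not described here.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

FEATURES = ["D", "R", "A", "S", "T", "I", "C", "TransitTime"]

# Synthetic stand-in for ratings extracted at wells (not study data).
rng = np.random.default_rng(42)
n_wells = 200
X = rng.integers(1, 11, size=(n_wells, len(FEATURES))).astype(float)

# Synthetic labelling rule, used only so the example has structure to learn.
score = X[:, 0] + X[:, 5] + rng.normal(0.0, 2.0, n_wells)
y = (score > np.median(score)).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

model = RandomForestClassifier(
    n_estimators=200, class_weight="balanced", random_state=42
)
model.fit(X_train, y_train)

test_accuracy = model.score(X_test, y_test)
importances = dict(zip(FEATURES, model.feature_importances_))
print(f"accuracy: {test_accuracy:.3f}")
for name, value in sorted(importances.items(), key=lambda kv: -kv[1]):
    print(f"{name:12s} {value:.3f}")
```

A random split ignores spatial autocorrelation between neighbouring wells, so a spatially blocked split would be the more conservative choice when wells cluster.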
Outputs are classified into five categories using Natural Breaks: Very Low, Low, Moderate, High, and Very High vulnerability.
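Natural Breaks (Jenks) chooses class boundaries that minimise the within-class sum of squared deviations. The sketch below is a small dynamic-programming version of that idea, suitable for short lists only, and is not the GIS implementation used for the maps. The scores are hypothetical.

```python
import bisect

LABELS = ["Very Low", "Low", "Moderate", "High", "Very High"]


def natural_breaks(values, n_classes):
    """Return the upper bound of each class, minimising within-class variance."""
    data = sorted(values)
    n = len(data)
    if not 1 <= n_classes <= n:
        raise ValueError("n_classes must be between 1 and len(values)")
    prefix, prefix_sq = [0.0], [0.0]
    for v in data:
        prefix.append(prefix[-1] + v)
        prefix_sq.append(prefix_sq[-1] + v * v)

    def ssd(i, j):
        """Sum of squared deviations from the mean for data[i:j]."""
        total = prefix[j] - prefix[i]
        return (prefix_sq[j] - prefix_sq[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[k][j]: best total SSD when data[:j] is split into k classes.
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    split = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                candidate = cost[k - 1][i] + ssd(i, j)
                if candidate < cost[k][j]:
                    cost[k][j] = candidate
                    split[k][j] = i
    # Walk back through the stored split points to recover class bounds.
    bounds, j = [], n
    for k in range(n_classes, 0, -1):
        bounds.append(data[j - 1])
        j = split[k][j]
    return bounds[::-1]


def classify(value, upper_bounds):
    """Map a value to a vulnerability label using class upper bounds."""
    index = bisect.bisect_left(upper_bounds, value)
    return LABELS[min(index, len(LABELS) - 1)]


# Hypothetical vulnerability scores.
scores = [23, 25, 27, 60, 62, 65, 100, 104, 140, 143, 190, 195]
breaks = natural_breaks(scores, 5)
print(breaks, classify(62, breaks))
```

Unlike equal-interval classes, the boundaries follow gaps in the data, so the class limits change whenever the score distribution changes.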
Accuracy, precision, recall, F1-score, ROC-AUC, cross-validation, confusion matrices, and feature importance are used to evaluate performance.
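Several of these metrics follow directly from a binary confusion matrix, as the sketch below shows. The counts are hypothetical and are not the study's results; ROC-AUC and cross-validation need predicted probabilities and repeated fits, so they are omitted.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts: contaminated wells are the positive class.
metrics = binary_metrics(tp=40, fp=10, fn=5, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Recall is often the metric to watch in this setting, because a missed contaminated well (a false negative) is usually costlier than a false alarm.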
Vulnerability classification
Wells with nitrate concentrations above 50 mg/L are classified as contaminated, while wells at or below this threshold are classified as non-contaminated. These binary labels are the target for supervised ML training and allow validation against observed groundwater-quality data.
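The labelling rule is simple enough to state as code. In this sketch a value of exactly 50 mg/L is treated as non-contaminated, following the "above 50 mg/L" wording; the well names and concentrations are hypothetical.

```python
NITRATE_THRESHOLD_MG_L = 50.0


def label_well(nitrate_mg_l):
    """Return 1 for contaminated (above the threshold), otherwise 0."""
    return 1 if nitrate_mg_l > NITRATE_THRESHOLD_MG_L else 0


# Hypothetical nitrate measurements in mg/L.
measurements = {"W1": 12.4, "W2": 50.0, "W3": 87.9}
labels = {well: label_well(value) for well, value in measurements.items()}
print(labels)
```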
Western Thessaly study area.
The basin covers approximately 6,090 km² across Trikala, Karditsa, Larisa, Magnesia, and Fthiotida, with mountainous western terrain and a lowland eastern plain shaped by Alpine and post-Alpine geological evolution.
Morphology
The western part is dominated by the Southern Pindos mountain complex, while the eastern part corresponds to the lowland Thessalian Plain.
Geology
The area includes pre-Alpine, Alpine, and post-Alpine formations, with crystalline rocks, ophiolites, flysch, limestones, molassic formations, and Neogene–Quaternary deposits.
Land use
Permanently irrigated land dominates the landscape, reflecting intensive agricultural pressure and the importance of nitrate-related vulnerability mapping.
Model comparison and results.
Four configurations are compared: baseline DRASTIC, DRASTIC with Transit Time, Random Forest using seven DRASTIC layers, and Random Forest using seven layers plus Transit Time.
DRASTIC models
DRASTIC Model A
Uses the standard seven DRASTIC parameters. The Pearson correlation with log-transformed nitrate is weak, with r = 0.105 and p = 0.188, indicating no statistically significant relationship.
DRASTIC Model B
Incorporates Transit Time as an auxiliary parameter. The nitrate correlation improves to r = 0.261, but remains statistically non-significant with p = 0.164. Spatially, the map becomes more refined and highlights expanded high-risk zones.
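The correlation test used for both DRASTIC models can be sketched as a Pearson coefficient between index values and log-transformed nitrate. The base-10 logarithm and the data below are assumptions for illustration. The p-value is not computed here; `scipy.stats.pearsonr` returns both the coefficient and the p-value if SciPy is available.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


# Hypothetical index values and nitrate concentrations (mg/L) at five wells.
drastic_values = [100, 120, 140, 160, 180]
nitrate_mg_l = [10, 20, 40, 80, 160]
log_nitrate = [math.log10(v) for v in nitrate_mg_l]

r = pearson_r(drastic_values, log_nitrate)
print(f"r = {r:.3f}")
```

The synthetic nitrate values double at each step, so their logarithm is exactly linear in the index and r comes out at 1. Real monitoring data are far noisier, as the weak correlations reported above show.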
Random Forest models
Random Forest A
Uses the traditional seven DRASTIC variables. It achieves an accuracy of 0.8036 and an adjusted F1-score of 0.8657. Depth to groundwater, vadose zone impact, and recharge are the strongest predictors.
Random Forest B
Adds Transit Time to the Random Forest feature set. It achieves the best performance, with an accuracy of 0.8214 and an adjusted F1-score of 0.8788, offering improved spatial precision and stronger identification of high and very high vulnerability zones.
Performance summary
| Model | Input variables | Key result | Interpretation |
|---|---|---|---|
| DRASTIC A | 7 DRASTIC factors | r = 0.105 (p = 0.188) | Baseline map with limited nitrate correlation |
| DRASTIC B | 7 DRASTIC factors + Transit Time | r = 0.261 (p = 0.164) | More refined spatial vulnerability pattern |
| RF A | 7 DRASTIC factors | Accuracy = 0.8036; adjusted F1 = 0.8657 | Nonlinear ML improves prediction and granularity |
| RF B | 7 DRASTIC factors + Transit Time | Accuracy = 0.8214; adjusted F1 = 0.8788 | Best overall model with temporal contamination sensitivity |
Best model
Random Forest B is preferred because it achieves the highest accuracy (0.8214) and adjusted F1-score (0.8788), together with better spatial granularity and improved temporal sensitivity.
Top controls
Depth to groundwater, vadose zone impact, and recharge consistently emerge as the dominant drivers of groundwater vulnerability.
Explainability
Feature importance, SHAP values, confidence maps, and entropy maps make the model transparent and useful for decision-making under uncertainty.
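Confidence and entropy maps can both be derived from the per-cell class probabilities a classifier returns. The sketch below assumes confidence is the largest class probability and uncertainty is Shannon entropy scaled to the range 0 to 1; these definitions and the example probabilities are assumptions, since the exact formulation used for the maps is not given here.

```python
import math


def confidence(probabilities):
    """Confidence as the probability of the most likely class."""
    return max(probabilities)


def normalized_entropy(probabilities):
    """Shannon entropy divided by its maximum, log(number of classes)."""
    n_classes = len(probabilities)
    if n_classes < 2:
        raise ValueError("need at least two classes")
    entropy = -sum(p * math.log(p) for p in probabilities if p > 0)
    return entropy / math.log(n_classes)


# Hypothetical predicted probabilities for three raster cells
# (non-contaminated, contaminated).
cells = {"certain": [0.95, 0.05], "leaning": [0.7, 0.3], "ambiguous": [0.5, 0.5]}
for name, probs in cells.items():
    print(f"{name:10s} confidence={confidence(probs):.2f} "
          f"entropy={normalized_entropy(probs):.2f}")
```

Cells with entropy near 1 are those where the model cannot separate the classes, which makes them natural candidates for additional monitoring.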
Citation
Tsangaratos, P., Matiatos, I., Ilia, I., and Markantonis, K.: Explainable Machine Learning for Spatio-Temporal Groundwater Vulnerability Mapping: A Random Forest-DRASTIC-Transit Time Framework for Western Thessaly, Greece, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-21505, https://doi.org/10.5194/egusphere-egu26-21505, 2026.