EGU General Assembly 2026 · Vienna & Online · 3–8 May 2026

Explainable Machine Learning for Spatio-Temporal Groundwater Vulnerability

A Random Forest–DRASTIC–Transit Time framework for Western Thessaly, Greece. The study integrates classical hydrogeological vulnerability mapping with nonlinear machine learning, spatial validation, class balancing, explainable AI, and uncertainty-aware decision-support maps.

Top model: RF B (DRASTIC + Transit Time)
Accuracy: 0.8214 (best overall score)
F1-score: 0.8788 (after threshold adjustment)
Study area: 6,090 km² (Western Thessaly basin)

Why Transit Time matters.

Travel time, residence time, and groundwater age describe how much time water or contaminants spend moving through the saturated zone. This timing strongly affects degradation, attenuation, persistence, and therefore risk.

⏱️

Residence and travel time

Travel time can be interpreted as the advective time required for a parcel of groundwater to move from recharge to a sampling or discharge point. It depends directly on groundwater velocity and flow-path geometry.
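
As a rough illustration of that dependence, the sketch below computes advective travel time along a flow path. It assumes Darcy's average linear (seepage) velocity v = K·i/n_e (hydraulic conductivity times hydraulic gradient, divided by effective porosity) and a path split into uniform segments. All segment values are hypothetical and are not taken from the Western Thessaly model.

```python
def seepage_velocity(k_m_per_day: float, gradient: float, effective_porosity: float) -> float:
    """Average linear (seepage) velocity v = K * i / n_e, in metres per day."""
    if effective_porosity <= 0:
        raise ValueError("effective porosity must be positive")
    return k_m_per_day * gradient / effective_porosity


def advective_travel_time(segments) -> float:
    """Total advective travel time in days along a flow path.

    Each segment is (length_m, K_m_per_day, hydraulic_gradient, effective_porosity),
    so layers with different properties along the path are handled piecewise.
    """
    return sum(length / seepage_velocity(k, grad, n_e) for length, k, grad, n_e in segments)


# Hypothetical path: 500 m of gravel, then 200 m of silty sand.
path = [(500.0, 50.0, 0.005, 0.25), (200.0, 1.0, 0.005, 0.30)]
days = advective_travel_time(path)
print(f"{days:.0f} days (~{days / 365.25:.1f} years)")
```

Splitting the path into segments shows why a short, low-permeability layer can dominate the total: in this example almost all of the travel time accrues in the 200 m silty-sand segment.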

🧪

Contaminant transformation

Reactive pollutants may degrade during groundwater residence. Longer residence times can promote attenuation, while faster pathways may allow pollutants to reach receptors before sufficient degradation occurs.
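
A common first approximation for this effect is first-order decay, C/C0 = exp(−λt) with λ = ln 2 / t½. The minimal sketch below assumes that model and a hypothetical 2-year half-life; real nitrate attenuation (for example by denitrification) depends on redox conditions and is not necessarily first-order.

```python
import math


def remaining_fraction(residence_time_years: float, half_life_years: float) -> float:
    """C/C0 after first-order decay: exp(-lambda * t), with lambda = ln(2) / half-life."""
    decay_rate = math.log(2) / half_life_years
    return math.exp(-decay_rate * residence_time_years)


# Hypothetical reactive solute with a 2-year half-life: longer residence, more attenuation.
for t in (0.5, 2.0, 10.0):
    print(f"residence {t:>4} yr -> C/C0 = {remaining_fraction(t, 2.0):.3f}")
```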

🧱

Heterogeneous aquifers

In interbedded sands, gravels, and clays, contaminants may move rapidly through permeable layers while diffusing into lower-permeability materials, creating storage, delayed release, and extended plume persistence.

Hydrogeological interpretation

Transit Time is introduced as an eighth vulnerability factor because it represents the delay between surface contamination and its impact on the aquifer. Shorter travel times may indicate faster contaminant arrival and less opportunity for attenuation. Longer travel times may reduce the immediate risk from reactive pollutants, although persistent or non-reactive contaminants can remain problematic for long periods.

  • Fast pathway = rapid contaminant arrival
  • Slow pathway = longer attenuation window
  • Diffusive storage = plume persistence
  • Isochrons = contours of equal groundwater age
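
To enter an index or a model alongside the other factors, travel time has to be converted to a rating on the same 1 to 10 scale, with shorter times mapped to higher vulnerability. The class breaks and ratings below are hypothetical placeholders, since the study's actual Transit Time classes are not given on this page.

```python
import bisect

# Hypothetical class breaks in years (not the study's actual classes).
BREAKS_YEARS = [1, 2, 5, 10, 20, 50]
# One rating per class, from fastest to slowest pathway: shorter = more vulnerable.
RATINGS = [10, 9, 7, 5, 3, 2, 1]


def transit_time_rating(years: float) -> int:
    """Map a travel time to a DRASTIC-style rating on the 1-10 scale."""
    if years < 0:
        raise ValueError("travel time cannot be negative")
    return RATINGS[bisect.bisect_right(BREAKS_YEARS, years)]


for t in (0.5, 3.0, 15.0, 80.0):
    print(t, "yr ->", transit_time_rating(t))
```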

The DRASTIC model.

DRASTIC is a standardized groundwater vulnerability index based on seven hydrogeological parameters, each represented by a rating and a weight.

DRASTIC index equation

DI = (Dr × Dw) + (Rr × Rw) + (Ar × Aw) + (Sr × Sw) +
     (Tr × Tw) + (Ir × Iw) + (Cr × Cw)

Ratings usually range from 1 to 10 and weights from 1 to 5. Higher DRASTIC index values indicate greater intrinsic vulnerability to groundwater contamination.

Where the subscripts r and w denote each parameter's rating and weight, and:

  • D = Depth to groundwater
  • R = Net recharge
  • A = Aquifer media
  • S = Soil media
  • T = Topography
  • I = Impact of the vadose zone
  • C = Hydraulic conductivity

Parameter weights

DRASTIC parameter | Weight
Depth to groundwater (D) | 5
Net recharge (R) | 4
Aquifer media (A) | 3
Soil media (S) | 2
Topography (T) | 1
Impact of the vadose zone (I) | 5
Hydraulic conductivity (C) | 3

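
Because the index is a plain weighted sum, it can be sketched in a few lines. The weights below are the ones in the parameter table; the cell ratings are hypothetical. With these weights the index ranges from 23 (all ratings 1) to 230 (all ratings 10).

```python
# Weights from the DRASTIC parameter table.
DRASTIC_WEIGHTS = {"D": 5, "R": 4, "A": 3, "S": 2, "T": 1, "I": 5, "C": 3}


def drastic_index(ratings: dict, weights: dict = DRASTIC_WEIGHTS) -> int:
    """DI = sum of rating x weight over every parameter listed in `weights`."""
    missing = set(weights) - set(ratings)
    if missing:
        raise KeyError(f"missing ratings for: {sorted(missing)}")
    for name in weights:
        if not 1 <= ratings[name] <= 10:
            raise ValueError(f"rating for {name} must be between 1 and 10")
    return sum(ratings[name] * weight for name, weight in weights.items())


# Hypothetical cell: shallow water table, high recharge, gravelly media.
cell = {"D": 9, "R": 8, "A": 8, "S": 6, "T": 10, "I": 8, "C": 6}
print(drastic_index(cell))
```

For a variant with Transit Time as an eighth term, an extended mapping such as `{**DRASTIC_WEIGHTS, "TT": w}` could be passed as `weights`; the weight used for Transit Time in DRASTIC Model B is not stated here, so `w` would be an assumption.
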
💧

Depth to groundwater

Shallow groundwater generally increases vulnerability because contaminants have a shorter distance to travel before reaching the aquifer.

🌧️

Recharge

Higher recharge can increase contaminant transport by carrying pollutants downward through soil and vadose-zone materials.

🪨

Aquifer and vadose media

Permeable media such as sand, gravel, or karstic formations can increase vulnerability, while clay-rich materials may provide stronger attenuation.

Developed methodological approach.

The workflow combines GIS thematic layers, nitrate monitoring data, DRASTIC vulnerability modelling, Random Forest classification, validation metrics, explainability, and uncertainty maps.

Stage A · Data preparation

A1. Data collection

Geological maps, DEM data, road and settlement data, land-use data, aquifer measurements, pollution sources, and nitrate monitoring wells are compiled for Western Thessaly.

A2. Thematic layer development

GIS raster layers are generated for DRASTIC variables, including depth, recharge, aquifer media, soil, topography, vadose zone influence, hydraulic conductivity, and Transit Time.

A3. Training and testing datasets

Raster values are extracted at monitoring well locations, exported to CSV, and prepared for Python-based machine learning.
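
A minimal sketch of this extraction step, assuming every layer is a north-up grid on the same extent and cell size, with well coordinates in the same projected system. Layer names, grid values, and well records are hypothetical; the output is CSV text produced with the standard-library `csv` module.

```python
import csv
import io


def world_to_pixel(x, y, x_origin, y_origin, cell_size):
    """Row/column of the cell containing (x, y) in a north-up grid (origin = top-left corner)."""
    col = int((x - x_origin) // cell_size)
    row = int((y_origin - y) // cell_size)
    return row, col


def extract_at_wells(layers, wells, x_origin, y_origin, cell_size):
    """Sample every layer at each well; wells are (well_id, x, y, nitrate_mg_l) tuples."""
    first = next(iter(layers.values()))
    n_rows, n_cols = len(first), len(first[0])
    records = []
    for well_id, x, y, nitrate in wells:
        row, col = world_to_pixel(x, y, x_origin, y_origin, cell_size)
        if not (0 <= row < n_rows and 0 <= col < n_cols):
            raise ValueError(f"well {well_id} lies outside the raster extent")
        record = {"well_id": well_id, "nitrate_mg_l": nitrate}
        record.update({name: grid[row][col] for name, grid in layers.items()})
        records.append(record)
    return records


def to_csv(records):
    """Serialise the well records as CSV text (header taken from the first record)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0]))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Hypothetical 2 x 3 layers with 100 m cells; top-left corner at (0, 1000).
layers = {
    "depth_rating": [[9, 7, 5], [8, 6, 3]],
    "transit_time_yr": [[1.5, 4.0, 12.0], [2.0, 6.5, 30.0]],
}
wells = [("W1", 150.0, 950.0, 62.0), ("W2", 280.0, 830.0, 18.5)]
records = extract_at_wells(layers, wells, x_origin=0.0, y_origin=1000.0, cell_size=100.0)
print(to_csv(records))
```

For real GeoTIFF layers, a raster library such as rasterio (`DatasetReader.sample`) would normally replace the hand-written coordinate lookup.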

Stage B · Model application and validation

B1. Random Forest modelling

Random Forest learns nonlinear relationships between vulnerability factors and nitrate-based contamination classes.
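
A minimal scikit-learn sketch of this step, trained on synthetic data because the well dataset is not reproduced here. It assumes eight rating-scaled features (the seven DRASTIC factors plus Transit Time, coded `TT`, where high ratings mean short travel times) and uses `class_weight="balanced"` as one possible form of class balancing; the study may have balanced its classes differently, for example by resampling.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Seven DRASTIC factors plus Transit Time ("TT"), all as 1-10 ratings (synthetic).
FEATURES = ["D", "R", "A", "S", "T", "I", "C", "TT"]

rng = np.random.default_rng(42)
X = rng.uniform(1, 10, size=(280, len(FEATURES)))
# Hypothetical nonlinear rule: D x I interaction plus a Transit Time effect, with noise.
score = X[:, 0] * X[:, 5] / 10 + 0.5 * X[:, 7] + rng.normal(0, 1.5, size=len(X))
y = (score > 6).astype(int)  # 1 = contaminated, 0 = non-contaminated

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# class_weight="balanced" reweights classes inversely to their frequency.
model = RandomForestClassifier(n_estimators=500, class_weight="balanced", random_state=0)
model.fit(X_train, y_train)

proba_test = model.predict_proba(X_test)[:, 1]  # probability of "contaminated"
ranking = sorted(zip(FEATURES, model.feature_importances_), key=lambda item: -item[1])
for name, importance in ranking:
    print(f"{name:>2}: {importance:.3f}")
```

`feature_importances_` is the impurity-based ranking; permutation importance (`sklearn.inspection.permutation_importance`) or SHAP values are common cross-checks.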

B2. Vulnerability map creation

Outputs are classified into five categories using Natural Breaks: Very Low, Low, Moderate, High, and Very High vulnerability.
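
Natural Breaks usually refers to the Jenks method, which chooses class limits that minimise squared deviations within classes. The sketch below is the exact dynamic-programming (Fisher-Jenks) form in plain Python; GIS packages may sample the data or use a different optimisation, so their breaks can differ slightly. Its cost grows with the square of the number of values, so it is meant for a sample of raster cells, and the scores used here are hypothetical.

```python
LABELS = ["Very Low", "Low", "Moderate", "High", "Very High"]


def jenks_breaks(values, n_classes=5):
    """Upper bound of each class for an optimal (Fisher-Jenks) 1-D classification."""
    data = sorted(values)
    n = len(data)
    if not 1 <= n_classes <= n:
        raise ValueError("need 1 <= n_classes <= number of values")
    # Prefix sums give the within-class squared deviation of any run data[i:j] in O(1).
    s1, s2 = [0.0] * (n + 1), [0.0] * (n + 1)
    for idx, v in enumerate(data):
        s1[idx + 1] = s1[idx] + v
        s2[idx + 1] = s2[idx] + v * v

    def ssd(i, j):
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[c][j]: least total deviation when data[:j] is split into c classes;
    # start[c][j]: index where the last of those c classes begins.
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    start = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for c in range(1, n_classes + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                candidate = cost[c - 1][i] + ssd(i, j)
                if candidate < cost[c][j]:
                    cost[c][j], start[c][j] = candidate, i
    # Walk back from the full dataset to recover each class's upper bound.
    bounds, j = [], n
    for c in range(n_classes, 0, -1):
        bounds.append(data[j - 1])
        j = start[c][j]
    return bounds[::-1]


def classify(value, bounds, labels=LABELS):
    """Label a value with the first class whose upper bound it does not exceed."""
    for bound, label in zip(bounds, labels):
        if value <= bound:
            return label
    return labels[-1]


# Hypothetical vulnerability scores sampled from a map.
sample_scores = [1, 2, 3, 10, 11, 12, 20, 21, 30, 31, 40, 41]
bounds = jenks_breaks(sample_scores)
print(bounds, classify(11, bounds), classify(35, bounds))
```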

B3. Validation

Accuracy, precision, recall, F1-score, ROC-AUC, cross-validation, confusion matrices, and feature importance are used to evaluate performance.
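
The threshold-based metrics follow directly from the confusion matrix, and ROC-AUC equals the probability that a randomly chosen contaminated well receives a higher score than a non-contaminated one. The plain-Python sketch below spells out these definitions and adds one hypothetical reading of "threshold adjustment": choosing the probability cut-off that maximises F1. The exact adjustment behind the reported F1-scores is not described here. In practice, `sklearn.metrics` provides `confusion_matrix`, `f1_score`, and `roc_auc_score`, and the threshold should be tuned on validation data rather than on the test set. Labels and probabilities are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """(tp, tn, fp, fn) for binary labels where 1 = contaminated."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return tp, tn, fp, fn


def threshold_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 from hard 0/1 predictions."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(y_true), "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true, proba):
    """Share of (contaminated, clean) pairs ranked correctly; ties count one half."""
    pos = [p for t, p in zip(y_true, proba) if t == 1]
    neg = [p for t, p in zip(y_true, proba) if t == 0]
    wins = sum((a > b) + 0.5 * (a == b) for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def f1_at(y_true, proba, threshold):
    return threshold_metrics(y_true, [int(p >= threshold) for p in proba])["f1"]


def best_threshold(y_true, proba):
    """Candidate cut-off (taken from the scores themselves) with the highest F1."""
    return max(sorted(set(proba)), key=lambda th: f1_at(y_true, proba, th))


# Hypothetical validation labels and predicted probabilities.
y_val = [1, 1, 1, 0, 0, 1, 0, 1]
p_val = [0.9, 0.8, 0.4, 0.35, 0.2, 0.55, 0.6, 0.45]
th = best_threshold(y_val, p_val)
print("default 0.5:", threshold_metrics(y_val, [int(p >= 0.5) for p in p_val]))
print("tuned", th, "-> F1 =", round(f1_at(y_val, p_val, th), 4), "| ROC-AUC =", roc_auc(y_val, p_val))
```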

Vulnerability classification

Very Low
Low
Moderate
High
Very High

Wells with nitrate concentrations above 50 mg/L are classified as contaminated, while wells below the threshold are classified as non-contaminated. This supports supervised ML training and validation against observed groundwater-quality data.
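
A minimal sketch of this labelling rule. Because the text says "above 50 mg/L", a value of exactly 50 mg/L is treated as non-contaminated here; that boundary choice is an assumption. Well IDs and concentrations are hypothetical.

```python
NITRATE_LIMIT_MG_L = 50.0


def label_well(nitrate_mg_l: float) -> int:
    """1 = contaminated (strictly above the limit), 0 = non-contaminated."""
    return int(nitrate_mg_l > NITRATE_LIMIT_MG_L)


# Hypothetical well measurements (mg/L nitrate).
samples = {"W1": 62.0, "W2": 18.5, "W3": 50.0}
labels = {well: label_well(value) for well, value in samples.items()}
print(labels)
print(f"contaminated share: {sum(labels.values()) / len(labels):.2f}")
```

Checking the share of contaminated wells is a quick way to see whether class balancing is needed before training.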

Western Thessaly study area.

The basin covers approximately 6,090 km² across Trikala, Karditsa, Larisa, Magnesia, and Fthiotida, with mountainous western terrain and a lowland eastern plain shaped by Alpine and post-Alpine geological evolution.

⛰️

Morphology

The western part is dominated by the Southern Pindos mountain complex, while the eastern part corresponds to the lowland Thessalian Plain.

🧬

Geology

The area includes pre-Alpine, Alpine, and post-Alpine formations, with crystalline rocks, ophiolites, flysch, limestones, molassic formations, and Neogene–Quaternary deposits.

🌾

Land use

Permanently irrigated land dominates the landscape, reflecting intensive agricultural pressure and the importance of nitrate-related vulnerability mapping.

Model comparison and results.

Four configurations are compared: baseline DRASTIC, DRASTIC with Transit Time, Random Forest using seven DRASTIC layers, and Random Forest using seven layers plus Transit Time.

DRASTIC models

DRASTIC Model A

Uses the standard seven DRASTIC parameters. The Pearson correlation with log-transformed nitrate is weak, with r = 0.105 and p = 0.188, indicating no statistically significant relationship.
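
For reference, the statistic can be sketched as follows, assuming base-10 logarithms and strictly positive nitrate concentrations (neither the log base nor any handling of zero values is stated here). The data are hypothetical. The p-value comes from a t-distribution with n − 2 degrees of freedom; `scipy.stats.pearsonr` returns r and the two-sided p-value directly.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """Test statistic for H0: rho = 0, compared with a t-distribution on n - 2 df."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Hypothetical DRASTIC index values and nitrate concentrations (mg/L) at six wells.
di = [120, 145, 160, 98, 175, 132]
nitrate = [22.0, 48.0, 35.0, 12.0, 80.0, 30.0]
log_nitrate = [math.log10(v) for v in nitrate]  # assumes all values > 0
r = pearson_r(di, log_nitrate)
print(f"r = {r:.3f}, t = {t_statistic(r, len(di)):.3f} with {len(di) - 2} df")
```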

DRASTIC Model B

Incorporates Transit Time as an auxiliary parameter. The nitrate correlation improves to r = 0.261, but remains statistically non-significant with p = 0.164. Spatially, the map becomes more refined and highlights expanded high-risk zones.

Random Forest models

Random Forest A

Uses the traditional seven DRASTIC variables. It achieves an accuracy of 0.8036 and an adjusted F1-score of 0.8657. Depth to groundwater, vadose zone impact, and recharge are the strongest predictors.

Random Forest B

Adds Transit Time to the Random Forest feature set. It achieves the best performance, with an accuracy of 0.8214 and an adjusted F1-score of 0.8788, and offers improved spatial precision and stronger identification of high and very-high vulnerability zones.

Performance summary

Model | Input variables | Key result | Interpretation
DRASTIC A | 7 DRASTIC factors | r = 0.105 | Baseline map with limited nitrate correlation
DRASTIC B | 7 DRASTIC factors + Transit Time | r = 0.261 | More refined spatial vulnerability pattern
RF A | 7 DRASTIC factors | Accuracy = 0.8036; F1 = 0.8657 | Nonlinear ML improves prediction and granularity
RF B | 7 DRASTIC factors + Transit Time | Accuracy = 0.8214; F1 = 0.8788 | Best overall model with temporal contamination sensitivity

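
Because nearby wells share similar conditions, random train/test splits can overstate performance; spatial validation keeps neighbouring wells in the same fold. The sketch below assumes spatial block cross-validation with scikit-learn's `GroupKFold`, using 20 km tiles as groups, and compares the seven-factor and eight-factor feature sets on synthetic data. Tile size, data, and model settings are hypothetical, and the output does not reproduce the reported scores.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupKFold, cross_val_score

rng = np.random.default_rng(7)
n_wells = 240
# Hypothetical well positions (m) inside a 60 km x 60 km window.
coords = rng.uniform(0, 60_000, size=(n_wells, 2))
# 20 km x 20 km tiles act as groups, so a tile's wells are never split across train and test.
tiles = (coords[:, 0] // 20_000).astype(int) * 3 + (coords[:, 1] // 20_000).astype(int)

# Synthetic ratings: columns D, R, A, S, T, I, C, TT.
X8 = rng.uniform(1, 10, size=(n_wells, 8))
noise = rng.normal(0, 1.5, size=n_wells)
y = (X8[:, 0] * X8[:, 5] / 10 + 0.5 * X8[:, 7] + noise > 6).astype(int)

cv = GroupKFold(n_splits=5)
results = {}
for label, X in (("RF A (7 factors)", X8[:, :7]), ("RF B (+ Transit Time)", X8)):
    model = RandomForestClassifier(n_estimators=200, class_weight="balanced", random_state=0)
    results[label] = cross_val_score(model, X, y, groups=tiles, cv=cv, scoring="f1")
    print(f"{label}: F1 = {results[label].mean():.3f} +/- {results[label].std():.3f}")
```
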
🏆

Best model

Random Forest B is preferred because it achieves the highest accuracy, strongest F1-score, better spatial granularity, and improved temporal sensitivity.

🔍

Top controls

Depth to groundwater, vadose zone impact, and recharge consistently emerge as the dominant drivers of groundwater vulnerability.

🧠

Explainability

Feature importance, SHAP values, confidence maps, and entropy maps make the model transparent and useful for decision-making under uncertainty.
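
Confidence and entropy maps can both be derived from the per-class probabilities a Random Forest returns for each cell (for example via `predict_proba`). A minimal sketch, assuming confidence is the highest class probability and entropy is the Shannon entropy in bits; the probability vectors are hypothetical. SHAP values for tree ensembles are usually computed separately, for example with the `shap` package's `TreeExplainer`.

```python
import math


def confidence_and_entropy(class_probs):
    """Confidence = highest class probability; entropy = Shannon entropy in bits."""
    confidence = max(class_probs)
    entropy = -sum(p * math.log2(p) for p in class_probs if p > 0)
    return confidence, entropy


def uncertainty_maps(prob_grid):
    """Build confidence and entropy grids from prob_grid[row][col] = class probabilities."""
    confidence_map, entropy_map = [], []
    for row in prob_grid:
        pairs = [confidence_and_entropy(cell) for cell in row]
        confidence_map.append([c for c, _ in pairs])
        entropy_map.append([h for _, h in pairs])
    return confidence_map, entropy_map


# Hypothetical 1 x 3 strip of five-class probabilities (Very Low ... Very High).
grid = [[
    [0.0, 0.0, 0.0, 0.0, 1.0],    # model certain: Very High
    [0.05, 0.1, 0.7, 0.1, 0.05],  # fairly confident: Moderate
    [0.2, 0.2, 0.2, 0.2, 0.2],    # maximally uncertain
]]
conf, ent = uncertainty_maps(grid)
print([round(c, 2) for c in conf[0]], [round(h, 3) for h in ent[0]])
```

Low confidence and high entropy flag cells where the model hesitates between classes; with five classes, entropy ranges from 0 (one class certain) to log2(5) ≈ 2.32 bits (all classes equally likely).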

Poster: ptsag1.jpg

EGU26-21505 Poster

Citation

Tsangaratos, P., Matiatos, I., Ilia, I., and Markantonis, K.: Explainable Machine Learning for Spatio-Temporal Groundwater Vulnerability Mapping: A Random Forest-DRASTIC-Transit Time Framework for Western Thessaly, Greece, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-21505, https://doi.org/10.5194/egusphere-egu26-21505, 2026.
