Inverse method using boosted regression tree and k-nearest neighbor to quantify effects of point and non-point source nitrate pollution in groundwater

Research output: Contribution to journal › Article

RIS

TY - JOUR

T1 - Inverse method using boosted regression tree and k-nearest neighbor to quantify effects of point and non-point source nitrate pollution in groundwater

AU - Motevalli, Alireza

AU - Naghibi, Seyed Amir

AU - Hashemi, Hossein

AU - Berndtsson, Ronny

AU - Pradhan, Biswajeet

AU - Gholami, Vahid

PY - 2019

Y1 - 2019

N2 - Nitrate pollution of groundwater has increased dramatically worldwide due to growth in population and agricultural productivity. The resulting nitrate concentration in groundwater is usually a combination of various types of point and non-point pollutant sources. It is often difficult to distinguish between these sources, since groundwater is formed in large and complex catchments where various natural processes and anthropogenic influences contribute to a given downstream nitrate concentration. For such conditions, this paper uses a methodology that can inversely determine the type and location of the main nitrate pollutant source. The methodology builds on two state-of-the-art data mining techniques, boosted regression tree (BRT) and k-nearest neighbor (KNN). These techniques are used to produce a nitrate pollution vulnerability map. The methodology can mitigate the effects of subjective judgement in determining the importance of different sources and mechanisms for nitrate transport. The investigated mechanisms comprise hydrogeological, hydrological, anthropogenic, topographic, and soil conditioning factors. Thus, the proposed methodology is used to separate natural processes from anthropogenic effects on nitrate pollution. To calculate the groundwater vulnerability maps, a groundwater nitrate concentration of 40 mg/L (suggested by WHO with a 20% risk margin) was selected as a general threshold for identifying polluted areas, which resulted in 96 polluted wells. Non-polluted locations were selected from well data with nitrate concentrations below 15 mg/L (96 non-polluted). The models were trained on 70% of the polluted and 70% of the non-polluted site data. The remaining data, 30% of the polluted and 30% of the non-polluted sites, were used to validate the simulation results. Results showed that the BRT produced outputs with higher performance than the KNN algorithm. The final ranking results based on the BRT model showed the higher importance of hydraulic conductivity, river density, soil, slope percent, net recharge, and distance from villages, in that order, relative to other factors.

KW - Boosted regression tree

KW - Data mining

KW - GIS

KW - Inverse modeling

KW - K-nearest neighbors

KW - Nitrate pollution

U2 - 10.1016/j.jclepro.2019.04.293

DO - 10.1016/j.jclepro.2019.04.293

M3 - Article

VL - 228

SP - 1248

EP - 1263

JO - Journal of Cleaner Production

T2 - Journal of Cleaner Production

JF - Journal of Cleaner Production

SN - 0959-6526

ER -