Inverse method using boosted regression tree and k-nearest neighbor to quantify effects of point and non-point source nitrate pollution in groundwater

Research output: Contribution to journal › Article


title = "Inverse method using boosted regression tree and k-nearest neighbor to quantify effects of point and non-point source nitrate pollution in groundwater",
abstract = "Nitrate pollution of groundwater has increased dramatically worldwide due to increase of population and agricultural productivity. The resulting nitrate concentration in groundwater is usually a combination of various types of point and non-point pollutant sources. It is often difficult to distinguish between these sources since groundwater is formed in large and complex catchments with various natural processes and anthropogenic influence that contribute to a certain downstream nitrate concentration. For such conditions, this paper uses a methodology that can be used to inversely determine type and location of main nitrate pollutant source. The methodology builds on two state-of-the-art data mining techniques, boosted regression tree (BRT) and k-nearest neighbor (KNN). These techniques are used to produce a nitrate pollution vulnerability map. The methodology can mitigate effects of subjective judgement on determining importance of different sources and mechanisms for nitrate transport. The investigated mechanisms are hydrogeological, hydrological, anthropogenic, topography, and soil conditioning factors. Thus, the proposed methodology is used to separate between natural processes and anthropogenic effects on nitrate pollution. To calculate the groundwater vulnerability maps, a groundwater nitrate concentration of 40 mg/L (suggested by WHO with a 20{\%} risk margin) was selected as a general threshold for identifying polluted areas that resulted in 96 polluted wells. Non-polluted locations were selected from well data with nitrate concentration less than 15 mg/L (96 non-polluted). The models were trained on 70{\%} polluted and 70{\%} non-polluted site data. The remaining data, 30{\%} polluted and 30{\%} non-polluted sites, were used to validate the simulation results. Results showed that the BRT produced outputs with higher performance than the KNN algorithm.
The final ranking results based on the BRT model showed the higher importance of hydraulic conductivity, river density, soil, slope percent, net recharge, and distance from villages, in order, relative to other factors.",
keywords = "Boosted regression tree, Data mining, GIS, Inverse modeling, K-nearest neighbors, Nitrate pollution",
author = "Alireza Motevalli and Naghibi, {Seyed Amir} and Hossein Hashemi and Ronny Berndtsson and Biswajeet Pradhan and Vahid Gholami",
year = "2019",
doi = "10.1016/j.jclepro.2019.04.293",
language = "English",
volume = "228",
pages = "1248--1263",
journal = "Journal of Cleaner Production",
issn = "0959-6526",
publisher = "Elsevier",