Predicting Kereh River's Water Quality: A comparative study of machine learning models
DOI:
https://doi.org/10.21834/e-bpj.v8iSI15.5097
Keywords:
Water Quality, Machine Learning, Decision Tree, Random Forest
Abstract
This study introduces a machine learning-based approach to forecast the water quality of the Kereh River and classify it as either 'polluted' or 'slightly polluted'. Three machine learning algorithms were compared: decision tree, random forest (RF), and boosted regression tree, using data spanning 2010 to 2019. The RF model performed best, with an accuracy of 97.30%, sensitivity of 100.00%, specificity of 94.74%, and precision of 95.00%. The RF model also identified dissolved oxygen (DO) as the most important variable for predicting water quality.
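The four evaluation measures reported in the abstract are standard functions of a binary confusion matrix. The following is a minimal sketch of how they are usually computed. It assumes that 'polluted' is treated as the positive class, which the abstract does not state. The counts are hypothetical and chosen only for illustration; they are not the study's data. The names `ConfusionCounts` and `binary_metrics` are likewise illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from a binary confusion matrix (positive class assumed: 'polluted')."""
    tp: int  # polluted samples predicted as polluted
    fp: int  # slightly polluted samples predicted as polluted
    tn: int  # slightly polluted samples predicted as slightly polluted
    fn: int  # polluted samples predicted as slightly polluted


def binary_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Return accuracy, sensitivity, specificity and precision as fractions.

    Assumes every denominator is non-zero, i.e. both classes occur in the
    test set and at least one sample is predicted positive.
    """
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "accuracy": (c.tp + c.tn) / total,
        "sensitivity": c.tp / (c.tp + c.fn),  # true positive rate (recall)
        "specificity": c.tn / (c.tn + c.fp),  # true negative rate
        "precision": c.tp / (c.tp + c.fp),    # positive predictive value
    }


# Hypothetical counts for illustration only (not taken from the study).
example = ConfusionCounts(tp=40, fp=5, tn=45, fn=10)
metrics = binary_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

Under this convention, a sensitivity of 100.00% would mean that no polluted sample was misclassified as slightly polluted, while a specificity below 100% would mean that some slightly polluted samples were flagged as polluted.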
License
Copyright (c) 2023 Norashikin Nasaruddin, Afida Ahmad, Shahida Farhan Zakaria, Ahmad Zia Ul-Saufie, Mohamed Syazwan Osman

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.