Predicting Kereh River's Water Quality: A comparative study of machine learning models

Authors

  • Norashikin Nasaruddin, Faculty of Computer and Mathematical Sciences, University Teknologi MARA, Kedah Branch, 08400 Merbok, Kedah, Malaysia
  • Afida Ahmad, Faculty of Computer and Mathematical Sciences, University Teknologi MARA, Kedah Branch, 08400 Merbok, Kedah, Malaysia
  • Shahida Farhan Zakaria, Faculty of Computer and Mathematical Sciences, University Teknologi MARA, Kedah Branch, 08400 Merbok, Kedah, Malaysia
  • Ahmad Zia Ul-Saufie, Faculty of Computer and Mathematical Sciences, University Teknologi MARA, Shah Alam 40450, Selangor, Malaysia
  • Mohamed Syazwan Osman, Faculty of Chemical Engineering, University Teknologi MARA, Penang Branch, 14300 Penang, Malaysia

DOI:

https://doi.org/10.21834/e-bpj.v8iSI15.5097

Keywords:

Water Quality, Machine Learning, Decision Tree, Random Forest

Abstract

This study uses machine learning to predict the water quality of the Kereh River and classify it as either 'polluted' or 'slightly polluted'. Three algorithms were compared: decision tree, random forest (RF), and boosted regression tree, using data from 2010 to 2019. In the comparison, the RF model performed best, with an accuracy of 97.30%, sensitivity of 100.00%, specificity of 94.74%, and precision of 95.00%. The RF model also identified dissolved oxygen (DO) as the most important variable for predicting water quality.
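For readers unfamiliar with the four reported metrics, the following is a minimal sketch of how accuracy, sensitivity, specificity, and precision are derived from the confusion matrix of a binary classifier. The function name and the counts are hypothetical and chosen only for illustration; they are not the study's data, and treating 'polluted' as the positive class is an assumption.

```python
def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(tp, fp, tn, fn):
    """Compute four standard binary-classification metrics as fractions.

    tp: positives predicted as positive (assumed here: 'polluted')
    fp: negatives predicted as positive
    tn: negatives predicted as negative (assumed here: 'slightly polluted')
    fn: positives predicted as negative
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": safe_ratio(tp + tn, total),   # share of all samples classified correctly
        "sensitivity": safe_ratio(tp, tp + fn),   # share of actual positives detected
        "specificity": safe_ratio(tn, tn + fp),   # share of actual negatives detected
        "precision": safe_ratio(tp, tp + fp),     # share of predicted positives that are correct
    }


# Hypothetical counts for illustration only (not taken from the paper).
metrics = classification_metrics(tp=45, fp=5, tn=40, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

A sensitivity of 100% corresponds to zero false negatives: under the assumption above, no 'polluted' sample would be misclassified as 'slightly polluted'.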

Published

2023-09-19

How to Cite

Nasaruddin, N., Ahmad, A., Zakaria, S. F., Ul-Saufie, A. Z., & Osman, M. S. (2023). Predicting Kereh River’s Water Quality: A comparative study of machine learning models. Environment-Behaviour Proceedings Journal, 8(SI15), 213–219. https://doi.org/10.21834/e-bpj.v8iSI15.5097