A Lightweight Machine Learning Framework for Urban Air Quality Prediction
DOI:
https://doi.org/10.21834/e-bpj.v11i37.7953
Keywords:
Random Forest, Linear Regression, PM2.5, PM10
Abstract
This study proposes a lightweight machine learning framework for short-term forecasting of PM2.5 and PM10 concentrations in Seoul, South Korea, using 2024 environmental data from 50 monitoring stations. It compares a Random Forest regressor against a Linear Regression baseline. The Random Forest model outperformed the baseline, achieving R² values of 0.832 for PM2.5 and 0.827 for PM10. The framework was also computationally efficient, with training times under one second and prediction in approximately 40 milliseconds. These results support deployment in cities with limited infrastructure.
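The abstract reports two kinds of figures: a goodness-of-fit score (R²) and wall-clock latency. As a minimal sketch of how such numbers are commonly obtained, the snippet below computes the standard coefficient of determination and times a single call. The function names `r2_score` and `timed` and the sample values are hypothetical and chosen only for illustration; they are not taken from the study, whose evaluation code is not shown here.

```python
import time
from statistics import fmean


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_true = fmean(y_true)
    # Residual sum of squares: error left over after prediction.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: spread of the observations around their mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot


def timed(fn, *args):
    """Run fn(*args) once and return (result, elapsed_milliseconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, (time.perf_counter() - start) * 1000.0


# Hypothetical PM2.5 values in µg/m³, invented purely for illustration.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]

score, elapsed_ms = timed(r2_score, observed, predicted)
print(f"R^2 = {score:.3f} (computed in {elapsed_ms:.3f} ms)")
```

An R² of 1 means the predictions match the observations exactly, while 0 means the model does no better than always predicting the mean. In practice the same `timed` wrapper could be placed around a model's fit and predict calls, and a single measurement like this is noisy, so repeated runs would give a more reliable latency estimate.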
License
Copyright (c) 2026 Akshovya Shrestha, Thien Phu Nguyen, Khadak Singh Bhandari, Ahmed Abdulhakim Al-Absi

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.