A Lightweight Machine Learning Framework for Urban Air Quality Prediction

Authors

A. Shrestha, T. P. Nguyen, K. S. Bhandari, A. A. Al-Absi

Abstract

This study proposes a lightweight machine learning framework for short-term forecasting of PM2.5 and PM10 in Seoul, South Korea, using 2024 environmental data from 50 monitoring stations. It compares a Random Forest regressor against a Linear Regression baseline. The Random Forest model outperformed the baseline, achieving R² values of 0.832 for PM2.5 and 0.827 for PM10. The framework was also computationally efficient, with training times under one second and prediction completing in 39.67 milliseconds. These results support deployment in cities with limited infrastructure.

Keywords: Random Forest, Linear Regression, PM2.5, PM10.
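
The abstract reports two kinds of evaluation figures: the coefficient of determination (R²) and wall-clock prediction time. As a minimal sketch of how such figures can be computed, the snippet below uses only the Python standard library. The helper names (`r2_score`, `timed_predict`, `toy_model`) and the toy data are hypothetical illustrations, not the study's code, models, or measurements.

```python
import time
from statistics import mean


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def timed_predict(predict, inputs):
    """Run predict(inputs) once; return (predictions, elapsed milliseconds)."""
    start = time.perf_counter()
    preds = predict(inputs)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return preds, elapsed_ms


# Hypothetical toy data (not from the study): one input feature and
# the corresponding observed concentrations.
features = [10.0, 20.0, 30.0, 40.0]
observed = [12.0, 19.0, 33.0, 38.0]


def toy_model(xs):
    """Stand-in for a fitted regressor: returns the input unchanged."""
    return [1.0 * x for x in xs]


predicted, latency_ms = timed_predict(toy_model, features)
score = r2_score(observed, predicted)
print(f"R2 = {score:.3f}, prediction time = {latency_ms:.3f} ms")
```

An R² of 1.0 means the predictions match the observations exactly, while 0.0 means the model does no better than always predicting the mean. Measured latency depends on the hardware and on how many samples are predicted per call, so timing figures are only comparable under the same conditions.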

Published

2026-05-06

How to Cite

Shrestha, A., Nguyen, T. P., Bhandari, K. S., & Al-Absi, A. A. (2026). A Lightweight Machine Learning Framework for Urban Air Quality Prediction. Environment-Behaviour Proceedings Journal, 11(37). Retrieved from https://ebpj.e-iph.co.uk/index.php/EBProceedings/article/view/7910