A Lightweight Machine Learning Framework for Urban Air Quality Prediction
Abstract
This study proposes a lightweight machine learning framework for short-term forecasting of PM2.5 and PM10 in Seoul, South Korea, using 2024 environmental data from 50 monitoring stations. It compares a Random Forest regressor against a Linear Regression baseline. The Random Forest model outperformed the baseline, achieving R² values of 0.832 and 0.827 for PM2.5 and PM10, respectively. The framework was also computationally efficient, with training times under one second and prediction executing in 39.67 milliseconds. These results support deployment in cities with limited infrastructure.
Keywords: Random Forest, Linear Regression, PM2.5, PM10.
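The two headline metrics in the abstract, R² and execution time in milliseconds, can be illustrated with a minimal sketch. It assumes the standard definition of the coefficient of determination (1 minus the ratio of residual to total sum of squares) and a single wall-clock measurement. The helper names `r_squared` and `timed_ms` and the sample values are hypothetical, chosen only for illustration; this is not the authors' code or data.

```python
import time
from statistics import fmean


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


def timed_ms(fn, *args):
    """Run fn(*args) once and return (result, elapsed wall-clock milliseconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, (time.perf_counter() - start) * 1000.0


# Illustrative values only (not data from the study), in micrograms per cubic metre.
observed = [12.0, 18.0, 25.0, 31.0, 40.0]
predicted = [14.0, 17.0, 24.0, 33.0, 38.0]

score, elapsed_ms = timed_ms(r_squared, observed, predicted)
print(f"R^2 = {score:.3f}, computed in {elapsed_ms:.3f} ms")
```

A single timing run is noisy; in practice a reported latency would more plausibly be an average over repeated runs, though the abstract does not say how the 39.67 ms figure was measured.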
License
Copyright (c) 2026 Akshovya Shrestha, Thien Phu Nguyen, Khadak Singh Bhandari, Ahmed Abdulhakim Al-Absi

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.