A Lightweight Machine Learning Framework for Urban Air Quality Prediction
Abstract
Air pollution has become one of the most serious public health problems of the 21st century, driven in part by the rapid growth of cities worldwide. Public sources have made air quality monitoring data easily accessible, but high-precision short-term forecasts remain rare, which makes it hard for communities to adapt their behavior to changes in pollution levels. Most forecasting systems today rely on legacy statistical methods, which sometimes fail to deliver accurate predictions because urban pollutant concentrations are highly non-linear and volatile. In this study, we propose a machine learning framework for real-time prediction of air quality indicators such as PM2.5, PM10, CO, and NO₂, using wind speed and humidity as supplementary environmental variables that represent regional climatic conditions. We compare two models: Linear Regression as a statistical baseline and Random Forest as the main predictive model. The dataset was sourced from Seoul Open Data Plaza, a publicly available data portal operated by the Seoul Metropolitan Government. For pre-processing, we handle missing values and apply Min-Max normalization so that the data are clean and the models behave more reliably. We evaluate model performance using MAE, RMSE, and R². In the experiments, we gave equal weight to prediction accuracy and speed. The key differentiating factor of this framework is that it runs on a low-resource configuration: unlike methods that rely on large and expensive computing infrastructure, it is designed for low-compute systems while focusing on high-accuracy short-term predictions. This makes it easy to deploy in many cities across Asia and Africa, where growth is rapid and resources are limited.
The experimental results show that the proposed framework surpasses conventional statistical techniques in capturing complex temporal pollution patterns. The goal of this work is to turn predictive outputs into useful real-world information, helping local governments issue timely health advisories and enabling residents to make informed behavioral choices based on real-time air quality conditions, ultimately leading to a healthier and more sustainable urban environment.
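The pre-processing and evaluation steps named in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the abstract does not say how missing values are handled, so forward-filling is an assumption made here, and all function names and sample readings are hypothetical.

```python
import math

def fill_missing(values):
    """Forward-fill None entries (assumed strategy, not stated in the abstract).
    Leading gaps take the first observed value."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("series has no observed values")
    filled, last = [], observed[0]
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled

def min_max_normalize(values):
    """Rescale values to [0, 1] using (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant series carries no spread; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot

# Hypothetical hourly PM2.5 readings with one gap (illustrative values only).
pm25_raw = [12.0, None, 20.0, 28.0]
pm25_filled = fill_missing(pm25_raw)
pm25_scaled = min_max_normalize(pm25_filled)

# Hypothetical observed vs. predicted values for the three metrics.
y_true = [10.0, 20.0, 30.0]
y_pred = [12.0, 18.0, 33.0]
scores = {"MAE": mae(y_true, y_pred),
          "RMSE": rmse(y_true, y_pred),
          "R2": r2(y_true, y_pred)}
```

In practice, a library such as scikit-learn provides equivalent scaling and metric routines; the plain-Python version is used here only to keep the arithmetic visible.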
Keywords: Machine learning; Lightweight; Air quality; Prediction
License
Copyright (c) 2026 Akshovya Shrestha, Thien Phu Nguyen, Khadak Singh Bhandari, Ahmed Abdulhakim Al-Absi

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.