Machine Learning-Based Binary Classification of Air Quality Using Pollutant Concentrations

Authors

  • Raya Basem Mahmood

Keywords:

Air Quality Classification, Machine Learning, Logistic Regression, Random Forest, XGBoost, Binary Classification, Environmental Monitoring, Pollutant Prediction

Abstract

Air quality monitoring is a primary means of protecting public health and supporting environmental decision-making. This study develops machine learning models that classify air quality into two classes from pollutant concentrations. The target variable was constructed by mapping the categories of the original AQI_Bucket column to two classes: Safe (0) and Unhealthy (1). The preprocessing pipeline handled missing values in the numerical features, scaled the features, and encoded categorical variables. Three engineered features, PM_ratio, NO ratio, and Gas Total, were added to capture relationships among pollutants. The data were divided into training and test sets with an 80:20 split to evaluate three algorithms: Logistic Regression, Random Forest, and XGBoost. Model performance was assessed with six measures: accuracy, precision, recall, F1-score, ROC-AUC, and the confusion matrix. Both ensemble models outperformed Logistic Regression. Logistic Regression achieved an accuracy of 0.887 and an AUC of 0.957, whereas Random Forest achieved an accuracy of 0.918 and an AUC of 0.976. XGBoost delivered the best overall performance, with an accuracy of 0.919, precision of 0.938, recall of 0.930, F1-score of 0.934, and AUC of 0.975, although its AUC was marginally below that of Random Forest. These findings indicate that ensemble learning methods are effective for binary air quality classification and can support environmental monitoring.
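
The target construction, missing-value handling, and feature engineering described above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation. The abstract does not say which AQI_Bucket categories count as Safe, how missing values were filled, or how the engineered features are defined. The label mapping, median imputation, pollutant column names, and formulas for PM_ratio, NO ratio, and Gas Total below are therefore all hypothetical.

```python
from statistics import median

# HYPOTHETICAL label mapping: the study does not say which AQI_Bucket
# categories are treated as Safe; six common AQI labels are assumed here.
SAFE = {"Good", "Satisfactory"}
UNHEALTHY = {"Moderate", "Poor", "Very Poor", "Severe"}

# HYPOTHETICAL pollutant column names used by the engineered features.
POLLUTANTS = ["PM2.5", "PM10", "NO", "NO2", "CO", "SO2", "O3"]


def binarize_bucket(bucket):
    """Map an AQI_Bucket label to Safe (0) or Unhealthy (1)."""
    if bucket in SAFE:
        return 0
    if bucket in UNHEALTHY:
        return 1
    raise ValueError(f"unknown AQI_Bucket label: {bucket!r}")


def fit_medians(train_rows, columns=POLLUTANTS):
    """Median of the observed (non-missing) training values per column."""
    return {
        c: median([r[c] for r in train_rows if r[c] is not None])
        for c in columns
    }


def impute(row, medians):
    """Return a copy of row with missing (None) pollutants set to the medians."""
    return {
        k: medians[k] if v is None and k in medians else v
        for k, v in row.items()
    }


def add_features(row, eps=1e-6):
    """Add the three engineered features; these formulas are HYPOTHETICAL."""
    out = dict(row)
    out["PM_ratio"] = row["PM2.5"] / (row["PM10"] + eps)  # fine vs. coarse PM
    out["NO ratio"] = row["NO"] / (row["NO2"] + eps)      # NO relative to NO2
    out["Gas Total"] = row["NO2"] + row["CO"] + row["SO2"] + row["O3"]
    return out


def prepare(rows, medians):
    """Return (feature dicts, binary labels) for one data split."""
    X, y = [], []
    for r in rows:
        r = add_features(impute(r, medians))
        y.append(binarize_bucket(r.pop("AQI_Bucket")))
        X.append(r)
    return X, y


# Toy training rows (illustrative values, not data from the study).
train = [
    {"PM2.5": 30.0, "PM10": 60.0, "NO": 5.0, "NO2": 10.0, "CO": 0.5,
     "SO2": 4.0, "O3": 20.0, "AQI_Bucket": "Satisfactory"},
    {"PM2.5": None, "PM10": 200.0, "NO": 20.0, "NO2": 40.0, "CO": 1.5,
     "SO2": 8.0, "O3": 35.0, "AQI_Bucket": "Poor"},
    {"PM2.5": 90.0, "PM10": 180.0, "NO": 15.0, "NO2": 30.0, "CO": 1.0,
     "SO2": 6.0, "O3": 25.0, "AQI_Bucket": "Moderate"},
]
medians = fit_medians(train)          # fitted on training rows only
X_train, y_train = prepare(train, medians)
print(y_train)                        # [0, 1, 1]
```

The medians are fitted on the training rows only, so no test-set information leaks into preprocessing. Feature scaling (for example z-score standardization) would be fitted the same way before being applied to the 20% test split. In practice, pandas and scikit-learn would usually replace these hand-written helpers.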

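The evaluation measures named in the abstract can be illustrated with a small dependency-free sketch that treats Unhealthy (1) as the positive class. The function names, the 0.5 decision threshold, and the toy labels and scores are illustrative assumptions, not values from the study. In a scikit-learn workflow, the same quantities come from `sklearn.metrics` (`confusion_matrix`, `accuracy_score`, `precision_score`, `recall_score`, `f1_score`, `roc_auc_score`), with scores taken from each fitted model's `predict_proba` on the test split.

```python
def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp), treating Unhealthy (1) as the positive class."""
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def threshold_metrics(y_true, y_pred):
    """Accuracy, precision, recall, F1, and the 2x2 confusion matrix."""
    tn, fp, fn, tp = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "confusion": [[tn, fp], [fn, tp]],  # rows: true 0/1, cols: predicted 0/1
        "accuracy": (tp + tn) / (tn + fp + fn + tp),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true, scores):
    """ROC-AUC from scores (not hard labels): the probability that a random
    Unhealthy sample scores above a random Safe one, ties counting 1/2.
    This pairwise form is O(n_pos * n_neg), suitable for illustration only."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs both classes in y_true")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy test-split labels and predicted probabilities of class 1 (illustrative).
y_true = [0, 0, 1, 1, 1, 0]
scores = [0.1, 0.6, 0.8, 0.4, 0.9, 0.3]
y_pred = [int(s >= 0.5) for s in scores]   # assumed 0.5 decision threshold

m = threshold_metrics(y_true, y_pred)
print(m["confusion"])                      # [[2, 1], [1, 2]]
print(round(roc_auc(y_true, scores), 4))   # 0.8889
```

F1 is the harmonic mean of precision and recall. As a consistency check, XGBoost's reported precision (0.938) and recall (0.930) give 2 × 0.938 × 0.930 / (0.938 + 0.930) ≈ 0.934, which matches the reported F1-score.
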
Published

2026-04-21

Section

Articles