Process modelling for prediction of air quality using PM2.5 in multivariable systems

Date
2019-06
Authors
Nur Hidanah Binti Anuar
Abstract
Particulate matter with a diameter of 2.5 microns or less (PM2.5) has an adverse impact on the environment. The purpose of this study was to predict the concentration of PM2.5 using multiple linear regression, principal component regression and a neural network. In Malaysia, the PM2.5 concentration is not currently considered in the air pollution index because there are no PM2.5 monitoring stations. With the prediction models developed in this study, the concentration of PM2.5 can be predicted from meteorological variables. Each model was tested with three types of data structure, which use past values and exogenous data as inputs, for the cities of Beijing, Chengdu, Guangzhou, Shenyang and Shanghai. Model performance was evaluated using the root mean square error (RMSE) and the coefficient of determination (R²). For all cities, the neural network model showed a stronger correlation between actual and predicted PM2.5 concentrations than multiple linear regression and principal component regression. Increasing the number of neurons in the network gave lower RMSE and higher R² values. The best performance was achieved by the neural network with 10 neurons, with an R² of 0.973 and an RMSE of 0.228. Among the data structures compared, the one using a larger number of past input and output values as inputs gave higher R² (Beijing: 0.966; Chengdu: 0.977; Guangzhou: 0.930; Shenyang: 0.970; Shanghai: 0.981) and lower RMSE values (Beijing: 0.080; Chengdu: 0.044; Guangzhou: 0.136; Shenyang: 0.063; Shanghai: 0.360). Using previous values of the output together with exogenous data allows a better fit between predicted and actual values, and therefore a more accurate prediction. In conclusion, the most effective prediction model was the neural network with 10 hidden neurons, using a data arrangement of past output values and exogenous data.
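The abstract does not spell out the exact data structures, so the following is only an illustrative sketch of the general idea: arranging past PM2.5 values and past exogenous (meteorological) readings as model inputs, and scoring predictions with the standard definitions of RMSE and the coefficient of determination. The function names, the `n_lags` parameter and the input layout are assumptions made for this example, not the author's implementation.

```python
import math


def make_lagged_rows(pm25, exog, n_lags):
    """Pair each PM2.5 value with the n_lags preceding PM2.5 values
    and the n_lags preceding exogenous readings (hypothetical layout)."""
    if len(pm25) != len(exog):
        raise ValueError("pm25 and exog must have the same length")
    inputs, targets = [], []
    for t in range(n_lags, len(pm25)):
        row = list(pm25[t - n_lags:t])      # past output values
        for past in exog[t - n_lags:t]:     # past exogenous values
            row.extend(past)
        inputs.append(row)
        targets.append(pm25[t])
    return inputs, targets


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot
```

Any of the three model types named in the abstract could then be fitted on the `inputs` and `targets` lists, with `rmse` and `r_squared` applied to held-out predictions.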