Publication:
Ensemble and stacked generalization architecture for efficient environmental modelling

datacite.subject.fos: oecd::Engineering and technology::Chemical engineering
dc.contributor.author: Danny Hartanto Djarum
dc.date.accessioned: 2025-05-08T04:04:00Z
dc.date.available: 2025-05-08T04:04:00Z
dc.date.issued: 2023-07-01
dc.description.abstract: Environmental pollution, such as air and water pollution, has resulted in a variety of health-related illnesses. As an urgent solution, continuous air and water quality monitoring stations have been proposed. However, because of the enormous investment required to build and maintain such systems, their availability is not distributed uniformly across all regions, particularly in developing countries. This has led to a significant effort by the research community to develop accurate air and water quality prediction models. In this research, the development of environmental models for three case studies is discussed: river water quality index (WQI) prediction in Malaysia, particulate matter 2.5 (PM2.5) prediction in Malaysia, and PM2.5 prediction in China. Unlike most existing studies, which devote considerable resources to improving prediction accuracy, our research analyzes the importance of optimal preprocessing pipelines for improving the performance and efficiency of the environmental model. The results show a significant improvement when an optimal preprocessing structure is used, with up to 40% better performance than the existing method. The work also introduces a novel LDA-ETR hybrid model for predicting the WQI, which achieves a training time of just 0.39 seconds. In the second part of this research, the application of different variations of neural network architecture to the development of the air and water quality models is analyzed. Our study shows that, with a feedforward artificial neural network (FANN) architecture, the environmental model achieves better performance and generalization capability when dealing with varying real-world datasets. Our results also show that, unlike many existing studies that use the sigmoid and ReLU activation functions, our implementations using the exponential linear unit (ELU) activation function achieve much better performance.
Furthermore, in the third part of our research, we introduce a highly efficient novel stacked regression approach called the reduced Bayesian optimized stacked regressor (RBOSR). It is designed to improve the efficiency of the PM2.5 stacked model proposed in an existing study while maintaining similar accuracy. The results reveal that the RBOSR model is substantially more efficient than existing benchmarks, training up to 47.4 times faster. Lastly, we introduce a cloud-integrated machine learning platform that allows users to easily analyze, train, evaluate, and deploy environmental machine learning models using their own datasets.
dc.identifier.uri: https://erepo.usm.my/handle/123456789/21573
dc.language.iso: en
dc.title: Ensemble and stacked generalization architecture for efficient environmental modelling
dc.type: Resource Types::text::thesis::doctoral thesis
dspace.entity.type: Publication
oairecerif.author.affiliation: Universiti Sains Malaysia