Publication:
Streamflow prediction for Sungai Kulim, Malaysia using random forest (RF) and support vector regression (SVR) models

datacite.subject.fos: oecd::Engineering and technology::Civil engineering
dc.contributor.author: Yeoh, Kai Lun
dc.date.accessioned: 2025-12-03T03:01:38Z
dc.date.available: 2025-12-03T03:01:38Z
dc.date.issued: 2024-05-01
dc.description.abstract: Short-term streamflow prediction is important for managing the immediate risks associated with extreme and unpredictable weather events. Although numerical models have shown great capability in streamflow prediction, they require extensive data, a grounding in hydrology, and calibration effort. Data-driven models, by contrast, are relatively quick to build and can capture the non-linearity in streamflow time series without requiring knowledge of the natural catchment mechanisms, and have therefore gained traction in recent years amid digitalization. In this study, two machine learning (ML) models, namely random forest (RF) and support vector regression (SVR), were applied to multi-step-ahead streamflow prediction in the Sungai Kulim catchment, which has undergone rapid urban development. Models with six different input combinations were developed and assessed using 14 years of hydrological data. The results revealed that the performance of the non-parametric RF algorithm depended strongly on the size of the terminal nodes and on the input configuration. Increasing the terminal node size improved the accuracy of the RF model: the maximum relative improvements in root mean square error (RMSE) and Nash-Sutcliffe efficiency (NSE) were 36.9% and 60.6%, respectively, for lead times of up to three hours. Introducing more correlated variables into the input helped the RF algorithm capture the dynamics of the time series, resulting in better generalization to new and unseen data. Conversely, the performance of the SVR algorithm depended more on the choice of kernel function and its hyperparameters than on the input combination. Overall, the RF model (NSE: 0.392 to 0.963; RMSE: 1.485 to 5.720 m³/s) performed better than the SVR model (NSE: 0.190 to 0.830; RMSE: 3.020 to 6.598 m³/s) during both the validation and verification stages.
Although both the RF and SVR models underpredicted the peak streamflow at all lead times, the RF model still produced very good predictions (percent bias, PBIAS < 10%). The SVR predictions, however, were unsatisfactory (PBIAS > 25%), except for the one-hour-ahead streamflow (PBIAS = 5.28%). The hydrographs reproduced by the RF model had smoother crest segments and rising and recession limbs, as well as closer peak values, than those of the SVR model. The overall accuracy of both ML models decreased as the lead time increased. The findings of this research provide insight into the use of ML algorithms for short-term streamflow prediction in Malaysia. Such predictions support the goals of the Sendai Framework and the Sustainable Development Goals by enabling flood warnings, informing risk reduction strategies, and enhancing disaster preparedness. They help foster resilient communities, safeguard human health, promote sustainable water management, and address climate change impacts, contributing to safer and more sustainable development.
dc.identifier.uri: https://erepo.usm.my/handle/123456789/23307
dc.language.iso: en
dc.title: Streamflow prediction for Sungai Kulim, Malaysia using random forest (RF) and support vector regression (SVR) models
dc.type: Resource Types::text::thesis::master thesis
dspace.entity.type: Publication
oairecerif.author.affiliation: Universiti Sains Malaysia
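
Note on the evaluation metrics: the abstract reports model skill as RMSE, NSE, and PBIAS without giving their formulas. The sketch below uses the definitions most common in hydrology and is not taken from the thesis. In particular, the PBIAS sign convention (positive values meaning underprediction) is an assumption, and the function names and sample values are purely illustrative.

```python
import math


def rmse(observed, simulated):
    """Root mean square error, in the same units as the data (e.g. m³/s)."""
    n = len(observed)
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(observed, simulated)) / n)


def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / variance


def pbias(observed, simulated):
    """Percent bias. Assumed convention: positive means the model underpredicts."""
    return 100.0 * sum(o - s for o, s in zip(observed, simulated)) / sum(observed)


# Illustrative values only, not data from the study.
observed = [2.0, 4.0, 6.0, 8.0]
simulated = [2.0, 3.0, 5.0, 6.0]

print("RMSE:", rmse(observed, simulated))
print("NSE:", nse(observed, simulated))
print("PBIAS (%):", pbias(observed, simulated))
```

Under these definitions, a lower RMSE and an NSE closer to 1 indicate a better fit, and a PBIAS closer to zero indicates less systematic over- or underprediction.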