Publication:
Optimisation of feature selection in high dimensional data sets using design of experiment (DOE) methods

datacite.subject.fosoecd::Engineering and technology::Electrical engineering, Electronic engineering, Information engineering::Electrical and electronic engineering
dc.contributor.authorNurul Huda binti Ahmad Nazli
dc.date.accessioned2025-05-20T03:00:12Z
dc.date.available2025-05-20T03:00:12Z
dc.date.issued2024-07
dc.description.abstractIn recent years, the rise in network users has paralleled the explosive growth of the Internet of Things (IoT), which has significantly simplified daily tasks such as social networking, education, and digital banking. However, this growth has also led to an increase in cybersecurity threats, making it crucial to distinguish legitimate users from hackers. Optimizing feature selection in high-dimensional datasets is essential for achieving this differentiation. This project employs Design of Experiment (DoE) methodologies to enhance feature selection in high-dimensional datasets, specifically the NSL-KDD dataset, using the differential evolution (DE) algorithm. The DE algorithm operates through four stages: initialization, mutation, crossover, and selection. MATLAB R2024a is used for all operations, while Minitab 2019 is employed to randomize the experiment order for the design variables, namely the crossover probability and scaling factors. The selected features are subsequently used to train decision tree (DT) and support vector machine (SVM) classifiers. Performance assessments reveal that the DT classifier achieves 100% accuracy in 278 seconds with five selected features, with crossover probability and mutation rate having no significant effect on accuracy. Conversely, the SVM classifier reaches 100% accuracy in 114 seconds using 15 selected features, where the mutation rate and crossover probability significantly impact the response variable. These findings indicate that feature selection optimization can substantially improve classifier performance. Nevertheless, computational time remains a challenge that requires further investigation.
dc.identifier.urihttps://erepo.usm.my/handle/123456789/21742
dc.language.isoen
dc.titleOptimisation of feature selection in high dimensional data sets using design of experiment (DOE) methods
dc.typeResource Types::text::report::technical report
dspace.entity.typePublication
oairecerif.author.affiliationUniversiti Sains Malaysia
Files