Publication:
Optimisation of feature selection in high dimensional data sets using design of experiment (DOE) methods

Date
2024-07
Authors
Nurul Huda binti Ahmad Nazli
Abstract
In recent years, the rise in network users has paralleled the explosive growth of the Internet of Things (IoT), which has significantly simplified daily tasks such as social networking, education, and digital banking. However, this growth has also led to an increase in cybersecurity threats, making it crucial to distinguish legitimate users from hackers. Optimizing feature selection in high-dimensional datasets is essential for achieving this differentiation. This project employs Design of Experiment (DoE) methodologies to enhance feature selection with the differential evolution (DE) algorithm in high-dimensional datasets, focusing specifically on the NSL-KDD dataset. The DE algorithm operates through four stages: initialization, mutation, crossover, and selection. MATLAB R2024a is used for all operations, while Minitab 2019 is employed to randomize the experiment order for the design variables (crossover probability factors and scaling factors). Decision tree (DT) and support vector machine (SVM) classifiers are subsequently trained on the resulting data. Performance assessments reveal that the DT classifier achieves 100% accuracy in 278 seconds with five selected features, with crossover probability and mutation rate having no significant effect on accuracy. Conversely, the SVM classifier reaches 100% accuracy in 114 seconds using 15 selected features, and here the mutation rate and crossover probability significantly impact the response variable. These findings indicate that feature selection optimization can substantially improve classifier performance. Nevertheless, computational time remains a challenge that requires further investigation.
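The four DE stages named in the abstract can be illustrated with a minimal sketch. This is not the author's MATLAB implementation: it is written in Python, assumes the common DE/rand/1/bin variant, and minimizes a toy sphere function as a stand-in objective. The function name `differential_evolution`, the parameter names `F` (scaling factor) and `CR` (crossover probability), and all default values are illustrative assumptions.

```python
import random


def differential_evolution(objective, bounds, pop_size=20, F=0.5, CR=0.9,
                           generations=200, seed=0):
    """Minimal DE/rand/1/bin sketch (assumed variant, not the thesis code).

    F  : scaling factor applied to the difference vector (assumed default)
    CR : crossover probability (assumed default)
    """
    rng = random.Random(seed)
    dim = len(bounds)

    # Stage 1, initialization: sample each individual uniformly within bounds.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fitness = [objective(x) for x in pop]

    for _ in range(generations):
        for i in range(pop_size):
            # Stage 2, mutation: combine three distinct individuals other than i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            mutant = [pop[a][k] + F * (pop[b][k] - pop[c][k]) for k in range(dim)]
            # Clip the mutant back into the search bounds.
            mutant = [min(max(v, lo), hi) for v, (lo, hi) in zip(mutant, bounds)]

            # Stage 3, binomial crossover: take each gene from the mutant with
            # probability CR; index j_rand always comes from the mutant.
            j_rand = rng.randrange(dim)
            trial = [mutant[k] if (rng.random() < CR or k == j_rand) else pop[i][k]
                     for k in range(dim)]

            # Stage 4, selection: keep the trial only if it is no worse.
            f_trial = objective(trial)
            if f_trial <= fitness[i]:
                pop[i], fitness[i] = trial, f_trial

    best = min(range(pop_size), key=fitness.__getitem__)
    return pop[best], fitness[best]


def sphere(x):
    """Toy objective used only for illustration: sum of squares."""
    return sum(v * v for v in x)


bounds = [(-5.0, 5.0)] * 5
best_x, best_f = differential_evolution(sphere, bounds)
print(best_f)
```

For feature selection, one plausible adaptation (again an assumption, not taken from the source) is to threshold each vector component into a keep/drop mask over the features and to use a classifier's error on the masked data as the objective. The scaling factor and crossover probability would then be the two design variables varied across experimental runs.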