Optimisation of feature selection in high dimensional data sets using design of experiment (DOE) methods

Nurul Huda binti Ahmad Nazli

Date

2024-07

Authors

Nurul Huda binti Ahmad Nazli

Abstract

In recent years, the number of network users has grown alongside the rapid expansion of the Internet of Things (IoT), which has simplified daily tasks such as social networking, education, and digital banking. However, this growth has also led to an increase in cybersecurity threats, making it crucial to distinguish legitimate users from hackers. Optimizing feature selection in high-dimensional datasets is essential for achieving this differentiation. This project employs Design of Experiment (DoE) methodologies to enhance feature selection in high-dimensional datasets, focusing on the NSL-KDD dataset and using the differential evolution (DE) algorithm. The DE algorithm operates through four stages: initialization, mutation, crossover, and selection. MATLAB R2024a is used for all operations, while Minitab 2019 is used to randomize the experiment order for the two design variables, the crossover probability and the scaling factor. The datasets are subsequently used to train decision tree (DT) and support vector machine (SVM) classifiers. Performance assessments reveal that the DT classifier achieves 100% accuracy in 278 seconds with five selected features, with crossover probability and mutation rate having no significant effect on accuracy. In contrast, the SVM classifier reaches 100% accuracy in 114 seconds using 15 selected features, and here the mutation rate and crossover probability significantly affect the response variable. These findings indicate that feature selection optimization can substantially improve classifier performance. Nevertheless, computational time remains a challenge that requires further investigation.
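The four DE stages named in the abstract can be illustrated with a minimal sketch. This is not the project's MATLAB implementation: it assumes the common DE/rand/1/bin variant, encodes each candidate as a real vector in [0, 1] that is thresholded at 0.5 to give a feature mask, and replaces the classifier-based objective with a hypothetical toy fitness (`toy_fitness`, with a made-up set `RELEVANT` of useful feature indices). The parameter names `F` (scaling factor) and `CR` (crossover probability) follow the usual DE notation and correspond to the two design variables.

```python
import random


def differential_evolution(fitness, dim, pop_size=20, F=0.5, CR=0.9,
                           generations=100, seed=0):
    """Minimise `fitness` over [0, 1]^dim using DE/rand/1/bin (assumed variant)."""
    rng = random.Random(seed)
    # 1. Initialization: uniform random vectors in [0, 1].
    pop = [[rng.random() for _ in range(dim)] for _ in range(pop_size)]
    scores = [fitness(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # 2. Mutation: combine three distinct individuals other than i,
            #    scaled by F and clipped back into [0, 1].
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            mutant = [min(1.0, max(0.0, pop[a][k] + F * (pop[b][k] - pop[c][k])))
                      for k in range(dim)]
            # 3. Crossover: take each gene from the mutant with probability CR;
            #    index j_rand guarantees at least one mutant gene.
            j_rand = rng.randrange(dim)
            trial = [mutant[k] if (rng.random() < CR or k == j_rand) else pop[i][k]
                     for k in range(dim)]
            # 4. Selection: the trial replaces the parent only if it is no worse.
            trial_score = fitness(trial)
            if trial_score <= scores[i]:
                pop[i], scores[i] = trial, trial_score
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


def to_mask(vector, threshold=0.5):
    """Turn a real-valued vector into a boolean feature mask."""
    return [v > threshold for v in vector]


# Hypothetical toy objective: count the features whose selection status
# disagrees with a made-up set of relevant indices. A real study would
# use a classifier's error on the selected features instead.
RELEVANT = {0, 3, 7}


def toy_fitness(vector):
    mask = to_mask(vector)
    return sum(1 for k, chosen in enumerate(mask) if chosen != (k in RELEVANT))


best_vector, best_score = differential_evolution(toy_fitness, dim=10)
selected = [k for k, chosen in enumerate(to_mask(best_vector)) if chosen]
print("selected features:", selected, "mismatches:", best_score)
```

In a DoE setting, `F` and `CR` would be varied across the runs of the experimental design (in randomized order) and the resulting accuracy and run time recorded as responses; the defaults above are arbitrary illustration values.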

URI

https://erepo.usm.my/handle/123456789/21742

Collections

Pusat Pengajian Kejuruteraan Elektrik dan Elektronik - Monograf
