Publication:
Development of an efficient and merging of numerous small files algorithm for the Hadoop distributed file system

Date
2023-10-01
Authors
Adnan Ali
Abstract
In the era of Big Data, many sources and environments generate large amounts of data, which requires sophisticated tools and specialized procedures to evaluate the information and predict future changes. Hadoop is widely used to process such data, but it handles large files far more efficiently than many small ones, so a workload of numerous small files degrades the framework's performance. This study addresses the problem with an Enhanced Best Fit Merging algorithm (EBFM) that merges files according to predefined parameters (type and size). The algorithm ensures that the number of blocks is minimized and that the size of each generated file falls within the same target range. Its main goal is to dynamically combine files, using criteria specified per file type, to guarantee the effectiveness and efficiency of the system. Merging takes place before the files are processed by the Hadoop framework, and the generated files are named with specific keywords to prevent data loss through file overwriting. The proposed EBFM generates the smallest possible number of files, reduces the input/output memory load, and aligns with the design of the Hadoop framework. The results of the study show that the proposed EBFM improves framework performance by about 64% across all measured performance variables. The proposed EBFM can be deployed in any environment that uses the Hadoop framework, including smart cities and real-time data analysis.
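The merging strategy described in the abstract can be illustrated with a small sketch. This is not the authors' implementation; it is a generic best-fit bin-packing pass that groups files by type and packs each group into merge bins capped at an assumed HDFS block size (the 128 MB constant, the tuple layout, and all names below are illustrative assumptions):

```python
from collections import defaultdict

# Assumed HDFS block size (128 MB); the paper's actual threshold may differ.
BLOCK_SIZE = 128 * 1024 * 1024

def best_fit_merge(files, block_size=BLOCK_SIZE):
    """Group files by type, then best-fit pack each group into merge bins.

    `files` is a list of (name, ftype, size) tuples. Returns a list of bins,
    each a dict with the packed file names and the remaining free capacity,
    so that every bin's total size stays within block_size.
    """
    by_type = defaultdict(list)
    for name, ftype, size in files:
        by_type[ftype].append((name, size))

    bins = []
    for ftype, group in by_type.items():
        # Packing largest files first tends to improve best-fit quality.
        group.sort(key=lambda f: f[1], reverse=True)
        type_bins = []
        for name, size in group:
            # Best fit: among open bins that can still hold this file,
            # pick the one with the least remaining free space.
            candidates = [b for b in type_bins if b["free"] >= size]
            if candidates:
                target = min(candidates, key=lambda b: b["free"])
            else:
                target = {"type": ftype, "files": [], "free": block_size}
                type_bins.append(target)
            target["files"].append(name)
            target["free"] -= size
        bins.extend(type_bins)
    return bins
```

Packing per type, as the abstract specifies, keeps each merged file homogeneous; the best-fit rule minimizes wasted block capacity and hence the number of generated files.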