Hybrid Technique for Arabic Text Compression
Keywords:
text compression, multilayer model text compression, morphological analysis, word-based compression, burrows-wheeler algorithm
Abstract
Arabic content on the Internet and other digital media is increasing exponentially, and the number of Arab users of these media has multiplied by more than 20 over the past five years. There is a real need to save allocated space for this content as well as allowing more efficient usage, searching, and retrieving information operations on this content. Using techniques borrowed from other languages or general data compression techniques, ignoring the proper features of Arabic has limited success in terms of compression ratio. In this paper, we present a hybrid technique that uses the linguistic features of Arabic language to improve the compression ratio of Arabic texts. This technique works in phases. In the first phase, the text file is split into four different files using a multilayer model-based approach. In the second phase, each one of these four files is compressed using the Burrows-Wheeler compression algorithm.
Downloads
- Article PDF
- TEI XML Kaleidoscope (download in zip)* (Beta by AI)
- Lens* NISO JATS XML (Beta by AI)
- HTML Kaleidoscope* (Beta by AI)
- DBK XML Kaleidoscope (download in zip)* (Beta by AI)
- LaTeX pdf Kaleidoscope* (Beta by AI)
- EPUB Kaleidoscope* (Beta by AI)
- MD Kaleidoscope* (Beta by AI)
- FO Kaleidoscope* (Beta by AI)
- BIB Kaleidoscope* (Beta by AI)
- LaTeX Kaleidoscope* (Beta by AI)
How to Cite
Published
2015-01-15
Issue
Section
License
Copyright (c) 2015 Authors and Global Journals Private Limited
This work is licensed under a Creative Commons Attribution 4.0 International License.