Telugu Text Categorization using Language Models
Keywords:
text categorization, language dependent and independent models, k-nearest neighbors
Abstract
Document categorization has become an active research topic because of the abundance of documents available in digital form. In this paper we propose language-dependent and language-independent models for categorizing Telugu documents. India is a multilingual country in which each state may choose its own official language for state-level communication. The amount of textual data available in electronic form in the various Indian regional languages is growing constantly. Hence, the classification of text documents based on language is crucial. Telugu is the third most spoken language in India and one of the fifteen most spoken languages in the world. It is the official language of the states of Telangana and Andhra Pradesh. A variant of the k-nearest neighbors algorithm is used for the categorization process. The results are obtained by comparing the language-dependent and language-independent models.
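To make the categorization step concrete, the following is a minimal sketch of a plain k-nearest neighbors classifier over character n-gram counts, which is one common way to build a language-independent representation. It is not the variant described in the paper: the abstract does not specify the features, the similarity measure, or how the variant differs from standard k-NN. The function names (`char_ngrams`, `cosine`, `knn_classify`), the choice of cosine similarity, the trigram size, and the toy English training data are all assumptions made for illustration.

```python
from collections import Counter
import math


def char_ngrams(text, n=3):
    """Count overlapping character n-grams (no tokenizer or stemmer needed)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_classify(doc, training, k=3, n=3):
    """Label `doc` by majority vote among its k most similar training documents."""
    query = char_ngrams(doc, n)
    scored = sorted(
        ((cosine(query, char_ngrams(text, n)), label) for text, label in training),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy corpus; a real experiment would use labeled Telugu documents.
training = [
    ("cricket match score team", "sports"),
    ("football team wins match", "sports"),
    ("election vote minister party", "politics"),
    ("party minister wins election", "politics"),
]

print(knn_classify("team score in the match", training, k=3))
```

Because Python strings are sequences of Unicode code points, the same functions accept Telugu text unchanged. A language-dependent model would instead replace `char_ngrams` with features that rely on Telugu-specific processing such as word segmentation or morphological analysis, which this sketch does not attempt.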
Published
2016-10-15
License
Copyright (c) 2016 Authors and Global Journals Private Limited
This work is licensed under a Creative Commons Attribution 4.0 International License.