# Introduction

Protein and other biomedical entity names are used in many areas of biomedical and bioinformatics research, so these entities must be identified reliably. In text-based literature, protein and other biomedical names are embedded in ordinary text, which makes identifying them in a text file very difficult and calls for a systematic approach. Natural language processing (NLP) can be used to solve this problem: with it, we extract the required information from a text file. We use the GENIA tagger to extract information from the text of a PDF file and obtain the required biomedical names. We then use these names to establish relations between them and visualize them.




# Related Work

Many studies in biomedicine and bioinformatics rely on data extraction techniques. Currently, this work is done by reading the text, and manual text extraction cannot produce accurate data. As a result, proteins and other biomedical entities cannot be identified correctly, which is a major drawback for this research field. In this research, we have tried to identify protein and gene names, and about 70%-80% of them are identified correctly.

Author e-mails: arif25-627@diu.edu.bd, Muhammad25-631@diu.edu.bd, saifuddin25-630@diu.edu.bd


# Our Proposed Work

In this research, we try to identify the tagging problem and find a solution for this type of work. Our work follows the procedure below. Converting a PDF to a text file looks like a complex task, but it can be done easily with existing algorithms and tools. For our project, we work with a natural-language-based text extraction system that identifies which types of data are present in the text file. With it, we can find the required data, and the same approach can be applied to find other entity types. Applied to the text file, the system finds the information needed for the analysis, including proteins and their related entities.




# Using Genia Tagger for Data Extraction

GENIA tagger is a natural language processing tool, available through a website, that tags protein names in text; we use this website to search for the relevant data. We then use this information in the subsequent data analysis [5]. We also use this type of system for our data processing [6].
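As an illustration of how tagged output can be turned into entity lists, the sketch below parses tagger output and groups the entity names by type. It assumes the output has been saved as tab-separated lines with five columns (word, base form, part-of-speech tag, chunk tag, named-entity tag) and that entity tags follow the IOB2 style, e.g. `B-protein`, `I-protein`, `O`. The function name `extract_entities` and the sample sentence are hypothetical and are not taken from this work.

```python
from collections import defaultdict


def extract_entities(tagger_output):
    """Group named entities by type from five-column, tab-separated output.

    Assumed columns: word, base form, POS tag, chunk tag, named-entity tag.
    Assumed entity tags: IOB2 style (B-<type>, I-<type>, O).
    """
    entities = defaultdict(list)
    tokens, current_type = [], None

    def flush():
        # Close the entity being built, if any, and reset the buffer.
        nonlocal tokens, current_type
        if tokens:
            entities[current_type].append(" ".join(tokens))
        tokens, current_type = [], None

    for line in tagger_output.splitlines():
        columns = line.split("\t")
        if len(columns) < 5:           # blank or malformed line ends an entity
            flush()
            continue
        word, ne_tag = columns[0], columns[4]
        if ne_tag.startswith("B-"):    # a new entity starts here
            flush()
            tokens, current_type = [word], ne_tag[2:]
        elif ne_tag.startswith("I-") and current_type == ne_tag[2:]:
            tokens.append(word)        # continuation of the current entity
        else:                          # "O" or an inconsistent tag
            flush()
    flush()
    return dict(entities)


# Hypothetical sample in the assumed format.
SAMPLE = "\n".join([
    "Inhibition\tInhibition\tNN\tB-NP\tO",
    "of\tof\tIN\tB-PP\tO",
    "NF-kappaB\tNF-kappaB\tNN\tB-NP\tB-protein",
    "activation\tactivation\tNN\tI-NP\tO",
    "in\tin\tIN\tB-PP\tO",
    "T\tT\tNN\tB-NP\tB-cell_type",
    "cells\tcell\tNNS\tI-NP\tI-cell_type",
])

print(extract_entities(SAMPLE))
# {'protein': ['NF-kappaB'], 'cell_type': ['T cells']}
```

Grouping by entity type at this stage keeps the later steps simple, because each type (protein, DNA, RNA, cell line, cell type) can be stored and searched separately.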




# Data Tagging

First, we store the data in a text file. This data helps us access the information [7] [8].

A protein name may contain an acronym abbreviating the species name, e.g. `<protein>human growth hormone (hGH)</protein>`, but `<long-form>human <protein>IGF-II</protein></long-form>`. When protein entities share common terms, there may be only one named entity that can be easily tagged. We tag such an entity as a protein, while the list of entities together is tagged as a long-form, e.g. `<long-form><protein>CSN subunits 4</protein>, 5, 6</long-form>`. Assessment of v2: the results on inter-coder reliability using the revised guidelines are much better.

# Retrieving Data and Saving to the Database According to the Related Entity

We save the data according to the text file that we get from the GENIA tagger website. We are then able to visualize it.
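Saving the tagged entities and searching for related names can be sketched with Python's built-in `sqlite3` module. The table layout, the function names, and the rule that two entities are "related" when they occur in the same document are all assumptions made for illustration; the actual database schema is not described in this work.

```python
import sqlite3


def save_entities(conn, document, entities):
    """Store the entities of one document; exact duplicates are ignored."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS entity (
               document TEXT NOT NULL,
               name     TEXT NOT NULL,
               type     TEXT NOT NULL,
               UNIQUE (document, name, type)
           )"""
    )
    rows = [
        (document, name, etype)
        for etype, names in entities.items()
        for name in names
    ]
    conn.executemany(
        "INSERT OR IGNORE INTO entity (document, name, type) VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()


def related_entities(conn, name):
    """Return other entities found in the same document(s) as `name`."""
    cursor = conn.execute(
        """SELECT DISTINCT other.name, other.type
           FROM entity AS target
           JOIN entity AS other ON other.document = target.document
           WHERE target.name = ? AND other.name <> target.name
           ORDER BY other.name""",
        (name,),
    )
    return cursor.fetchall()


# Hypothetical data: entity names grouped by type for one text file.
conn = sqlite3.connect(":memory:")
save_entities(
    conn,
    "paper.txt",
    {"protein": ["NF-kappaB", "IGF-II"], "DNA": ["IL-2 gene"]},
)
print(related_entities(conn, "IGF-II"))
# [('IL-2 gene', 'DNA'), ('NF-kappaB', 'protein')]
```

The `UNIQUE` constraint together with `INSERT OR IGNORE` means the same file can be processed more than once without creating duplicate rows.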


# Data Visualization

After retrieving the data, we analyze all the entities that are related to each other and categorize them by type, such as protein, gene, and chromosome. We then visualize them. We present the results as F-measure.
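One simple way to derive relations for visualization is to count how often two entities are mentioned together. The sketch below assumes that co-occurrence in the same sentence is what makes two entities related, which is an illustrative choice and not necessarily the rule used in this work. It emits a Graphviz DOT description that a graph viewer can render; the function names and the sample entity lists are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(sentences):
    """Count how often each pair of entities shares a sentence.

    `sentences` is a list of entity-name lists, one list per sentence.
    """
    edges = Counter()
    for entities in sentences:
        # Sort so that each unordered pair always gets the same key.
        for a, b in combinations(sorted(set(entities)), 2):
            edges[(a, b)] += 1
    return edges


def to_dot(edges):
    """Render the weighted edges as an undirected Graphviz DOT graph.

    Names are assumed not to contain double quotes.
    """
    lines = ["graph entities {"]
    for (a, b), weight in sorted(edges.items()):
        lines.append(f'  "{a}" -- "{b}" [label="{weight}"];')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical input: entities found in three sentences.
sentences = [["hGH", "IGF-II"], ["hGH", "IGF-II", "CSN"], ["CSN"]]
edges = cooccurrence_edges(sentences)
print(to_dot(edges))
```

The edge weight gives a rough measure of how strongly two entities are associated, so frequently co-mentioned pairs can be drawn more prominently.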


# Named Entity Recognition Performance

Our PDF file contains many entities of the types Protein, DNA, RNA, Cell Line, and Cell Type. The final performance that GENIA tagger reports on the evaluation set is as follows [12].
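Named entity recognition performance is conventionally summarized by precision, recall, and their harmonic mean, the F-measure. The sketch below computes these for a set of predicted entities against a gold-standard set, counting a prediction as correct only when both name and type match. The function name and the toy entity sets are invented for illustration and are not results from this work.

```python
def precision_recall_f1(predicted, gold):
    """Return (precision, recall, F1) for exact matches between two sets."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data: (name, type) pairs; one prediction has the wrong type.
gold = {
    ("hGH", "protein"),
    ("IGF-II", "protein"),
    ("IL-2 gene", "DNA"),
    ("T cells", "cell_type"),
}
predicted = {
    ("hGH", "protein"),
    ("IGF-II", "protein"),
    ("T cells", "cell_line"),
}

p, r, f = precision_recall_f1(predicted, gold)
print(f"precision={p:.3f} recall={r:.3f} F1={f:.3f}")
# precision=0.667 recall=0.500 F1=0.571
```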


# Conclusion

Natural language processing is a way to find similar, related data in a text or document. We try to find related protein and other biomedical entity names and visualize them. Our research makes the system useful for the data analysis process.


![Fig. 1: Tagging and Visualizing Protein and Other Biomedical Entity Names](image-2.png "Fig. 1 :")

# Conversion of PDF Files to Text File

![Fig. 2: Conversion of PDF to Text File](image-3.png "Fig. 2 :")

![Fig. 3: Data Tagging in a Text File](image-4.png "Fig. 3 :")

# Tagging Performance with Other Documents

GENIA tagger performs better than other biomedical tagging websites. It is trained on the Wall Street Journal corpus and the PennBioIE corpus, so it performs well on various types of medical data.

![Fig. 4: Saving Related Entity to Database](image-5.png "Fig. 4 :")

![Fig. 5: Searching Related Protein Name](image-6.png "Fig. 5 :")
			© 2019 Global Journals
		
		
# References

* [1] Jenny Rose Finkel and Christopher D. Manning. Nested Named Entity Recognition.
* [2] Beatrice Alex, Barry Haddow, and Claire Grover. Recognizing Nested Named Entities in Biomedical Text. June 29-30, 2007.
* [3] Jörg Tiedemann. Improved Text Extraction from PDF Documents for Large-Scale Natural Language Processing. 2014.
* [4] Matthew Lease and Eugene Charniak. Parsing Biomedical Literature.
* [5] Firat Tekiner, Yoshimasa Tsuruoka, and Jun'ichi Tsujii. Highly scalable Text Mining - parallel tagging application. Soft Computing, Computing with Words and Perceptions in System Analysis, Decision and Control, Fifth International Conference on, IEEE, Famagusta, Cyprus, 2009.
* [6] Anni R. Coden, Serguei V. Pakhomov, Rie K. Ando, Patrick H. Duffy, and Christopher G. Chute. Domain-specific language models and lexicons for tagging. Journal of Biomedical Informatics, December 2005.
* [7] Qian Wang, Huijun Xue, Siqi Li, Ying Chen, Xuelei Tian, Xin Xu, Wei Xiao, and Yu Vincent Fu. A method for labeling proteins with tags at the native genomic loci in budding yeast. PLoS ONE, May 1, 2017.
* [8] Robert J. Latour. Tagging methods and associated data analysis. 2013.
* [9] Ning Kang, Erik M. van Mulligen, and Jan A. Kors. Comparing and combining chunkers of biomedical text. 2010.
* [10] Jahiruddin, Muhammad Abulaish, and Lipika Dey. A concept-driven biomedical knowledge extraction and visualization framework for conceptualization of text corpora. 2010.
* [11] Vivek N. Bhatia, David H. Perlman, Catherine E. Costello, and Mark E. McComb. Software Tool for Researching Annotations of Proteins (STRAP): Open-Source Protein Annotation Software with Data Visualization. Journal of Biomedical Informatics, December 2009.
* [12] Jeffrey P. Ferraro, Hal Daumé III, Scott L. DuVall, Wendy W. Chapman, Henk Harkema, and Peter J. Haug. Improving performance of natural language processing part-of-speech tagging on clinical narratives through domain adaptation. Journal of the American Medical Informatics Association, 20(5), 1 September 2013.