Author affiliation: BVC Engineering College, Odalarevu.

# I. INTRODUCTION

Microarrays, widely recognized as the next revolution in molecular biology, enable scientists to analyze genes, proteins and other biological molecules on a genomic scale [1]. A microarray is a collection of spots containing DNA deposited on the solid surface of a glass slide. Each spot contains multiple copies of a single DNA sequence [2]. Microarray expression technology makes it possible to monitor the expression of tens of thousands of genes in parallel.

During the biological experiment, the mRNA of two biological tissues of interest is extracted and purified. Each mRNA sample is reverse transcribed into a complementary DNA (cDNA) copy and labeled with a different fluorescent dye, resulting in two fluorescence-tagged cDNA populations (red Cy5 and green Cy3). The tagged cDNA copies, called the sample probe, are hybridized with the slide's DNA spots. The hybridized glass slides are fluorescently scanned at different wavelengths (corresponding to the different dyes used), and two digital images are produced, one for each population of mRNA. Each digital image contains a number of spots of various fluorescence intensities. The intensity of each spot is proportional to the hybridization level between the cDNAs and the DNA spots, so the gene expression information is obtained by analyzing the digital images [3].

The processing of microarray images usually consists of the following three steps: (i) gridding, which is the process of assigning the location of each spot in the image; (ii) segmentation, which is the process of grouping the pixels with similar features; and (iii) intensity extraction, which calculates the red and green foreground intensity pairs and the background intensities. Segmentation algorithms such as K-means and Fuzzy C-Means have been used to segment the spots of microarray images.
In this paper, we present a histogram clustering algorithm for segmenting the spots of a microarray image. The proposed algorithm is based on minimizing the loss of mutual information, where the input variable represents the histogram bins and the output is given by the set of regions obtained from the split-and-merge algorithm. The rest of the paper is organized as follows. Section II presents the K-means algorithm, Section III presents the Fuzzy C-Means algorithm, Section IV presents the histogram clustering algorithm for segmentation of spots in a microarray image, Section V presents experimental results, and Section VI reports the conclusion.

# II. K-MEANS CLUSTERING ALGORITHM

K-means is one of the basic clustering methods, introduced by Hartigan et al. in 1979 [3]. It has been applied to microarray image segmentation in recent years [21]. The K-means clustering algorithm implemented in this paper aims to group the pixels into two clusters. Given $x = \{x_1, x_2, \ldots, x_N\}$ and $c = \{c_1, \ldots, c_C\}$, representing the pixels of the microarray image and the clusters respectively, the objective is to minimize the sum of squares of the distances $d_{ij} = \lVert x_i - c_j \rVert$:

$$\arg\min \sum_{j=1}^{C} \sum_{i=1}^{N} d_{ij}^{2} \qquad (1)$$

where each pixel $x_i$ contributes only the distance to the cluster it is assigned to. First, the two cluster centers $c_1$ and $c_2$, the centroids of the spots and of the background, have to be initialized. Then, iteratively, each pixel is assigned to the closest cluster and the new centroid of each cluster is calculated as the mean of the pixels assigned to it. The K-means algorithm used to segment the microarray image is summarized in Algorithm KM.

# III. FUZZY C-MEANS CLUSTERING ALGORITHM

In Fuzzy C-Means, each pixel has a degree of membership in each of the designated clusters, so the goal is to find the membership values of the pixels belonging to each cluster. The membership values of each pixel must sum to one over the clusters:

$$\sum_{i=1}^{c} u_{ij} = 1 \qquad (2)$$

for all $j = 1, 2, \ldots, N$, where $c$ is the number of clusters and $N$ is the number of pixels in the microarray image. The algorithm is an iterative optimization that minimizes the cost function

$$F = \sum_{i=1}^{c} \sum_{j=1}^{N} u_{ij}^{m} \lVert x_j - c_i \rVert^{2} \qquad (3)$$

where $u_{ij}$ represents the membership of pixel $x_j$ in the $i$-th cluster and $m$ is the fuzziness parameter. Once the membership values are initialized, the centroid of each cluster $c_i$ is computed (Step_2), and the membership values $u_{ij}$ of each pixel and the cluster centroids are then updated according to the given formula (4) (Step_3). The complete procedure is summarized in Algorithm Fuzzy C-Means.

# IV. HISTOGRAM CLUSTERING ALGORITHM

We present a greedy histogram clustering algorithm that takes a partitioned image as input and obtains a clustering of the histogram by minimizing the loss of mutual information (MI). The mutual information between two random variables $X$ and $Y$ is defined by

$$I(X,Y) = H(X) - H(X|Y), \quad H(X) = -\sum_{x \in X} p(x) \log p(x), \quad H(X|Y) = -\sum_{y \in Y} p(y) \sum_{x \in X} p(x|y) \log p(x|y) \qquad (5)$$

That is, we group the bins of the histogram so that the mutual information is maximally preserved. From the perspective of the information bottleneck method, the binning process is controlled by a given partition of the image. The histogram clustering algorithm is presented in [9].

Our clustering algorithm is based on the channel $G \rightarrow R$, where $G$ denotes the histogram bins and $R$ the regions of the image. The channel is defined by the conditional probability matrix $p(R|G)$, which expresses how the pixels corresponding to each histogram bin are distributed over the regions of the image. Bayes' theorem, expressed by $p(g)\,p(r|g) = p(r)\,p(g|r)$, establishes the relationship between the conditional probabilities of the two channels $G \rightarrow R$ and $R \rightarrow G$.

The basic idea underlying our histogram clustering algorithm is to capture the maximum information of the image with the minimum number of histogram bins. In general, if two bins are very similar, the channel can be simplified by replacing them with their merged bin without a significant loss of information. At each step, the algorithm merges the two bins for which the loss of information is minimum. Throughout the clustering process, $H(R) = H(R|G) + I(G,R)$, where $H(R)$ is the entropy of $p(R)$, and $H(R|G)$ and $I(G,R)$ represent, respectively, the successive values of the conditional entropy and the MI obtained after successive clusterings. Because the partition of the image is fixed, $H(R)$ remains constant, so every loss of MI appears as an equal increase in $H(R|G)$.
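As an illustration of the merging criterion, the following Python sketch computes $I(G,R)$ from a table of joint pixel counts and greedily merges histogram bins until a requested number of clusters remains. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `mutual_information` and `cluster_histogram`, the input table `counts[g][r]` (pixels of bin `g` falling in region `r`), the base-2 logarithm, and the restriction to merging adjacent bins are all assumptions introduced here for illustration.

```python
from math import log2


def mutual_information(counts):
    """Return I(G,R) in bits for a joint count table counts[g][r]."""
    total = sum(sum(row) for row in counts)
    p_g = [sum(row) / total for row in counts]
    p_r = [sum(row[r] for row in counts) / total for r in range(len(counts[0]))]
    mi = 0.0
    for g, row in enumerate(counts):
        for r, n in enumerate(row):
            if n > 0:  # empty cells contribute nothing to the sum
                p = n / total
                mi += p * log2(p / (p_g[g] * p_r[r]))
    return mi


def cluster_histogram(counts, n_clusters):
    """Greedily merge adjacent bins, each time keeping the merge that
    preserves the most mutual information (assumption: only neighboring
    bins may be merged)."""
    bins = [list(row) for row in counts]
    groups = [[g] for g in range(len(counts))]  # original bins in each cluster
    while len(bins) > n_clusters:
        best = None
        for k in range(len(bins) - 1):
            merged_row = [a + b for a, b in zip(bins[k], bins[k + 1])]
            candidate = bins[:k] + [merged_row] + bins[k + 2:]
            mi = mutual_information(candidate)
            if best is None or mi > best[0]:
                best = (mi, k, candidate)
        _, k, bins = best
        groups[k:k + 2] = [groups[k] + groups[k + 1]]
    return groups, mutual_information(bins)
```

For a hypothetical table of four bins and two regions, `cluster_histogram([[10, 0], [8, 1], [1, 9], [0, 10]], 2)` groups the bins as `[[0, 1], [2, 3]]`: the two bins dominated by the first region are merged, as are the two dominated by the second, because those merges lose the least information.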
Observe also that $H(R|G)$ is the average entropy of the bins and increases at each iteration.

# V. EXPERIMENTAL RESULTS

The segmentation step of microarray image processing is performed on a sample microarray slide that has 48 blocks, each block consisting of 110 spots. A sample block has been chosen and 108 spots of the block have been cropped for simplicity. The sample image is a 154 × 200 pixel image, for a total of 30,800 pixels. The RGB color microarray image has been converted to a grayscale image, which assigns each pixel a single intensity value ranging from the darkest (0) to the brightest (255), as shown in Figure 1.

![Fig. 1: a) RGB color microarray image b) Grayscale image](image-4.png "Fig1")

The microarray image segmented with three different segmentation algorithms (K-means, Fuzzy C-Means and the histogram clustering algorithm) is shown in Figure 2.

![Fig. 2: a) K-means b) Fuzzy C-Means c) Histogram clustering algorithm](image-5.png "Fig2")

The histogram gives the distribution of intensity values for each cluster. K-means calculated the mean of the spots as 25.32 and the mean of the ...

# VI. CONCLUSION

The histogram clustering algorithm constitutes a valid tool for segmenting the spots of a microarray image. Even though the mathematical bases of these techniques are complex, their implementation is simple and quick, and easy on the user. The proposed segmentation algorithm has the advantage of processing spots of variable shapes and being insensitive to variations. Processing images of low intensity requires background correction. The proposed algorithm provides a more efficient way of segmenting the microarray image than the segmentation achieved by K-means and Fuzzy C-Means.

© 2011 Global Journals Inc. (US), Global Journal of Computer Science and Technology, Volume XI, Issue XIX, Version I, November 2011.

Algorithm KM(x,n,c)

Input: N = number of pixels to be clustered; x = {x_1, x_2, ..., x_N}: pixels of the microarray image; c = 2: foreground and background clusters.

Output: cl: cluster of pixels.

Begin

Step_1: Initialize the cluster centroids.

Step_2: Find the closest cluster for each pixel and assign the pixel to that cluster.

Step_3: Compute the new centroids after all the pixels are clustered.

Step_4: Repeat Steps 2-3 until the sum of squares given in Equation (1) is minimized.

End.

Algorithm Fuzzy C-Means(x,n,c,m) (Section III, Fuzzy C-Means clustering)

Input: N = number of pixels to be clustered; x = {x_1, x_2, ..., x_N}: pixels of the microarray image; c = 2: foreground and background clusters; m = 2: the fuzziness parameter.

Output: u: membership values of pixels.

Begin

Step_1: Initialize the membership matrix, where each u_ij is a value in (0,1), and the fuzziness parameter m. The sum of all membership values of a pixel over the clusters should satisfy the constraint expressed in Equation (2).

Step_2: Compute the centroid values for each cluster.

Step_3: Compute the updated membership values u_ij of each pixel and the cluster centroids according to the given formula (4).

![Equation (4)](image-3.png "( 4 )")

Step_4: Repeat Steps 2-3 until the cost function is minimized.

End.
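The K-means and Fuzzy C-Means listings can be sketched for grayscale intensities as follows. This is a minimal pure-Python sketch under stated assumptions, not the authors' code: the function names `kmeans_1d` and `fuzzy_c_means_1d`, the treatment of pixels as scalar intensities, and the stopping rules (centroids unchanged for K-means, centroid shift below `tol` for Fuzzy C-Means) are assumptions. The Fuzzy C-Means sketch also starts from initial centroids rather than a random membership matrix, and it uses the standard textbook membership and centroid updates, since the paper's formula (4) is not reproduced in the text.

```python
def kmeans_1d(pixels, centroids, max_iter=100):
    """Hard clustering of grayscale intensities (sketch of Algorithm KM)."""
    centroids = [float(c) for c in centroids]  # Step_1: caller-supplied centroids
    labels = []
    for _ in range(max_iter):
        # Step_2: assign every pixel to its closest centroid.
        labels = [min(range(len(centroids)), key=lambda j: abs(x - centroids[j]))
                  for x in pixels]
        # Step_3: each new centroid is the mean of the pixels assigned to it.
        updated = []
        for j, old in enumerate(centroids):
            members = [x for x, lab in zip(pixels, labels) if lab == j]
            updated.append(sum(members) / len(members) if members else old)
        # Step_4 (assumed stopping rule): stop once the centroids no longer move.
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


def fuzzy_c_means_1d(pixels, centroids, m=2.0, max_iter=100, tol=1e-6):
    """Soft clustering; u[i][j] is the membership of pixel j in cluster i."""
    centroids = [float(c) for c in centroids]
    u = []
    for _ in range(max_iter):
        # Membership update (standard FCM rule, assumed here).
        u = [[0.0] * len(pixels) for _ in centroids]
        for j, x in enumerate(pixels):
            dist = [abs(x - c) for c in centroids]
            zeros = [i for i, d in enumerate(dist) if d == 0.0]
            if zeros:
                # The pixel sits exactly on a centroid: full membership there.
                for i in zeros:
                    u[i][j] = 1.0 / len(zeros)
            else:
                inv = [d ** (-2.0 / (m - 1.0)) for d in dist]
                total = sum(inv)
                for i, v in enumerate(inv):
                    u[i][j] = v / total  # memberships of pixel j sum to 1
        # Centroid update: mean of the pixels weighted by u_ij^m.
        updated = []
        for i, old in enumerate(centroids):
            weights = [u_ij ** m for u_ij in u[i]]
            denom = sum(weights)
            weighted = sum(w * x for w, x in zip(weights, pixels))
            updated.append(weighted / denom if denom else old)
        shift = max(abs(a - b) for a, b in zip(updated, centroids))
        centroids = updated
        if shift < tol:  # assumed stopping rule
            break
    return u, centroids
```

For example, `kmeans_1d([10, 12, 11, 200, 210, 205], [0, 255])` returns the labels `[0, 0, 0, 1, 1, 1]` and the centroids `[11.0, 205.0]`. On the same input, `fuzzy_c_means_1d` yields centroids close to those values, with each pixel's memberships summing to one as required by Equation (2).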
# REFERENCES

[1] M. Schena, D. Shalon, Ronald W. Davis, Patrick O. Brown, "Quantitative Monitoring of gene expression patterns with a complementary DNA microarray," Science 270, 199.

[2] Wei-Bang Chen, Chengcui Zhang, Wen-Lin Liu, "An Automated Gridding and Segmentation method for cDNA Microarray Image Analysis," 19th IEEE Symposium on Computer-Based Medical Systems.

[3] Tsung-Han Tsai, Chein-Po Yang, Wei-Chi Tsai, Pin-Hua Chen, "Error Reduction on Automatic Segmentation in Microarray Image," IEEE, 2007.

[4] E. Erguit, Y. Yardimci, E. Mumcuoglu, O. Konu, "Analysis of microarray images using FCM and K-means Clustering Algorithm," Proc. IJCI, 2003.

[5] Volkan Uslan, Ihsan Omur Bucak, "Clustering based Spot Segmentation of cDNA Microarray Images," IEEE, 2010.

[6] Rafael C. Gonzalez, Richard E. Woods, Digital Image Processing, Third Edition, Pearson Education.

[7] T. Deng, H. Heijmans, "Grey-Scale Morphology Based on Fuzzy Logic," Journal of Mathematical Imaging and Vision 16(2), Springer, 2002.

[8] M. A. Wirth, D. Nikitenko, "Application of Fuzzy Morphology to Contrast Enhancement," IEEE, 2005.

[9] J. Rigau, M. Feixas, M. Sbert, "An Information Theoretic Framework for image segmentation," IEEE, 2004.