# INTRODUCTION

Human immunodeficiency virus (HIV) is a retrovirus (a virus whose genetic information is contained in ribonucleic acid instead of deoxyribonucleic acid) that causes Acquired Immune Deficiency Syndrome (AIDS) by infecting the helper T cells, or lymphocytes (cells that defend the body against foreign bodies), of the immune system. Antigens are substances that stimulate the production of antibodies. A serotype or serovar is a group of closely related microorganisms, such as bacteria or viruses, that share a characteristic set of antigens. The most common serotype or serovar, HIV-1, is distributed worldwide, while HIV-2 is primarily confined to West Africa (Healthline, 2011). AIDS is a severe immunological disorder caused by the retrovirus HIV, resulting in a defect in the cell-mediated immune response that is manifested by increased susceptibility to opportunistic infections and to certain rare cancers, especially Kaposi's sarcoma (Healthline, 2011; MedicineNet, 2011). HIV is transmitted primarily by exposure to contaminated body fluids, especially blood and semen. Other routes include sharing contaminated sharp objects and blood transfusion. Everybody who has AIDS also has HIV, but not everybody with HIV is classified by the United States (U.S.) government as having AIDS. The U.S. government uses counts of CD4 cells (part of the immune system) to make this distinction (Healthline, 2011).

The earliest known case of HIV-1 came from a human blood sample collected in 1959 from a man in Kinshasa, Democratic Republic of Congo (Healthline, 2011). The method by which he became infected is not known; however, genetic analysis of his blood sample suggested that HIV-1 might have stemmed from a single virus in the late 1940s or early 1950s. HIV has existed in the United States since the mid to late 1970s. From 1979 to 1981, rare types of pneumonia, cancer, and other illnesses were reported by physicians in Los Angeles and New York among a number of male patients who had sex with other men.
Since it is rare to find these diseases in people with a healthy immune system, public health representatives became concerned that a new virus was emerging (Healthline, 2011). In 1982, the term AIDS was introduced to describe the occurrence of opportunistic infections, Kaposi's sarcoma, and Pneumocystis carinii pneumonia in previously healthy persons, and formal tracking of these cases in the United States began that year. The virus that causes AIDS was discovered in 1983 and named human T-cell lymphotropic virus type III/lymphadenopathy-associated virus (HTLV-III/LAV) by an international scientific committee, which later changed the name to HIV (Healthline, 2011; MedicineNet, 2011).

Many theories have been suggested as to the origins of HIV and how it appeared in the human population. The majority of scientists believe that HIV originated in other primates and was somehow transmitted to humans. In 1999, an international group reported the discovery of the origins of HIV-1, the predominant strain of HIV in the developed world (Healthline, 2011). A subspecies of chimpanzee native to west equatorial Africa was identified as the original source of the virus. The researchers believe that HIV-1 was introduced into the human population when hunters became exposed to infected blood (Healthline, 2011; MedicineNet, 2011; WrongDiagnosis).

Most scientists believe that HIV causes AIDS by directly inducing the death of CD4+ T cells (helper T cells) [...] person's immune response is disrupted during HIV infection, impairing a person's ability to fight other infections. The HIV-mediated destruction of the lymph nodes and related immunologic organs also plays a major role in causing the immunosuppression seen in persons with AIDS (Healthline, 2011).
In the absence of antiretroviral therapy, the median time from HIV infection to the development of AIDS-related symptoms has been approximately 10 to 12 years (Healthline, 2011; WrongDiagnosis, 2011). A wide variation in disease progression, however, has been noted. Approximately 10 percent of HIV-infected persons have progressed to AIDS within the first two to three years after infection, whereas up to 5 percent of persons have stable CD4+ T-cell counts and no symptoms even after 12 or more years (Healthline, 2011). Factors such as age or genetic differences among persons with HIV, the level of virulence of an individual strain of virus, and co-infection with other microbes may influence the rate and severity of disease progression. Drugs that fight the infections associated with AIDS have improved and prolonged the lives of HIV-infected persons by preventing or treating conditions such as [...] Journal of Infectious Diseases. This approach is known formally as short-cycle structured intermittent antiretroviral therapy (SIT) or colloquially as the "7-7" approach (Healthline, 2011).

HIV symptoms can include headache, chronic cough, diarrhea, swollen glands, lack of energy, loss of appetite, weight loss, frequent fevers, frequent yeast infections, skin rashes, pelvic or abdominal cramps, sores on certain parts of the body, and short-term memory loss (MedicineNet, 2011).

The diagnostic methods that physicians currently use to analyze HIV infection are manual and cannot handle uncertain or vague data that fall between intervals. Moreover, those methods are neither self-learning nor adaptive. This paper addresses these problems by employing fuzzy cluster means. The proposed system, which is self-learning and adaptive, is a time capsule (a cache of information) to be preserved for medical engineers for the diagnosis and analysis of HIV infection.
# LITERATURE REVIEW

Cluster analysis is a statistical technique used to classify objects into coherent categories based on a set of measurements, indicators, or variables. A common use of cluster analysis in medicine is to categorize patients into subgroups or diagnostic categories based on patterns of clinical signs and symptoms, in this case those of HIV infection (Brian et al., 2001). Two-way clustering techniques are frequently used to organize genes into groups or clusters with similar levels of expression across relevant subgroups of patients' tissues, samples, or cell lines (Eisen et al., 1998).

In practice, a cluster analysis is the product of a series of analytical decisions. The decision made at each point in the series can significantly affect subsequent decisions, as well as the overall result of the cluster analysis (Everitt et al., 2003). This series of decisions typically involves choices about which objects to cluster, the unit of measurement to use for the variables, the proximity measure, and the criteria for determining the number and quality of clusters within the data.

The Likert scale is one of the most popular psychological measurement schemes that depend on human judgment. This scaling scheme assumes that the human observer is good at quantitative observation and at assigning numbers to objects to reflect the degree of the trait or statement being measured (Cartwrigh, 2003; Chow, 2002). In this scoring scheme, subjects are asked to choose exactly one alternative that describes their substance (Yuan, 2008). However, this scheme treats human thinking as clear-cut (precise) and digital rather than multi-valued, transitional, and analogue. The Fuzzy Cluster Means (FCM) algorithm, as applied in pattern recognition, allows entities (objects) to belong to several clusters or categories with different degrees of membership (Yiouyang et al., 2007).
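The contrast between a single forced choice and graded membership can be illustrated with a minimal sketch. It assumes the standard fuzzy C-means membership formula with Euclidean distance and a fuzzifier `m = 2`; the function name `fuzzy_memberships`, the one-dimensional severity scores, and the two cluster centers are hypothetical values chosen for illustration and are not taken from this paper.

```python
import numpy as np


def fuzzy_memberships(points, centers, m=2.0):
    """Return U where U[i, j] is the degree to which points[j] belongs to cluster i.

    Assumes the standard FCM membership rule with Euclidean distance and
    fuzzifier m > 1 (an assumption; not confirmed as this paper's setting).
    """
    points = np.asarray(points, dtype=float)
    centers = np.asarray(centers, dtype=float)
    # Distance from every center (rows) to every point (columns).
    dist = np.linalg.norm(centers[:, None, :] - points[None, :, :], axis=2)
    # Guard against division by zero when a point coincides with a center.
    dist = np.fmax(dist, 1e-12)
    inv = dist ** (-2.0 / (m - 1.0))
    # Normalize so that the memberships of each point sum to 1.
    return inv / inv.sum(axis=0, keepdims=True)


# Hypothetical severity scores for three patients and two assumed cluster centers.
scores = np.array([[1.0], [3.0], [4.0]])
centers = np.array([[1.0], [5.0]])

U = fuzzy_memberships(scores, centers)
crisp = U.argmax(axis=0)  # a clear-cut scheme keeps only the single best cluster

print(np.round(U, 3))
print(crisp)
```

In this sketch the score 3.0 lies midway between the two centers, so it receives a membership of 0.5 in each cluster, whereas the crisp assignment has to place it in exactly one of them.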
In this paper, a framework for partitioning is presented, which proposes a model of how data are generated from a cluster structure. Fuzzy logic and neural network models of personnel performance within organizations have been studied with a view to evaluating personnel for productivity and promotion (Akinyokun and Uzoka, 2004). The application of the Fuzzy C-means (FCM) algorithm to medical diagnostic expert systems is presented in (Albayrak and Amasyali, 2003; Berk et al., 2000). This algorithm is used to assign patients to different clusters of disease. The application of fuzzy C-means in clustering has been demonstrated in (Yang and Wang, 2001; De Fazio and Galeazzi, 2004; Jantzen, 1998). In this paper, the fuzzy C-means algorithm is used to assign patients with HIV conditions to clusters of HIV.

# Overview of Fuzzy C-Means Clustering (FCM)

The FCM algorithm is one of the most widely used fuzzy clustering algorithms. It attempts to partition a finite collection of elements $X = \{X_1, X_2, \ldots, X_n\}$ into a collection of $c$ fuzzy clusters with respect to some given criterion. Given a finite set of data, the algorithm returns a list of $c$ cluster centers $V = \{V_i\}$, $i = 1, 2, \ldots, c$, and a partition matrix $U = [U_{ij}]$, $i = 1, \ldots, c$, $j = 1, \ldots, n$, where $U_{ij}$ is a numerical value in $[0, 1]$ that gives the degree to which the element $X_j$ belongs to the $i$-th cluster. The fuzzy logic linguistic description of the typical FCM algorithm is presented in Figure 1.

Start

Step 1: Select the number of clusters $c$ ($2 \le c \le n$), exponential weight ? (1