# I. Introduction

The computer forensic process is heavily impacted by the sheer amount of data involved. Computer forensics has emerged as a distinct discipline that merges elements of law and computer science to gather and examine information from computer systems. In our study there are hundreds of files in unstructured format, and for their analysis, methods from machine learning and data mining are of great importance. Clustering algorithms are typically needed to group the data in such files, since there is practically no prior knowledge about the information [1] [13]. From a more technical perspective, our datasets consist of unlabeled objects. Moreover, even assuming that labeled datasets from previous analyses were available, there is almost no hope that the same classes would still be valid for upcoming data obtained from different computers and related to different investigations. More precisely, the new data are likely to come from different populations. Thus, the use of clustering algorithms, which are capable of discovering latent patterns in text documents found on seized computers, can improve the analysis performed by the expert examiner.

The rationale behind clustering is that objects within a cluster are more similar to one another than they are to objects belonging to a different cluster [1]; in this way a partition of the data is induced from the data itself. Instead of examining every document, the expert examiner can then concentrate on a set of representative documents drawn from the obtained clusters. In more practical and realistic situations, domain experts are scarce and have limited time available for performing examinations, so it is sensible to assume that, after finding a relevant document, the examiner could prioritize the analysis of the other documents belonging to the cluster of interest.

Clustering algorithms have been studied for decades and the literature on the subject is huge. We therefore decided to demonstrate the capability of the proposed methodology with a representative selection, namely: the partitional algorithms K-Means and K-Medoids; the hierarchical single link, complete link, and average link algorithms; and the cluster ensemble algorithm known as CSPA [3]. It is well known that the number of clusters is a parameter of many of these algorithms and is generally assumed to be known a priori. However, estimating the number of clusters has not been examined in computer forensics; indeed, we could not spot even one work reasonably close in application area that reports the use of algorithms capable of estimating the number of clusters [3].
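To make this concrete, the sketch below partitions a handful of documents with K-Means over term vectors and uses the silhouette coefficient, one relative validity index, to estimate the number of clusters. It is a minimal illustration assuming scikit-learn is available; the sample documents and the candidate range for k are hypothetical, not part of the proposed system.

```python
# Minimal sketch: estimating the number of clusters with a relative
# validity index (silhouette) and partitioning documents with K-Means.
# Assumes scikit-learn; the document list is a hypothetical stand-in
# for text files recovered from a seized computer.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

documents = [
    "meeting schedule for project alpha",
    "project alpha budget and schedule",
    "address book entry john doe",
    "new address book contact jane doe",
]

# Represent each document as a TF-IDF vector (vector space model).
vectors = TfidfVectorizer(stop_words="english").fit_transform(documents)

# Try several candidate numbers of clusters and keep the k that
# maximizes the silhouette coefficient.
best_k, best_score = None, -1.0
for k in range(2, len(documents)):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(vectors)
    score = silhouette_score(vectors, labels)
    if score > best_score:
        best_k, best_score = k, score

print(f"estimated number of clusters: {best_k} (silhouette={best_score:.2f})")
```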
# II. Review of Related Research

In the software development process, research is the most important phase: time factors, the economy, and company strength determine the development process. Before building the system, the developer gathers related information from expert suggestions and from websites related to the work, and maintains a report of the resulting requirements.

C. M. Fung et al. [7] present agglomerative and divisive hierarchical clustering to group documents into clusters, such that documents in a cluster have high similarity to each other while dissimilar documents fall into other groups. No labeled documents are provided, so clustering is a form of unsupervised learning. The clusters are organized into a tree that facilitates browsing, and the parent-child relationship among the nodes in the tree can be viewed as a topic-subtopic relationship.

B. Fei et al. [3] discuss the application of a self-organizing map (SOM) to support decision making by computer forensic investigators and to assist them in conducting data analysis in a more efficient manner; the SOM also reveals patterns of similarity in the data sets. The authors explore its great ability to interpret and explore data generated by computer forensic tools.

Alexander Strehl et al. [2] introduce three effective and efficient techniques for obtaining high-quality combined clusterings. The first combiner induces a similarity measure from the partitionings and re-clusters the objects; the second is based on hypergraph partitioning; and the third collapses groups of clusters into meta-clusters, which then compete to claim each object in the combined clustering. All three approaches have low computational cost, and it is feasible to use a supra-consensus function that evaluates them against the objective function and returns the best result.

L. F. Nassif et al. [5] present an approach that applies document clustering algorithms to the forensic analysis of computers seized in police investigations. The authors report experiments with six well-known clustering algorithms (K-Means, K-Medoids, Single Link, Complete Link, Average Link, and CSPA) applied to five real-world datasets acquired from computers seized in actual investigations. Experiments were performed with different combinations of parameters, resulting in sixteen different instantiations of the algorithms. Moreover, two relative validity indices were used to automatically estimate the number of clusters. If suitably initialized, the partitional algorithms (K-Means and K-Medoids) can likewise yield good results.

Ying Zhao et al. [8] observe that high-quality clustering algorithms play an essential part in providing intuitive navigation and browsing mechanisms by organizing large amounts of data into a small number of meaningful clusters; in particular, hierarchical clustering builds meaningful hierarchies out of large document collections. Their work concentrates on document clustering algorithms that build such hierarchical solutions.

# a) Hierarchical Agglomerative Algorithm

Hierarchical clustering algorithms are either top-down or bottom-up. Bottom-up algorithms treat each document as a singleton cluster at the beginning and then progressively merge (or agglomerate) pairs of clusters until all clusters have been merged into a single cluster that contains all documents; bottom-up algorithms are therefore called hierarchical agglomerative clustering.

# i. Cluster Formation

Read from the bottom to the top, the dendrogram reproduces the history of merges that produced the represented clustering. For example, in Figure 1 we see that the two documents AddressBook_Deletecopy.html and AddressBook_New_Action.html were merged first, and that the last merge added Visitors.html to a cluster comprising the other 27 documents.

# ii. Finding Similarity

A hierarchical agglomerative clustering is usually visualized as a dendrogram, as illustrated in Figure 1. Each merge is represented by a horizontal line, and the position of that line expresses the similarity of the two clusters that were merged, where individual documents are viewed as singleton clusters. We call this similarity the combination similarity of the merged cluster. For example, the combination similarity of the cluster consisting of Schedule_New.html and ToDos_Index.html in Figure 1 is approximately 0.19. The combination similarity of a singleton cluster is its self-similarity, which is defined as 1.0 for cosine similarity.
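The merge history and combination similarities described above can be reproduced with an off-the-shelf dendrogram routine. The following sketch assumes SciPy, scikit-learn, and Matplotlib are installed; the document texts are hypothetical stand-ins for the files named in Figure 1.

```python
# Sketch of hierarchical agglomerative clustering drawn as a dendrogram.
# Assumes SciPy/scikit-learn/Matplotlib; the documents are hypothetical.
from scipy.cluster.hierarchy import linkage, dendrogram
from sklearn.feature_extraction.text import TfidfVectorizer
import matplotlib.pyplot as plt

documents = [
    "delete a copy of an address book entry",
    "create a new address book action",
    "schedule a new appointment",
    "index of to-do items",
]

# TF-IDF vectors are length-normalized, so cosine distance works directly.
vectors = TfidfVectorizer().fit_transform(documents).toarray()

# Average-link agglomerative clustering over cosine distance
# (1 - cosine similarity); each row of Z records one merge: the pair
# joined and the distance at which the join happened.
Z = linkage(vectors, method="average", metric="cosine")

dendrogram(Z, labels=["AddressBook_Deletecopy", "AddressBook_New_Action",
                      "Schedule_New", "ToDos_Index"])
plt.ylabel("cosine distance at merge")
plt.show()
```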
# iii. Framework Requirements

To find meaningful data in a dataset, researchers use data mining techniques, among which clustering is one of the most popular. Let DS be our dataset, represented as DS = {d1, d2, ..., dn}, where each di, 1 ≤ i ≤ n, is a document and n is the number of documents in DS. The proposed system consists of three main steps:

1) Preprocessing
2) Cluster Formulation
3) Forensic Analysis

# Preprocessing

The preprocessing step comprises three sub-steps: a) fetching the file contents, b) stemming, and c) stop word removal. These sub-steps remove noise and inconsistent data. First, the dataset is fetched; then stemming is performed with the Porter stemmer. Porter stemming is based on the idea that suffixes in the English language are mostly combinations of smaller and simpler suffixes, so words ending in ed, ing, ly, and so on have those suffixes removed; it is a linear-step stemmer [16]. The last sub-step removes stop words with a stop token filter [17]. Stop words such as to, I, has, the, be, and or are the most frequent words in the English language; they bloat the index without providing any additional value.

Documents are then represented using a standard statistical methodology for text mining, the vector space model, in which each document is denoted by a vector containing the frequencies of occurrence of its words. To compute the distance between documents, a cosine-based measure is used, and the resulting distances drive the hierarchical agglomerative clustering. After these steps, the data is ready for clustering.
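A minimal version of this preprocessing pipeline, assuming NLTK's Porter stemmer and scikit-learn's built-in English stop word list, might look like the following; the sample sentence is hypothetical.

```python
# Sketch of the preprocessing step: stop word removal, Porter stemming,
# and term-frequency vectors. Assumes NLTK and scikit-learn are installed.
from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import CountVectorizer, ENGLISH_STOP_WORDS

stemmer = PorterStemmer()

def preprocess(text: str) -> str:
    """Lowercase, drop stop words, and stem each remaining token."""
    tokens = text.lower().split()
    kept = [stemmer.stem(t) for t in tokens if t not in ENGLISH_STOP_WORDS]
    return " ".join(kept)

raw = "The suspects were meeting repeatedly to discuss the deleted files"
clean = preprocess(raw)
print(clean)  # stop words dropped, suffixes stemmed (e.g. "delet", "file")

# Term-frequency vectors over the cleaned documents (vector space model).
vectors = CountVectorizer().fit_transform([clean])
```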
# Cluster Formulation

This step describes the mining of frequent item sets from the preprocessed text documents. For each document, the frequency of the words extracted in the preprocessing step is computed and the top frequent words of each document are taken out. From the set of top frequent words, a binary database is formed over the unique words, and the documents with the highest similarity are clustered first.

# a) Hierarchical Agglomerative Clustering Algorithm

Hierarchical agglomerative algorithms treat each document as a singleton cluster at the start and then progressively merge pairs of clusters until all clusters have been merged into a single cluster that contains all documents.

Input: list of documents D = {d1, d2, ..., dn}
Output: clusters C = {c1, c2, ..., ck}

1. For i = 1 to n do
2. Treat each document in the given list as a singleton cluster
3. Find the parsers // these are the unique words in the documents
4. Suppress non-dictionary words
5. Get the unique edges in the documents
6. Initialize the clusters
   a. For n ← 1 to N
   b. Apply clustering to the items
7. Construct the histogram // for analyzing the clusters
   Initialize h_min to 1.0 and h_max to 0.0
   For t ← 1 to n-2
     For j ← 1 to n-1
       t_sim = sim(doc[t], doc[j])
       If (h_min > t_sim) then h_min = t_sim
       If (h_max < t_sim) then h_max = t_sim
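The histogram pass at the end of the pseudocode, which tracks the minimum and maximum pairwise similarity, could be sketched in Python as follows. This is a free interpretation under the assumption that sim is cosine similarity over TF-IDF vectors; the documents and the number of histogram bins are hypothetical.

```python
# Sketch of the histogram-construction step from the pseudocode above:
# compute pairwise document similarities, track the minimum and maximum
# (h_min, h_max), and bin the values for cluster analysis. Assumes
# cosine similarity over TF-IDF vectors; documents are hypothetical.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "address book delete copy",
    "address book new action",
    "schedule new appointment",
    "to-do list index",
]
vectors = TfidfVectorizer().fit_transform(documents)
sims = cosine_similarity(vectors)

h_min, h_max = 1.0, 0.0          # as in the pseudocode's initialization
pairwise = []
n = len(documents)
for t in range(n):
    for j in range(t + 1, n):    # visit each unordered pair once
        t_sim = sims[t, j]
        h_min = min(h_min, t_sim)
        h_max = max(h_max, t_sim)
        pairwise.append(t_sim)

# Histogram of pairwise similarities between h_min and h_max.
counts, edges = np.histogram(pairwise, bins=5, range=(h_min, h_max))
print(f"h_min={h_min:.2f} h_max={h_max:.2f}")
print(list(zip(edges.round(2), counts)))
```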