# Introduction tring matching diversely used in many areas of computer sciences. It has been one of the prominent issues of information retrieval system. Some standard algorithms have been used for processing text files against patterns, for example in manipulation of text, text compression, network analysis and also in data retrieval systems. The algorithms studied in the present character forms the basic components in its software implementation and also serve as a model in fields of computer science like system design purposes, web search engines, computer virus signature matching and networking [1]. Rapid growth of abundant information makes necessary to have efficient methods for information retrieval. Coping with the growth of the web and query traffic requires scalable information retrieval systems. Today commercial search engines are fully automatic and their web index on a few data centers [2]. It is tedious task to come up with scalable indexing and query processing techniques for next generation IRS in the coming future. The web comprises wide variety of content in the form of structured Meta data, databases, maps, images, videos and textual documents etc. [1]. The main challenge of present IRS may be scalability. A recent trends envisions that the number of servers required by a search engines to keep up with the load in 2010 may be in the order of millions as such the text size is increasing to tens of billions of pages [1,2]. Hence it is very urgent to design a truly distributed large scale systems that enables fast and accurate search over very large amount of content [15]. In this paper we mainly focus on pattern matching on distributed environment called as hypercube network model using RMI method [14]. Given a pattern may be more common or more specific, we wish to count how many times it occurs in the text and to point out its occurrence positions. For pattern matching we used Kumar Viswanadha-KMP and Kumar Viswanadha-Boyer Moore string matching algorithms for different text files against different pattern files [2,3]. Basically Boyer -Moore algorithm is works based on two heuristics: bad character heuristics and good suffix heuristics. The text files is partitioned and processed in two ways, one is non-overlapping and second is overlapping text partitioned processing [2]. In both the cases KV-KMP and KV-BM are applied for string matching (pattern matching) and the remote server will be invoked using JAVA RMI method on hypercube networked model to reduce the search time. The paper is organized as follows section II deals with literature survey of string matching in parallel environment, section III deals with text processing techniques called as overlapping and non-overlapping text partitioned in divide and conquer paradigm. Section IV explains about the hypercube model networked systems. Section V presents experimental setup. Result analysis and discussions were discussed in section VI and VII is conclusion. Section VIII gives the references. # II. # LITERAURE SURVEY Large amounts of data and textual information has been continuously increasing in many fields of information systems. Rapid growth of abundant information makes necessary to have efficient methods for information retrieval [1]. This chapter will give an idea how far the algorithms have helped in achieving the desired information along with its time complexities. String matching scan be accomplished by designing algorithms in two categories namely, exact string Year matching algorithms that locates exact match of the pattern in the text string or source string and approximate string matching algorithms that finds closest possible match of pattern in the text with some mismatches. Exact string matching problem can be addressed in two ways software based approach or hardware based approach. Software based algorithms are slow on comparison with hardware based [22]. Hardware based solutions to string matching provides efficient data storage and fast matching [12]. String matching application in network intrusion detection systems require that the matching shall be accomplished at wire speed , software based solutions could not afford this, due to which hardware based algorithms are chosen mostly over software grounded algorithms. The algorithms discussed below addresses mostly exact string matching. Text represents an important form of data involving a lot of operations [6]. Pattern matching is one of the problems encountered in text manipulation. It is about searching and locating substring within a sequence of characters in a raw text. # Software Based String Matching Algorithms In 1972, Cook exhibited string matching using two way push down auto meta and solved pattern matching in O(m+n) time in worst case where m and n are the lengths of text and pattern respectively [19]. However in 1977 Rivest determined that every string matching algorithm must go through at least n-m+1 comparison at worst. This shows there is no solution of obtaining a sub linear n worst time in solving the issue. It means the time needed to run an algorithm must be a function of its input size. The next algorithm discussed makes an attempt to achieve a sub-linear matching time. Donald Knuth-Voughan Pratt-James H. Morrris (1977) basing on modifications of Cook's theorem came up with a new string matching algorithm popularly known as KMP Algorithm, briefly discussed below [3]. It is the first linear pattern matching algorithm discovered with a run time of O(m+n). i. Knuth-Morris-Pratt Algorithm Knuth Morris Pratt's string matching algorithm employs exact string matching technique with linear time complexity [3]. It involves pattern preprocessing .It scans the text string from left to right for pattern matching, while scanning the text it stores the information about the matched characters and whenever a mismatch occurs it uses this information to avoid unnecessary comparisons by sometimes shifting more than one position. It thus avoids backtracking and reduces the number of comparisons unlike naïve approach which wastes the scan information. The present algorithm uses a sliding window which slides over text string and makes shifts as per mismatches. It smartly shifts the pattern over text than the brute-force approach. Window shift uses a KMP formulated prefix function obtained by preprocessing pattern to reduce unnecessary comparisons. The algorithm uses this function to decide about the number of characters to be skipped while shifting the window whenever a mismatch takes place. ii. Aho-Corasick Algorithm Unix fgrep command implementation is based on Aho-Corasick algorithm which locates finite and fixed set of strings in a file and outputs the lines containing at least one of the strings [8]. Consider a dictionary (X) containing a fixed set of strings and a text denoted by Y. Let k be the number of strings present in X. Suppose if we wish to find all the occurrences of all the strings of a dictionary. The simple solution would be to repeatedly implement few string matching algorithms on each string .The time complexity of this operation will be O(m+n*k), where m is the sum of the lengths of the k strings of dictionary X and n is the length of the text Y. This indicates the inefficiency of this approach as the text has to be read for k times. This problem is addressed by Aho-Corasick algorithm discussed below, it undergoes sequential read of the text and run time would be O(m+n).The present algorithm extends the weaker versions of Knuth-Morris-Pratt algorithm and also fastly matches a number of patterns at one time against a single text [3]. This algorithm locates all the occurrences of finite number of keywords in a string of text. It involves construction of a finite state pattern matching machine, an automaton and then uses the machine for text processing in a single. A keyword is a finite set of strings denoted by K= y 1 , y 2 ,?y n and let X be an arbitrary text string. Pattern matching machine employs three functions namely, goto function g, failure function f, and output function output. iii. Boyer Moore Algorithm B ob Boyer and J. Strother Moore discovered this algorithm in the year 1977 which is known as one of the most efficient algorithms and also stands as a benchmark for string matching process [6]. The algorithm compares pattern string within a sliding window over a text string, employing right to left scan of characters inside the window where as the window slides from left to right over the text. The aim of this algorithm is to avoid certain fragments of text that are not eligible for comparison. This decision is taken by placing the window in left alignment with text. The algorithm starts comparing the pattern characters with the text characters in the order of right to left. If 'm' being the length of pattern (x), the algorithm compares x m =y m , where 'y' symbolizes text. On true result of this comparison the procedure continues with x m-1 =y m-1 and on the occurrence of false, the algorithm makes two ways out. One is named as bad character shift or occurrence shift and the other is called as good suffix shift or better factor shift or sometimes matching shift. On grounds of these two measures the window makes shifts and locates the pattern. These measures are explained in the following paragraphs with an example demonstration. # Global # iv. Horspool Algorithm Boyer Moore algorithm uses two gauges to know shift distance. Good suffix shift is quite complicate to implement so there was a need of a simplified algorithm using bad character measure [7]. This algorithm is a simplification of Boyer -Moore algorithm based on bad character shift. It has been produced by Nigel Horspool in the year 1980.The reason for this simplification is pattern is not always periodic. The concept used is when a bad character, reason for a mismatch is encountered; the shift decision is made by analyzing the characters towards the right of the text window. # a. Working Principle The process starts by a window on text string of size equal to pattern. The scan of elements goes through right to left inside the window whereas the window slides from left to right over the text. When a mismatch is countered for some wt[i] != wp[j], 0 ? (i, j) < m, wt ? text window character and wp ? pattern window character. Then the match of the right most character of the text window is looked in the pattern so that wt [m-1] = wt [i], where 0? i < m -1, when both found the characters are therefore aligned causing a window shift. Case I: Suppose the bad character does not exist in the pattern then shift the whole window of size pattern. Case II: There exists two matches of the bad character in the pattern then the rightmost character is preferred. # b) Hardware Based String Matching Algorithms i. Mishina Algorithm Mishina produced a string matching algorithm for vector processors in the year 1993.This algorithm is used by Hitachi's pipelined vector processor and Integrated vector processor. A vector processor also known as an array processor is a CPU which executes instructions in a single dimensional array of data items [20]. Meaning it can perform parallel computations on the elements of array. The current algorithm works in two phases: cutout and check. In the first phase, that is in cutout segment the text string is divided into autonomous serviceable substrings so that each substring can be tested for equality with respective pattern strings in a pipeline using array processors. In the next phase, as the name indicates check phase, a string matching algorithm is employed to perform pattern matching. Here Aho-Corasick algorithm is applied to all substrings drawn from the cutout part. This way of applying string matching is ten times faster than the scalar string matching using Aho-Corasick algorithm. ii. Sidhu's Algorithm for String Matching using Hardware Technology The algorithm is grounded on non-deterministic finite state machine (NFSM) for regular expression matching. In the field of computing, regular expression gives a concise meaning to "match" " " " " [21]. The pattern can match one or more text strings. A non-deterministic finite state machine or automaton is a state machine resembles a directed graph that exhibits different states represented by nodes and edges designate character or empty string. The algorithm works by generating regular expressions for every string and NFSM examines the input at a speed of one byte at a time. This approach needs a time of O(m), m symbolize pattern length. NFSMs are tough to implement and requires rebuilding every time a string is added making it complicated. # c) String Matching Based on FM-Index Despite of many algorithms presented on string matching, the present attempt to solve string matching problem uses FM-index technique that concatenates the attributes of suffix array and Burrows-Wheeler transformation [22]. To understand the working of this architecture the above concepts has to be acknowledged. The next segment of this section presents a detailed discussion of it. # i. 2D-LARPBS It represents two-dimensional LARPBS. The model has the system's buses arranged in a twodimensional set up that makes communication among buses more effective [23]. It can be use for the design of both exact and approximate string matching. The construction of these algorithms is based on Hamming distance. # ii. Hamming Distance Hamming distance measures the amount of inequality between two strings. It can be applied for error detection and correction. For any two strings it gives the number of the corresponding characters that are dissimilar. Using this measure the knowledge of closeness of two strings can be known and thus gives an idea about the operations to be done to obtain one string from another. # a. Formal Definition of Hamming Distance For strings A and B with same length k, the Hamming distance H (A, B) is given by H (A, B) = no. of positions where A[i]!=B[i] and 0