# Introduction

Researchers have shown strong interest in sequential pattern mining problems and in the wide range of possible applications, such as envisaging customer buying patterns and scientific discoveries [1,2,3,4,5], as discussed by Agrawal and Srikant [1]. Consider, for example, the time-stamped sequences of purchases made by a customer. The objective is to find sequences of time-stamped itemsets that recur across customers, so a sequential pattern mining algorithm should concentrate on finding the repeated sequences, which are called frequent sequences. Such sequences record how often common orderings occur. Several heuristics, such as GSP [1], SPADE [3], PrefixSpan [2] and SPIRIT [4], attempt to find the frequent patterns efficiently by striving to cut short the candidate series and hence decrease the search space. The GSP algorithm [1] utilizes the anti-monotone property (all subsequences of a frequent sequence are also frequent). SPADE [3] finds frequent sequences using a lattice search and an intersection-based approach: the sequence database is converted into a vertical format and the candidate sequences are divided into groups. SPADE enumerates the frequent sequences by either a breadth-first or a depth-first method, and the support is computed for the produced sequences. The approach of these three algorithms can be characterized as candidate generation with support evaluation. The PrefixSpan [2] algorithm adopts a pattern-growth method and uses projected databases to accomplish it. PrefixSpan (Prefix-projected Sequential pattern mining) examines the prefix subsequences and projects only the corresponding postfix subsequences into the projected databases.

# II. Related Work

The sequential itemset mining problem was introduced by Agrawal and Srikant, who also developed a filtered algorithm, GSP [1], based on the Apriori property [1]. Since then, many sequential itemset mining algorithms have been developed to improve efficiency, among them SPADE [3], PrefixSpan [2] and SPAM [11]. SPADE [3] is based on a vertical id-list format and uses a lattice-theoretic method to decompose the search space into many small spaces, whereas PrefixSpan [2] adopts a horizontal-format dataset representation and mines the sequential itemsets with the pattern-growth paradigm: grow a prefix itemset to attain longer sequential itemsets by building and scanning its projected database. Both SPADE [3] and PrefixSpan [2] clearly outperform GSP [1]. SPAM [11] is a more recent algorithm for mining long sequential itemsets and implements a vertical bitmap representation. Its experiments reveal that SPAM [11] is more efficient in mining long itemsets than SPADE [3] and PrefixSpan [2], but it still consumes more space than SPADE [3] and PrefixSpan [2]. Since the introduction of frequent closed itemset mining [12], many capable frequent closed itemset mining algorithms have been proposed, such as A-Close [12], CLOSET [13], CHARM [14] and CLOSET+ [15]. Most of these algorithms maintain the already mined frequent closed itemsets in order to perform itemset closure checking. To decrease the memory usage and search space for itemset closure checking, two algorithms, TFP [17] and CLOSET+ [15], adopt a compact two-level hash-indexed result-tree structure to keep the already mined frequent closed itemset candidates.
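As a concrete reference point for the itemset closure checking that these algorithms optimize, the brute-force check below marks a frequent itemset as closed when no proper superset has the same support; this naive scan is the kind of cost that the compact hash-indexed result-tree structures of TFP [17] and CLOSET+ [15] aim to reduce. The function names and toy data are illustrative only, not taken from any cited implementation.

```python
# Minimal sketch of itemset closure checking: a frequent itemset is closed
# if no proper superset has the same support. Identifiers are illustrative;
# this is not the result-tree structure used by TFP or CLOSET+.

def support(itemset, transactions):
    """Number of transactions that contain the itemset."""
    return sum(1 for t in transactions if itemset <= t)

def closed_itemsets(frequent, transactions):
    """Keep only itemsets with no proper superset of equal support."""
    closed = []
    for x in frequent:
        sup_x = support(x, transactions)
        if not any(x < y and support(y, transactions) == sup_x for y in frequent):
            closed.append(x)
    return closed

transactions = [frozenset("abc"), frozenset("ab"), frozenset("bc")]
frequent = [frozenset("a"), frozenset("b"), frozenset("ab"), frozenset("bc")]
print(closed_itemsets(frequent, transactions))
# frozenset('a') is dropped: it always occurs together with 'b',
# so its closure is the itemset 'ab' with the same support.
```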
Some of the pruning methods and itemset closure checking methods introduced there can also be extended to optimize the mining of closed sequential itemsets. CloSpan is a newer algorithm for mining frequent closed sequences [16]. It follows the candidate maintenance-and-test approach: it first creates a set of closed sequence candidates stored in a hash-indexed result-tree structure and then performs post-pruning on it. It requires pruning techniques such as Common Prefix and Backward Sub-Itemset pruning to prune the search space. Because CloSpan [16] must maintain the set of closed sequence candidates, it consumes much memory and incurs a large search space for itemset closure checking when there are many frequent closed sequences; consequently, it does not scale well with the number of frequent closed sequences.

# III. Sequence, Subsequence and Frequent Sequences

A sequence is an ordered list of events [1]. A sequence $S_1 = \langle s_1, s_2, \ldots, s_m \rangle$ is said to be a subsequence of another sequence $S_2 = \langle e_1, e_2, \ldots, e_n \rangle$ if and only if there exist events $e_{j_1}, e_{j_2}, \ldots, e_{j_m}$ of $S_2$ with $j_1 < j_2 < \cdots < j_m$ such that $s_i \subseteq e_{j_i}$ for $i \in \{1, 2, \ldots, m\}$. The sequence database $S$ is a set of tuples $(tid, s)$, where $tid$ is the transaction-id and $s$ is the sequence generated from that transaction. The minimum support is a user-defined threshold indicating the minimum number of occurrences required for a sequence to be claimed frequent. A sequence $s_j$ is maximal (lengthiest) if there is no sequence $s_i \in \{s_i \in S \mid i = 1, \ldots, m\}$ of which $s_j$ is a proper subsequence. For an event $e$ consisting of items we measure its maximal support $MS(e)$, and the Maximal Support Factor of a sequence $s = \langle e_1, \ldots, e_m \rangle$ is $MSF(s) = \sum_{i=1}^{m} MS(e_i)$. The Maximal Support Factor is the threshold used in our proposal to minimize the subsequence search in the top-to-bottom approach. We apply an ordered search on the sequence database: the sequence database is ordered in descending order of the MSF of its sequences, and the search for a supersequence of a given sequence is then limited to the sequences whose MSF is greater than or equal to that of the given sequence. The preprocessed dataset with the sorted transaction list has several properties that can be used to cut short the search space, which are hypothesized below.

# IV. OSSM Overview

$T_{tid}$ is a transaction represented by its transaction id $tid$, and $s_I$ is an itemset that belongs to the set $I$ of frequent itemsets. OSSM proceeds as follows:

- Build sequences by grouping items as events in a given transaction. Here we follow the top-to-bottom approach to build events: first we build events based on the maximal-length itemsets and continue the process in descending order of the itemset lengths. The occurrence count of a candidate sequence $s_c$ is measured as $\text{maxOccurrence}(s_c) = \sum_{i=1}^{|SM|} \mathbb{1}(s_c \sqsubseteq s_i)$, where each term is 1 if $s_c$ is a subsequence of $s_i$ and 0 otherwise; here $SM$ is the set of sequences.
- Measure the maximal support of each event $e$ of the given transaction $T_{tid}$ (refer to Section III for measuring the maximal support $MS$ of an event $e$ of transaction $T_{tid}$).
- Measure the Maximal Support Factor $MSF$ of each sequence $S_{tid}$ (refer to Section III for measuring the Maximal Support Factor $MSF$ of sequence $S_{tid}$).
- Order the sequences in the sequence dataset $S$ in descending order of their MSF.
- Build a weighted acyclic directed graph from the frequent itemsets of length two (see the sketch after this list):
  - the elements of the itemset are considered as vertices;
  - the support of that itemset is considered as the edge weight.
- Apply the WFI algorithm to find the critical path between any two items, which represents the maximal sequence between the two chosen elements.
- Apply all four properties hypothesized in Section III to discard or prune sequences, or to select maximal-length sequences.
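As a rough illustration of the ordering and graph-construction steps listed above, the sketch below assumes the reconstructed definition of MSF (the sum of the maximal supports of a sequence's events), takes the per-event maximal support values as already computed, and treats the direction of each weighted edge as given; every identifier here is an assumption for illustration rather than the paper's code.

```python
# Sketch of two OSSM overview steps: ordering the sequence set by MSF and
# building the weighted digraph from frequent 2-itemsets. Assumes
# MSF(s) = sum of MS(e) over the events of s, with MS values supplied.
from collections import defaultdict

def msf(sequence, ms):
    """sequence: list of events (frozensets); ms: dict event -> maximal support."""
    return sum(ms[event] for event in sequence)

def order_by_msf(sequences, ms):
    """Return the sequences in descending order of Maximal Support Factor."""
    return sorted(sequences, key=lambda s: msf(s, ms), reverse=True)

def build_graph(frequent_2_itemsets):
    """frequent_2_itemsets: dict {(a, b): support}, direction assumed given.
    Vertices are items; the edge weight is the support of the 2-itemset."""
    graph = defaultdict(dict)
    for (a, b), sup in frequent_2_itemsets.items():
        graph[a][b] = sup                    # directed edge a -> b
    return graph

# Toy usage with assumed maximal supports and 2-itemset supports.
supports = {frozenset("a"): 2, frozenset("bc"): 3}
seqs = [[frozenset("bc")], [frozenset("a"), frozenset("bc")]]
print([msf(s, supports) for s in order_by_msf(seqs, supports)])   # [5, 3]
print(dict(build_graph({("a", "b"): 4, ("b", "c"): 3})))
```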
# V. OSSM: Top-Down Ordered Sequence Set Mining for Maximal Length Sequences

# a) Concurrent Edge Prevision and Rear Edge Pruning (CEG&REP) [7] in OSSM

# i. Preprocess

Dataset preprocessing and itemset database initialization are performed as the first stage of the proposal. As we find the itemsets with a single element, we prune in parallel those single-element itemsets whose support is less than the required support.

# ii. Concurrent Edge Prevision

In this phase, we select all itemsets from the given itemset database as input in parallel. We then start projecting edges from each selected itemset to all possible elements. The first iteration includes the pruning process in parallel; from the second iteration onwards this pruning is not required, which we claim makes the process more efficient than similar techniques such as BIDE [8]. In the first iteration, we project an itemset $s_p$ spawned from a selected itemset $s_i$ of $S_{DB}$ and an element $e_i$ taken from $I$.

# iii. Rear Edge Pruning

If, for any projected itemset, $f_{ts}(s'_p) \ge f_{ts}(s_p)$, that edge is pruned; all disjoint graphs except $s_p$ are considered closed sequences and moved into $S_{DB}$, and all disjoint graphs are removed from memory. The above process does not terminate until the graph becomes empty, i.e., as long as elements connected through transitive edges and projected itemsets remain in memory.

# b) Building Sequence Set from Transaction Dataset

In this section we explore the process of building the sequence dataset. $TD$ is the given transaction dataset and $I$ is the set of closed frequent itemsets of length 1 to $m$, where $m$ is the maximal length of an itemset.

- Initially, the set of closed frequent itemsets is ordered in descending order of itemset length.
- For each transaction $T_{tid}$ in the given transaction dataset $TD$:
  - build events based on the closed frequent itemsets of the set $I$, such that the event lengths are decided in descending order of the frequent itemset lengths;
  - initially, events of length $m$, the maximal length of the frequent itemsets, are determined, and the process then continues with the shorter itemsets.

As a part of OSSM, we order the sequence dataset in descending order of the Maximal Support Factor (refer to Section III for details). Let $S$ be the sequence dataset determined from the given transaction dataset $TD$ and the set of frequent itemsets $I$.

In this phase of OSSM, we explore the building of a weighted acyclic directed graph. Let $I_2$ be the set of frequent itemsets of length 2, and let $G$ be the graph, initially with vertex count $|V| = 0$ and edge count $|E| = 0$, where $V$ is the vertex set and $E$ is the edge set.

In this phase we apply the WFI algorithm [9,10]. In the first pass of the algorithm we try to identify and evaluate potentially long and rich candidates; the rich sequences are the ones whose constituent 2-sequences have high support. In the directed graph, the 2-sequence frequencies are represented by the edge weights, so we can easily compute the path with the highest weight between all pairs of nodes. Here we use the WFI algorithm [9,10] to find the critical path (a path with maximal vertex count) with maximal weight, as sketched below.
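The WFI algorithm cited here is commonly identified with the Floyd–Warshall all-pairs procedure, so the critical-path step can be sketched as an all-pairs maximum-weight path computation over the acyclic 2-itemset graph, with next-hop tracking so that the heaviest path between two chosen items can be read back as a candidate sequence. This is only an illustration under that assumption; the adjacency-dict input format and the function names are not taken from the paper.

```python
# Sketch of the WFI (Floyd-Warshall-style) pass on the weighted acyclic
# digraph: computes maximum-weight paths between all pairs of items and
# reads back the critical path between two chosen items.
NEG_INF = float("-inf")

def longest_paths(graph):
    """graph: dict {u: {v: weight}} of an acyclic digraph.
    Returns (dist, nxt): dist[u][v] is the maximum path weight from u to v,
    nxt[u][v] is the next hop on that path."""
    nodes = sorted(set(graph) | {v for adj in graph.values() for v in adj})
    dist = {u: {v: (0 if u == v else NEG_INF) for v in nodes} for u in nodes}
    nxt = {u: {v: None for v in nodes} for u in nodes}
    for u, adj in graph.items():
        for v, w in adj.items():
            dist[u][v], nxt[u][v] = w, v
    for k in nodes:                        # relax through intermediate node k
        for i in nodes:
            for j in nodes:
                if dist[i][k] + dist[k][j] > dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
                    nxt[i][j] = nxt[i][k]
    return dist, nxt

def extract_path(nxt, src, dst):
    """Read back the critical path src -> ... -> dst as a list of items."""
    if nxt[src][dst] is None:
        return []
    path, node = [src], src
    while node != dst:
        node = nxt[node][dst]
        path.append(node)
    return path

# Toy graph built from frequent 2-itemsets (edge weight = support).
g = {"a": {"b": 4, "c": 1}, "b": {"c": 3}}
dist, nxt = longest_paths(g)
print(extract_path(nxt, "a", "c"), dist["a"]["c"])   # ['a', 'b', 'c'] 7
```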
# f) Sequence Evolution

Each critical path generated from the graph $G$ is considered a candidate sequence and stored in the candidate sequence set $css$.

# g) Verifying Frequency of Candidate Sequence

[Figure: search-space pruning hypotheses. The sequences are ordered in descending manner of MSF and, for each sequence, supersequences are searched only among sequences whose MSF is greater than or equal to its own. Hypothesis 4: let $s_c$ be a sequence such that $\text{maxOccurrence}(s_c) < ms$; then discard all supersequences of $s_c$, since if $s_c$ is infrequent its supersequences are also infrequent.]
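Read together, Section III and this verification step suggest the following procedure: count the maxOccurrence of a candidate only against data sequences whose MSF is at least the candidate's own (the ordered-search property of Section III) and, following Hypothesis 4, skip every supersequence of a candidate that has already proved infrequent. The sketch below illustrates this reading; the function names, the shortest-first candidate order and the `msf_of` callback are assumptions rather than the authors' implementation.

```python
# Sketch of candidate frequency verification combining the MSF-ordered
# search of Section III with Hypothesis 4 pruning; ms is the minimum support.

def is_subsequence(cand, seq):
    """cand, seq: lists of frozensets (events); ordered containment check."""
    i = 0
    for event in seq:
        if i < len(cand) and cand[i] <= event:
            i += 1
    return i == len(cand)

def max_occurrence(cand, ordered_seqs, cand_msf, msf_of):
    """ordered_seqs is sorted by MSF descending; per the ordered-search
    property, only sequences with MSF >= the candidate's MSF are checked."""
    count = 0
    for seq in ordered_seqs:
        if msf_of(seq) < cand_msf:
            break                          # ordered-search cut-off
        if is_subsequence(cand, seq):
            count += 1
    return count

def verify_candidates(candidates, ordered_seqs, ms, msf_of):
    """Return the frequent candidates; by Hypothesis 4, supersequences of an
    infrequent candidate are skipped (candidates assumed shortest-first)."""
    frequent, infrequent = [], []
    for cand in candidates:
        if any(is_subsequence(bad, cand) for bad in infrequent):
            continue                       # pruned by Hypothesis 4
        if max_occurrence(cand, ordered_seqs, msf_of(cand), msf_of) >= ms:
            frequent.append(cand)
        else:
            infrequent.append(cand)
    return frequent
```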