# Introduction

Architecture is defined as building for humans, and being an architect is having the spirit to build for humans. A framework is a collection of classes and applications, libraries, SDKs, and APIs that help the different components work together. In any engineering discipline, an essential part of quality is control of change, which dictates the need to review and understand changes before accepting them. Models and diagrams are primary design artifacts in this environment, so being able to compare diagrams to identify correspondences and discrepancies between them is essential. In large-scale IT system development, techniques have long existed for comparing textual artifacts, but considerably less work has been reported on comparing the diagrams and models that are common in this setting. The main problem addressed in this paper is finding a correspondence between a pair of diagrams (a mapping between elements of one diagram and elements of the other); we introduce a Bayesian approach to solve it. The applications central to the modern IT systems development process include structured representations of requirements, business process workflows, system overviews, architectural specifications of systems, network topologies, object designs, state transition diagrams, and control and data flow representations of code.

# a) Scenarios

The system development life cycle offers several applications for finding correspondences between models. A design activity produces a series of successive revisions of a model. There is a need to review and understand the nature of revisions as part of accepting them, rejecting them, or merging them with other concurrent revisions, and identifying correspondences and discrepancies is central to such activities. Knowing how model variants correspond is crucial for integration: different collaborators may experiment with different paths of evolution of a model, resulting in a number of transient variants, with the intent that those branches deemed successful will be integrated back into a main stream. Many development approaches and methodologies make use of multiple views of the architecture of a system [6]. The 4+1 view model is made up of five main views [7]:

- The logical view, which is the object model of the design (when an object-oriented design method is used),
- The process view, which captures the concurrency and synchronization aspects of the design,
- The physical view, which describes the mapping(s) of the software onto the hardware and reflects its distributed aspect,
- The development view, which describes the static organization of the software in its development environment.

(The fifth view, a small set of scenarios or use cases, ties the other four views together, as shown in Fig. 1.)

Traceability is another important requirement for maintaining quality [10], [5]: traceability between software artifacts such as requirements, design elements, code, test cases, and defect reports. At a finer level of granularity, traceability provides the ability to navigate between the elements of different artifacts, such as individual software components, hardware nodes, requirements, non-functional requirements, and the architectural decisions that reflect the design rationale for the system. A larger asset of reuse is the incorporation of a reference architecture from a repository into a solution design.

# b) Contribution of the Paper

At present, determining correspondences between models is a tedious, error-prone, time-consuming, manual process.
The main goal is to achieve an automated means of determining the correspondences, similar to techniques for automated comparison of textual artifacts. This requires us to answer several questions:

- How do we represent models?
- Which features of models must be represented?
- What algorithms should be used to find correspondences?

In this paper, we provide answers to these questions.

# II.

# Diagram features

We focus mainly on the problem of finding correspondences in the domain of IT architecture operational models [2], although the techniques in this paper have proven effective for other kinds of IT architecture models as well. Operational models are used by IBM Global Services architects as part of a development methodology for customized IT solutions. An operational model also includes model elements reflecting the key decisions constituting the rationale for the solution design. The main features of an operational model diagram can be abstracted to elements found in many other kinds of diagrams [8]. Nodes are grouped together semantically; for instance, in operational models, servers located in the same building may be grouped within a common region. Like nodes, groups have labels and relationships. For example, regions have an adjacency relationship that indicates a connection. Regions are discussed in greater detail below. The information represented by system diagrams can be broadly classified into three types: 1) syntactic information (e.g., nodes, labels, containment, and edges), 2) semantic information (e.g., types, defined semantic attributes), and 3) visual information (e.g., position, shape, and color of diagram elements). Leveraging all of these kinds of information is one of the major challenges of diagram matching (a minimal sketch of one possible representation is given below).

# III.

# Model correspondence problem

The model correspondence problem is the problem of finding the "best" correspondence between the elements of two diagrams.

# a) Semantics and Domain-Specific Knowledge as a Basis

The first issue is how to define "best." It may seem appealing to define "best" as the correspondence that preserves a specific semantic relationship between the two diagrams, but this definition would be difficult to apply in practice, for several reasons. First, there are many possible semantic relationships between diagrams, and it is hard to decide which applies. For example, in one case we may have a diagram pair (D, D'), where D' is a revision of D, with the semantic relation "is a revision of." In another case, D may be a conceptual description of a system and D' a physical description, with the semantic relation "realizes." Second, even if the semantic relationship is known, defining it in precise detail would be difficult, and even a precise definition may not have sufficient information to find the best correspondence. Third, many diagrams found in practice have no formal semantics: they use informal notions of "boxes" and "lines" to convey context-specific architectural notions. Either way, we conjecture that generic matching techniques can go a long way in finding good correspondences, based on the following observations:

- Evidence takes the form of having similar or dissimilar features. For example, if two nodes have the same label, this is strong evidence that they match. If two nodes are at totally different positions in their respective diagrams, that is evidence that they do not match.
- For a node pair (n, n'), sometimes there is some evidence that n and n' match and other evidence that n and n' do not match. Practitioners use their experience to weigh the relative significance of the different pieces of evidence and decide whether or not n and n' match.
- The correspondence can be filled in by identifying one-to-one matches using evidence about node pairs. Other kinds of evidence help suggest non-one-to-one matches when necessary. For example, if diagram D has a node n labeled "Firewall and Access Control" and D' has node n'1 labeled "Firewall" and n'2 labeled "Access Control," the labels suggest that n matches both n'1 and n'2. If n'1 and n'2 are both within the same container, this is further evidence that they may match the same node in D.
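The paper does not spell out a concrete data structure for diagrams, but a minimal representation consistent with the features listed in Section II (labels, types, positions, containment in regions, and edges) might look like the following sketch. All class and field names here are illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple

@dataclass
class Node:
    label: str                                   # syntactic: textual label, e.g. "Application Services"
    type: Optional[str] = None                   # semantic: e.g. "server", "use case"
    position: Tuple[float, float] = (0.0, 0.0)   # visual: normalized layout position
    region: Optional[str] = None                 # containment: name of the enclosing region, if any

@dataclass
class Diagram:
    nodes: Dict[str, Node] = field(default_factory=dict)               # node id -> Node
    edges: Set[Tuple[str, str]] = field(default_factory=set)           # undirected connections
    region_adjacency: Set[Tuple[str, str]] = field(default_factory=set)  # adjacency between regions

    def neighbors(self, node_id: str) -> Set[str]:
        # Immediate neighbors of a node; used later by the connection evidencer.
        return ({b for a, b in self.edges if a == node_id}
                | {a for a, b in self.edges if b == node_id})
```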
# IV.

# Solution overview

This section gives an overview of our solution and serves as a road map to the Bayesian correspondence model, which provides the mathematical description of our algorithm. We call our algorithm Automated Matching of Models (AMMO). We explain the main ideas of the AMMO algorithm by tracing its behavior on a simple example diagram pair, HLV and LLV, shown in Fig. 2. This diagram pair is highly simplified for presentation purposes, but it does exhibit some of the difficulties found in production models, such as non-obvious node matches and matches that are not one-to-one. The tags B1, B2, ..., Q1, Q2, ..., L1, L2, ..., W1, W2, ... are only for ease of reference in this discussion and are not part of the actual node labels. Also, note that regions, such as "L2: Application Zone," contain nodes, such as "B2: Application Services" and "Search Services." Further note that the label of a node is not qualified by the label of the region that contains it.

# a) Feature Similarity

Our algorithm begins by computing a number of similarity values for each possible node pair consisting of a node from one diagram and a node from the other diagram, i.e., $(n, n') \in \mathit{Nodes} \times \mathit{Nodes}'$. A similarity value is computed for each feature from a predetermined set of features. For example, nodes with similar labels often match, so one of the features we work with is the textual label of a node, and one of the similarities we compute for a node pair is its label similarity: a value between 0 and 1 reflecting the string similarity between the node labels. A similarity value can be regarded as a "raw similarity score" for a particular feature for a node pair.

# b) Match Probability from Feature Similarity

A similarity value in itself does not indicate whether a pair of nodes match; that is, it is unclear whether a particular similarity is low or high with respect to the population. We therefore need to transform a raw score consisting of a feature similarity value into a probability that a pair of nodes match. Given a probability distribution of the similarity values, based on similarities observed for matching and non-matching pairs in training data, Bayesian inference converts the similarity of $(n, n')$ into the probability that $(n, n')$ match. Table 2 shows the probabilities resulting from Bayesian inference given the similarities in Table 1. One can see that the probability of node B4 matching Q6 is much higher than the probability of B4 matching any other node. One can also see that B2 is approximately twice as likely to match Q1 as it is to match any other node. Finally, one can see that the probabilities of B1 and B3 matching any of the nodes in the second diagram are approximately equal, indicating that the label feature is inadequate for determining matches for these nodes. For some nodes, such as B1, label similarity does not help much in finding a match. In general, one evidencer alone is not usually enough to find the best match for a node; thus, the AMMO algorithm employs several evidencers.
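To make the step from Table 1 to Table 2 concrete, the sketch below shows one way such a conversion could be implemented: a raw label similarity (the paper's prototype uses difflib.SequenceMatcher.ratio()) is turned into a match probability by Bayes' rule, using class-conditional likelihoods estimated from training data. The histogram values and the prior used here are made-up placeholders, not numbers from the paper.

```python
import difflib

def label_similarity(label_a: str, label_b: str) -> float:
    # Raw similarity score in [0, 1]; the prototype uses difflib.SequenceMatcher.ratio().
    return difflib.SequenceMatcher(None, label_a.lower(), label_b.lower()).ratio()

def match_probability(sim: float, p_match_prior: float,
                      hist_match, hist_nonmatch, bins: int = 10) -> float:
    """Bayes' rule: P(match | s) = P(s | match) P(match) / P(s).

    hist_match and hist_nonmatch are per-bin likelihoods of observing a similarity
    value for matching and non-matching training pairs (assumed to be estimated offline).
    """
    b = min(int(sim * bins), bins - 1)
    like_match = hist_match[b]
    like_nonmatch = hist_nonmatch[b]
    numerator = like_match * p_match_prior
    evidence = numerator + like_nonmatch * (1.0 - p_match_prior)
    return numerator / evidence if evidence > 0 else 0.0

# Hypothetical training summaries: matching pairs tend to have high similarity, non-matching pairs low.
HIST_MATCH    = [0.01, 0.01, 0.02, 0.03, 0.05, 0.08, 0.10, 0.15, 0.25, 0.30]
HIST_NONMATCH = [0.30, 0.25, 0.15, 0.10, 0.08, 0.05, 0.03, 0.02, 0.01, 0.01]

s = label_similarity("Firewall and Access Control", "Firewall")
print(match_probability(s, p_match_prior=1 / 6, hist_match=HIST_MATCH, hist_nonmatch=HIST_NONMATCH))
```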
# c) Multiple Evidencers

For example, it was noted previously that B2 appeared to correspond to Q1 based on label probabilities alone. However, a human expert would know intuitively that B2 should correspond to Q4, because both appear to be in similar positions in the two diagrams. Using multiple evidencers requires a mechanism for combining one kind of evidence with another. AMMO combines evidence using Bayesian inference on a joint probability distribution over all of the kinds of evidence. Table 3 shows the results of combining the label and position evidence. Note that B2 now matches Q4 with a probability five times greater than for any other node. Note as well that the possibilities concerning matches for the other nodes have been narrowed down considerably.

# d) Simple Evidencer and Complex Evidencer

Combining the label and position evidencers yielded a very probable match for B2. Beyond this, there is additional evidence that makes this match even more probable: B4, which is a neighbor of B2, matches Q6, which is a neighbor of Q4, and having matching neighbors is additional evidence that B2 matches Q4. Our implementation includes a "connection evidencer" that provides such evidence. We call evidencers such as the label or position evidencers simple evidencers because they use only information about the given pair of nodes. In contrast, we call evidencers like the connection evidencer complex evidencers because they use more than just information about a given pair of nodes to compute the similarity for that pair of nodes; they also use information about other pairs of nodes (in this case, neighboring nodes) that have already been determined to match.

# e) Splits and Merges

Since HLV has four nodes and LLV has six, clearly not every node of LLV can participate in a one-to-one match. It is possible that a node from one diagram matches no node from the other diagram. Another possibility is that a node from one diagram matches a combination of nodes from the other diagram. Splits (one-to-many matches) and merges (many-to-one matches) are common in practice. Experts identify splits and merges by combining several pieces of evidence. For example, an expert might note the following characteristics of HLV and LLV:

- C1 is close in position to each of P1, P2, and P3.
- P1, P2, and P3 are interconnected.
- The combination of P1, P2, and P3 taken together has connections to P4 and to P5, and these connections appear to match the connections from C1 to C2 and to C3.

These characteristics, when taken together, indicate that C1 is likely to have split into P1, P2, and P3, i.e., that C1 matches P1, P2, and P3.

# f) Drops

It is also possible that a node in one diagram does not match any node in the other diagram. We refer to the probability that a node is dropped as the drop probability, denoted P_DROP. This probability is determined empirically from training data.

# g) Correspondence Score

The entire correspondence between the two diagrams is built up from individual matches between nodes. A naive approach to finding the "best" correspondence between two diagrams would be to include the node pairs with the highest pair probabilities. Table 5 shows the results of combining all evidence about pairs for the above example. If we consider only the pair probabilities shown in Table 5, we determine the "best" correspondence to be Corr1 = {(B1, Q2), (B2, Q4), (B3, Q5), (B4, Q6)}. However, this approach fails to yield the optimal correspondence, for several reasons. First, although it might result in dropped nodes (when none of the chosen pairs involves a given node), it does not take into consideration the probability of those drops. For example, correspondence Corr1 does not include a match for Q1, and thus Q1 is a dropped node (as defined above).
However, if the probability of a node being dropped is extremely low, it might have been better for Corr1 to include a split (as defined above) involving Q1, resulting in a correspondence that is more likely overall. Second, although it might result in splits and merges (when more than one of the chosen pairs involves a given node), this approach does not take into account the probability of these splits and merges. Third, greedily choosing the best pairs, one after the other, does not take into account the fact that choosing a particular pair match can raise or lower the probability of other pair matches, due to complex evidencers such as the connection evidencer.

# h) Complexity

If we score a correspondence using only simple node-pair evidencers such as label and position, and restrict ourselves to correspondences in which all node matches are one-to-one, then we can find the maximum-score correspondence with a polynomial-time algorithm based on maximum-weight bipartite matching. If we use complex evidencers and allow correspondences that are not one-to-one, the problem of identifying the maximum-score correspondence is NP-hard.

# V.

# Bayesian correspondence model

# a) Correspondences and Matches

Let $D$ and $D'$ be diagrams whose nodes are the sets $N$ and $N'$, respectively. Our core notion is the diagram correspondence, which equates sets of nodes in $D$ with sets of nodes in $D'$, but also allows nodes to be left out. Formally, $Q$ is a partial partition of a set $N$ iff $Q = \{Q_1, Q_2, \ldots\}$, where each $Q_i \subseteq N$ and $Q_i \cap Q_j = \emptyset$ for all $i \neq j$. A diagram correspondence for the nodes $N$ and $N'$ of two diagrams is a tuple $C = (Q, Q', f)$, where $Q$ is a partial partition of $N$, $Q'$ is a partial partition of $N'$, and $f \subseteq Q \times Q'$ is one-to-one.

# b) Evidencers

Evidencers provide the basis for determining the probability that a pair of nodes match, based on one kind of evidence. Informally, an evidencer consists of three parts: 1) a definition of a node feature (e.g., a node's label), 2) a function that measures the similarity of two nodes based on that feature, and 3) a probability distribution of node-pair similarity values in cases where the two nodes match, together with a probability distribution of node-pair similarity values in cases where the two nodes do not match. Formally, an evidencer consists of a similarity function $S_e$ and probability functions $P^M_e$ and $P^U_e$. The similarity function has the form $S_e(n, n')$, where $(n, n')$ is a node pair from $(D, D')$ and $D'$ is a diagram derived from $D$ by an unspecified procedure $\Phi$; we model $\Phi$ by asserting that $S_e(n, n')$ is a random variable. The range of $S_e$ is arbitrary: the set of values used to measure similarity can be chosen to suit the evidencer. For example, the label evidencer similarity function $S_{label}(n, n') = \mathit{ssim}(\mathit{label}_n, \mathit{label}_{n'})$ returns a real number in the interval $[0, 1]$, where $\mathit{ssim}$ is a function that computes a similarity value for two strings (our prototype uses a function implemented in the Python standard library).

# c) Correspondence Probability

In order to use the evidence to find the best correspondence, we model the best correspondence as a random variable $C$ that can take any diagram correspondence as its value. Our estimate of the best correspondence is the one that has the highest probability given the evidence:

$$\hat{C} = \arg\max_{C} P(C \mid E).$$
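As noted in the Complexity subsection above (and used again in the Maximization Problem subsection below), when only simple evidencers are used and all matches are required to be one-to-one, the maximum-score correspondence can be found in polynomial time with maximum-weight bipartite matching. The sketch below illustrates that reduction using SciPy's linear_sum_assignment; the probability matrix and the way drops are handled (simply discarding assignments whose probability falls below a threshold) are simplifying assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def best_one_to_one(pair_prob: np.ndarray, p_drop: float = 0.05):
    """Maximum-score one-to-one correspondence via maximum-weight bipartite matching.

    pair_prob[i, j] is P(<n_i, n'_j> | evidence). Working in log space turns the product
    of pair probabilities into a sum of edge weights; assignments whose probability is
    below the drop threshold are left unmatched (a crude stand-in for modeling drops).
    """
    cost = -np.log(np.clip(pair_prob, 1e-12, 1.0))   # maximize the product == minimize -log
    rows, cols = linear_sum_assignment(cost)          # Hungarian algorithm, polynomial time
    return [(i, j) for i, j in zip(rows, cols) if pair_prob[i, j] > p_drop]

# Hypothetical 4x6 pair-probability matrix in the spirit of the HLV/LLV example.
probs = np.array([
    [0.23, 0.38, 0.04, 0.00, 0.00, 0.00],
    [0.00, 0.00, 0.05, 0.56, 0.00, 0.08],
    [0.00, 0.00, 0.06, 0.01, 0.56, 0.04],
    [0.00, 0.00, 0.01, 0.18, 0.00, 0.56],
])
print(best_one_to_one(probs))   # [(0, 1), (1, 3), (2, 4), (3, 5)]
```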
# d) Singular Correspondence Probability Model

The singular correspondence probability model defines the probability of a singular correspondence conditional on the observed evidence. Let $(Q, Q', f)$ be a singular correspondence for diagrams containing nodes $N$ and $N'$. We will use $N(Q)$ to refer to the set of nodes in the partial partition $Q$. We use the notation $(n, \emptyset)$ to mean that the node $n$ in the first diagram does not match any node in the second diagram, and similarly $(\emptyset, n')$ for the second diagram. Then,

$$\mathrm{Pairs}(Q, Q', f) = f \,\cup\, \{(n, \emptyset) \mid n \in N \setminus N(Q)\} \,\cup\, \{(\emptyset, n') \mid n' \in N' \setminus N(Q')\}.$$

Conditional independence allows us to define the correspondence probability as the product of the probabilities of the pairs:

$$P(C \mid E) = \prod_{(n, n') \in \mathrm{Pairs}(C)} P(\langle n, n' \rangle \mid S(n, n')).$$

One-to-none match probability. We assume simply that a node maps to nothing with a fixed probability, $P(\langle n, \emptyset \rangle) = P(\langle \emptyset, n' \rangle) = p_0$. We choose the numerical value of $p_0$ based on the empirical frequency of one-to-none pairs observed in training data. It might improve accuracy to model the probability that $n$ maps to nothing based on the features of $n$; however, we have not implemented such models in this paper.

One-to-one match probability model. We adopt a Bayesian model of the probability that one node matches another, conditional on the evidence:

$$P(\langle n, n' \rangle \mid S(n, n')) = \frac{P(\langle n, n' \rangle)\, P(S(n, n') \mid \langle n, n' \rangle)}{P(S(n, n'))}.$$

Because $\langle n, n' \rangle$ and $\neg\langle n, n' \rangle$ are mutually exclusive events that exhaust the space of all possible outcomes with respect to $(n, n')$, the denominator can be rewritten using a standard normalization technique to get:

$$P(\langle n, n' \rangle \mid S(n, n')) = \frac{P(\langle n, n' \rangle)\, P(S(n, n') \mid \langle n, n' \rangle)}{P(\langle n, n' \rangle)\, P(S(n, n') \mid \langle n, n' \rangle) + P(\neg\langle n, n' \rangle)\, P(S(n, n') \mid \neg\langle n, n' \rangle)}.$$

The factor $P(\langle n, n' \rangle)$ is referred to as the prior. We construct the prior by decomposing the match event into simpler events and then applying commonly used principles of prior selection. First, we notice that the event $\langle n, n' \rangle$ decomposes into two events: $A$, the event that $n$ matches some node (i.e., $n$ is not dropped), and $B$, the event that $n$ matches specifically $n'$. Thus, $P(\langle n, n' \rangle) = P(A)\, P(B \mid A)$. For $P(A)$, we use a simple empirical prior: $P(A) = 1 - p_0$, where $p_0$ is the probability that a node is dropped, as observed in training. For $P(B \mid A)$, we use an indifference prior: knowing only that $n$ matches some node in $N'$, we assume that all nodes are equally likely, so $P(B \mid A) = 1/|N'|$. This gives us the complete prior: $P(\langle n, n' \rangle) = (1 - p_0)/|N'|$.

# e) Split-Merge Correspondence Probability Model

The split-merge correspondence probability model is like the singular correspondence probability model, except that we deal with pairs of sets of nodes rather than pairs of individual nodes. We decompose a split-merge correspondence $C = (Q, Q', f)$ into set pairs as follows:

$$\mathrm{Pairs}(Q, Q', f) = f \,\cup\, \{(\{n\}, \emptyset) \mid n \in N \setminus N(Q)\} \,\cup\, \{(\emptyset, \{n'\}) \mid n' \in N' \setminus N(Q')\}.$$

One-to-many match probability model. For the one-to-many case, we can use a Bayesian model similar to that for the one-to-one case. There are several issues in computing the factor $P(S_e(n, \{n'_1, n'_2\}) \mid \langle n, \{n'_1, n'_2\} \rangle)$, which is the probability according to one kind of evidence ($e$) that the node $n$ matches the set consisting of nodes $n'_1$ and $n'_2$, i.e., that "$n$ splits into $n'_1$ and $n'_2$," or, conversely, that "$n'_1$ and $n'_2$ merge into $n$." One way that we address these issues is to define a new kind of evidence based on the evidence about the merge node matching each of the split nodes individually. That is, we consider evidence about pairs of nodes, each pair consisting of the merge node and one of the split nodes. We define

$$S^{split}_e(n, \{n'_1, \ldots, n'_k\}) = S_e(n, n'_j), \quad \text{where } j = \arg\min_{i = 1 \ldots k} S_e(n, n'_i).$$
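In other words, the evidence for a split or merge is taken from the least similar individual pair. A tiny sketch of that definition, using the document's "Firewall and Access Control" example and the label similarity from earlier, might look as follows; the function names are illustrative, not the prototype's.

```python
import difflib

def label_sim(a: str, b: str) -> float:
    # Pairwise string similarity in [0, 1], as used by the label evidencer.
    return difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()

def split_similarity(sim, merge_label: str, split_labels) -> float:
    # Evidence that "n splits into n'_1 ... n'_k": the similarity of the least
    # similar individual (merge node, split node) pair, per the definition above.
    return min(sim(merge_label, s) for s in split_labels)

print(split_similarity(label_sim, "Firewall and Access Control", ["Firewall", "Access Control"]))
```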
We define the prior for the one-to-many case as follows. We notice that the event $\langle n, \{n'_1, \ldots, n'_k\} \rangle$ decomposes into two events: $A$, the event that $n$ matches a set of $k$ nodes, and $B$, the event that $n$ matches specifically $\{n'_1, \ldots, n'_k\}$. Thus, $P(\langle n, \{n'_1, \ldots, n'_k\} \rangle) = P(A)\, P(B \mid A)$. For $P(A)$, we use a fixed empirical prior, $p_k$, the observed probability that a node will match exactly $k$ nodes. For $P(B \mid A)$, we use an indifference prior: knowing only that $n$ matches a set of $k$ nodes in $N'$, we assume that every such set of $k$ nodes is equally likely. This yields:

$$P(\langle n, \{n'_1, \ldots, n'_k\} \rangle) = \frac{p_k}{\binom{|N'|}{k}}.$$

# f) The Maximization Problem

The previous sections showed how to compute $P(C \mid E)$ for a given correspondence $C$ and evidence $E$. To complete the algorithm, we must describe how to find the $C$ with maximal $P(C \mid E)$. Computing the score of a correspondence using only simple evidencers can be done in polynomial time (ideally constant time per node pair, quadratic overall). To find the maximum-probability correspondence in this case, we construct a graph whose nodes are the union of the nodes of the two diagrams, $N \cup N'$. We place an edge from every node $n$ in $N$ to every node $n'$ in $N'$ with edge weight $w(n, n') = P(\langle n, n' \rangle \mid S(n, n'))$. We can then find the maximum-probability correspondence in polynomial time using maximum-weight bipartite matching [4].

# i. Greedy Search

The simplest search algorithm is greedy search. In greedy search, we keep track of only one piece of information: the current state. On each step, we examine all states reachable by a single transition from the current state and move to the state with the greatest probability. There is no backtracking; in this paper, we consider only transitions that add a node pair to the correspondence, not those that remove a pair. If there is no next state with greater probability than the current state, the search stops.

Fig. 3: Greedy Search

Fig. 3 gives a high-level description of the greedy search algorithm for our problem. We assume that, before this algorithm is called, for any nodes $n$ and $n'$ in the two diagrams, we have already computed $P(\langle n, n' \rangle \mid S(n, n'))$, the probability that they match, based upon the various simple evidencers.

# ii. Complexity Analysis

Let us assume that the total number of nodes in the diagrams is $O(n)$. Then the naive implementation of the greedy search algorithm has complexity $O(n^4)$. The outer while loop is executed at most $O(n)$ times, since each iteration removes at least one node of the diagrams from future consideration. The for loop is executed $O(n^2)$ times, as there are at most $n^2$ pairs to consider. In the naive implementation, computing Score(newCorr) at line "*" costs $O(n)$ time because of the connection evidencer, which requires $P(\langle m, m' \rangle \mid S(m, m'))$ to be recomputed for each pair $\langle m, m' \rangle$ in the correspondence (the connection evidencer may return a different value for the probability of $\langle m, m' \rangle$ as the correspondence grows). Hence, the total complexity of the algorithm is $O(n^4)$.

# Incremental Algorithm

It is possible to implement the Score() function so that it takes time proportional to the number of neighbors of the added nodes: probability is only recomputed for pairs that might possibly be affected by a newly added pair. Assuming bounded-degree graphs, this incremental version has complexity $O(n^3)$.
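To make the search procedure concrete, here is a minimal sketch of the greedy search described above (in the spirit of Fig. 3). The score function, which evaluates the correspondence probability using all of the evidencers, is assumed to be supplied; only singular pair additions are considered, and the incremental score maintenance discussed above is omitted for clarity.

```python
def greedy_search(nodes_a, nodes_b, score):
    """Greedy search over correspondences.

    `score(corr)` is assumed to return P(corr | evidence) for a set of node pairs.
    Starting from the empty correspondence, each step adds the single node pair
    that most increases the score; there is no backtracking, and pairs are never removed.
    """
    corr = set()
    current = score(corr)
    while True:
        best_pair, best_score = None, current
        for a in nodes_a:
            for b in nodes_b:
                if (a, b) in corr:
                    continue
                candidate = score(corr | {(a, b)})   # "Score(newCorr)" in the text
                if candidate > best_score:
                    best_pair, best_score = (a, b), candidate
        if best_pair is None:                        # no transition improves the probability
            return corr
        corr.add(best_pair)
        current = best_score
```

In the naive implementation, each call to score costs O(n) because of the connection evidencer, which is what leads to the O(n^4) bound discussed above.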
# VI.

# Evidencers

This section describes the set of evidencers that was designed and implemented as part of the prototype implementation of the AMMO algorithm. The prototype evidencers compute the following similarity values.

# a) Simple Evidencers

Label Evidencer. The Label evidencer measures the similarity between the text labels of a node pair, using the Python standard library function difflib.SequenceMatcher.ratio().

Region Evidencer. A region may have a name, a set of neighboring regions, and a set of nodes that are located within it.

Type Evidencer. Some diagrams have nodes typed as being hardware components, infrastructure software components, or application software components (or EJBs or ManagedComponents), while other diagrams have nodes typed as being actors, information flows, use cases, or systems.

Position Evidencer. The Position evidencer returns similarity values based on node positions; we expect the Euclidean distance between matching nodes to be small.

# b) Complex Evidencers

We define a complex evidencer to be an evidencer which requires information from more than just the node pair for which it is computing a similarity value. In addition to that node pair, it also takes as input a partial correspondence between the two diagrams.

Connection Evidencer. The Connection evidencer is based on the connections, or edges, that each node has to its immediate neighbors. Fig. 3 illustrates the connection similarity computation for the pair (B2, Q4) in our sample diagram pair. In this figure, the solid curved line indicates that, at this point in the search, the match (B4, Q6) is already part of the correspondence. The dotted curved line indicates that we are considering the node pair (B2, Q4). By virtue of the facts that B2 has two neighbors (B1 and B4), Q4 has two neighbors (Q3 and Q6), and one of B2's two neighbors (B4) matches one of Q4's two neighbors (Q6), as indicated by the dashed line, the connection similarity for (B2, Q4) is high. Ultimately, connections turn out to be strong evidence that B2 and Q4 match.
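A plausible reading of the connection evidencer is sketched below: the similarity of a candidate pair is determined by how many of its neighbors are already matched to each other in the partial correspondence. The exact normalization is not given in the text, so the averaging of the two matched-neighbor fractions used here is an assumption.

```python
def connection_similarity(neighbors_n, neighbors_np, partial_corr):
    """Connection-evidencer similarity for the candidate pair (n, n').

    neighbors_n / neighbors_np are the neighbor sets of n and n'; partial_corr is the
    set of pairs already in the correspondence. Counts neighbor pairs that are already
    matched and averages the two matched-neighbor fractions (assumed normalization).
    """
    if not neighbors_n or not neighbors_np:
        return 0.0
    matched = sum(1 for a in neighbors_n for b in neighbors_np if (a, b) in partial_corr)
    return 0.5 * (matched / len(neighbors_n) + matched / len(neighbors_np))

# Example from the text: B2's neighbors are {B1, B4}, Q4's neighbors are {Q3, Q6},
# and (B4, Q6) is already in the correspondence, so under this formulation the
# connection similarity is 0.5 * (1/2 + 1/2) = 0.5.
print(connection_similarity({"B1", "B4"}, {"Q3", "Q6"}, {("B4", "Q6")}))
```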
# c) Split Evidencers

The split-merge model defined the probability of a split-merge correspondence conditional on the observed evidence. Recall that a split-merge correspondence is one containing split-merge matches, i.e., matches between one node and a set of nodes. Further, recall that, to evaluate the probability of such correspondences, two types of evidencers are used: simple (pair) evidencers and split evidencers. The simple evidencers implemented as part of our prototype were described above; this section describes the split evidencers of our prototype. The motivation for creating special-purpose split evidencers arose from the observation that split-merge correspondences exhibit different characteristics than singular correspondences, and that these characteristics are not taken into account by the simple evidencers.

Label Sim Evidencer. The similarity determined by the Label Sim evidencer is the minimum similarity among the labels of the nodes.

Label Intersect Evidencer. The similarity determined by the Label Intersect evidencer is the similarity between the label of $n$ and the longest suffix or prefix common to the labels of the split nodes $n'_1, \ldots, n'_k$.

Label Concat Evidencer. The Label Concat evidencer similarity function uses the Label evidencer similarity function to obtain the similarity between the label of $n$ and the concatenation of the labels of the split nodes.

Inner Connect Evidencer. This is a discrete measure of similarity based on whether or not all of the split nodes are connected to each other.

Outer Connect Evidencer. This is a continuous measure of connection similarity between $n$ and the cluster of split nodes taken as a whole.

# VII.

# AMMO-LITE

Although the greedy search algorithm described above performed well for diagrams with dozens of nodes, it was not practical for diagrams with hundreds of nodes. The major scalability problem with AMMO is that every time it has to decide which node pair to add next, it must compute an exact probability for each possible correspondence that would result from adding one more node pair. Our incremental version of greedy search avoids some of this recomputation, but not enough to be practical for larger-scale diagrams. To solve this problem, we designed a new algorithm, AMMO-LITE, which approximates AMMO's behavior but uses a simpler search that is driven by pair probabilities rather than correspondence probabilities. This approach avoids repeated calculation of correspondence probabilities and, in practice, achieves much better performance with only a small loss of precision.

AMMO-LITE uses the probabilities of the pairs to determine the order in which pairs should be added to the correspondence. This is done as follows. As in the case of AMMO, the first thing the algorithm does is precompute the probabilities of all possible node pairs, using the simple evidencers. It then creates a sorted list, PotentialPairs, which contains the node pairs sorted in descending order by probability. The main loop of AMMO-LITE goes through PotentialPairs, adding the highest-probability pair (the one at the head of the list) to the correspondence, provided that it is permissible to add that pair.
It is not permissible to add a pair to the correspondence if that would result in a many-to-many match. Each time a new pair $\langle m, m' \rangle$ is added to the correspondence, the algorithm goes through the list again in order to determine whether the precomputed probability of any remaining pair $\langle p, p' \rangle$ has been affected. The probability of a pair $\langle p, p' \rangle$ in PotentialPairs is affected in two different circumstances:

- If $p$ or $p'$ is one of the nodes in the pair we just added, then adding $\langle p, p' \rangle$ would result in a split/merge. Thus, we change the precomputed probability stored for $\langle p, p' \rangle$ to be the probability of the split/merge that would result from adding $\langle p, p' \rangle$ to the correspondence.
- If $p$ and $p'$ are neighbors of $m$ and $m'$, respectively, then adding $\langle m, m' \rangle$ to the correspondence affects the similarity of $\langle p, p' \rangle$. Thus, $P(\langle p, p' \rangle)$ must be recomputed, this time using the connection evidencer as well as the simple evidencers.

After going through PotentialPairs, if any probabilities have been recomputed, the list is resorted. The algorithm then continues with another iteration of the main loop to add another pair to the correspondence. The algorithm terminates when either the list is empty or the probability of the pair at the head of the list is less than some threshold value. This value is determined by experimentation with training data and can easily be changed. In our implementation, this threshold is $p_0$, the empirically determined probability that a node does not correspond to any node in the other diagram.
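The loop just described might be sketched as follows. The split_merge_prob and connection_prob callbacks, the dictionary-based bookkeeping, and the exact handling of impermissible pairs are assumptions made to keep the sketch short; they stand in for the prototype's evidencer machinery.

```python
def ammo_lite(pair_prob, neighbors_a, neighbors_b, split_merge_prob, connection_prob, p_drop=0.05):
    """Sketch of the AMMO-LITE main loop.

    pair_prob[(n, n')]        : precomputed pair probabilities from the simple evidencers
    neighbors_a / neighbors_b : adjacency maps (node -> set of neighbors) for the two diagrams
    split_merge_prob(p, corr) : assumed callback giving the probability of the split/merge
                                that would result from adding pair p to corr
    connection_prob(p, corr)  : assumed callback re-scoring pair p with the connection
                                evidencer as well as the simple evidencers
    p_drop                    : the termination threshold p0
    """
    potential = sorted(pair_prob, key=pair_prob.get, reverse=True)   # PotentialPairs
    prob = dict(pair_prob)
    corr, used_a, used_b = set(), set(), set()

    while potential and prob[potential[0]] >= p_drop:
        m, m2 = potential.pop(0)
        if m in used_a and m2 in used_b:          # adding this pair would be many-to-many
            continue
        corr.add((m, m2)); used_a.add(m); used_b.add(m2)
        changed = False
        for p in potential:                       # pass over the remaining pairs
            q, q2 = p
            if q == m or q2 == m2:                # adding p would now be a split/merge
                prob[p] = split_merge_prob(p, corr)
                changed = True
            elif q in neighbors_a.get(m, ()) and q2 in neighbors_b.get(m2, ()):
                prob[p] = connection_prob(p, corr)  # neighbors: re-score with connections
                changed = True
        if changed:
            potential.sort(key=prob.get, reverse=True)   # resort(PotentialPairs)
    return corr
```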
# b) Complexity Analysis

Let the total number of nodes in a diagram be $O(n)$, as in the analysis of AMMO. Depending on the value of the threshold, the outer while loop could be executed $O(n^2)$ times, once for every possible node pair. However, the outer if statement (immediately within the while loop) will be true only $O(n)$ times, since each pair added must add at least one new node to the correspondence, due to the many-to-many restriction, and hence we add at most $O(n)$ pairs. Thus, the nested for loop will be reached on only $O(n)$ iterations of the while loop. Each time the for loop is reached, it executes $O(|\mathit{PotentialPairs}|) = O(n^2)$ iterations. The resulting total complexity is $O(n^3)$. Similarly, like the nested for loop, the statement resort(PotentialPairs) will be reached at most $O(n)$ times. Sorting an $m$-element list being $O(m \log m)$, each sort of the $O(n^2)$ items in PotentialPairs has complexity $O(n^2 \log n)$. Thus, the resulting total complexity of the algorithm due to all sorting is $O(n^3 \log n)$. That dominates the $O(n^3)$ of the nested for loop, and therefore the overall worst-case total complexity of the AMMO-LITE algorithm is $O(n^3 \log n)$.

To see why AMMO-LITE performs better than AMMO in practice, consider the following. In AMMO-LITE, each time we add a pair $\langle m, m' \rangle$, we make a pass through the list PotentialPairs. Although this list can contain $O(n^2)$ pairs, it is a "quick" pass over the list: most of the pairs are just skipped. "Real" computation only takes place if $\langle p, p' \rangle$ meets certain criteria, in which case we recompute its associated probability. So, in practice, our performance is better than $O(n^3 \log n)$ would suggest. In fact, employing a priority queue along with an incremental approach to updating pair probabilities, and assuming a bounded-degree graph, we could achieve an overall total complexity of $O(n^2 \log n)$, as follows. We can implement PotentialPairs as a priority queue in which pairs are ordered according to their probability, thereby obviating the need for separate explicit sorts. Initially, we construct PotentialPairs by inserting all of the $O(n^2)$ pairs into it. With a priority queue implementation for which insert, get_max, and delete are $O(\log n)$, the complexity of constructing PotentialPairs is $O(n^2 \log n)$. Probabilities then need to be recomputed only for affected pairs; in that way, we avoid having to re-examine all of the $O(n^2)$ remaining pairs in PotentialPairs. Assuming bounded-degree graphs, with the number of neighbors of a pair $\langle m, m' \rangle$ bounded by a constant $c$, the number of pairs whose probability must be recomputed due to connectivity is $c$. Whenever we recompute the probability of a pair, we delete it from PotentialPairs and reinsert it with its new probability (or we could simply perform a change_priority operation). With delete and insert being $O(\log n)$, the total complexity due to recomputing the probabilities of the neighbors of all of the $O(n)$ added pairs is $O(n \cdot c \cdot \log n) = O(n \log n)$. Similarly, when a pair $\langle m, m' \rangle$ is added to the correspondence, the number of pairs $\langle p, p' \rangle$ whose probability must be recomputed because they would now result in splits or merges is at most $O(n)$, because there are at most $O(n)$ pairs for which $p = m$ or $p' = m'$. Hence, the total complexity due to recomputing probabilities because of split/merge considerations is $O(n \cdot n \cdot \log n) = O(n^2 \log n)$. Thus, the overall total complexity of the algorithm would be $O(n^2 \log n)$.
# c) AMMO-LITE Experiments

Table 8 summarizes the AMMO-LITE and AMMO measurements for each of our model pairs.

Table 8

| Pair | Nodes | Edges | AMMO-LITE | AMMO | Ratio |
|------|-------|-------|-----------|------|-------|
| Pair 1 | 9 | 13 | 0.83 | 2.74 | 3.3 |
| Pair 2 | 12 | 12 | 0.96 | 3.68 | 3.8 |
| Pair 3 | 15 | 24 | 2.63 | 12.19 | 4.6 |
| Pair 4 | 22 | 41 | 5.11 | 41.49 | 8.1 |
| Pair 5 | 35 | 31 | 11.30 | 98.66 | 8.7 |
| Pair 6 | 41 | 68 | 15.51 | 257.27 | 16.6 |
| Pair 7 | 637 | 968 | 6175.00 | -- | -- |

# VIII.

# Results

The values in the table were obtained by averaging the Recall, Precision, and Time metrics for each algorithm across all of our model pairs. The AMMO-LITE algorithm did not do quite as well as AMMO in terms of both Recall and Precision, but it still did significantly better than the non-Bayesian approach. Furthermore, if one examines the cases where AMMO-LITE did poorly in comparison to AMMO, most of these cases involved complex correspondences with a number of challenging matches and multiple split/merges.

# Conclusion

We have identified and described the model correspondence problem, an important problem in software engineering. We have designed a Bayesian framework that supports the reasoning needed to solve the model correspondence problem, and we have implemented and tested a matching algorithm based on our framework, finding that it achieves high accuracy on a set of test diagram pairs. We believe that this work holds great promise for the future.

![Fig. 1: The 4+1 View Model](image-2.png "Fig. 1")

![Fig. 2: Simple Example Diagram Pair](image-3.png "Fig. 2")

Table 1: Label similarities

|    | Q1 | Q2 | Q3 | Q4 | Q5 | Q6 |
|----|----|----|----|----|----|----|
| B1 | 0.118 | 0.125 | 0.083 | 0.211 | 0.316 | 0.095 |
| B2 | 0.385 | 0.240 | 0.061 | 0.143 | 0.143 | 0.200 |
| B3 | 0.190 | 0.200 | 0.286 | 0.348 | 0.174 | 0.240 |
| B4 | 0.316 | 0.111 | 0.231 | 0.095 | 0.095 | 0.435 |

Table 2: Match probabilities from label similarity

|    | Q1 | Q2 | Q3 | Q4 | Q5 | Q6 |
|----|----|----|----|----|----|----|
| B1 | 0.100 | 0.100 | 0.104 | 0.108 | 0.155 | 0.102 |
| B2 | 0.225 | 0.116 | 0.108 | 0.108 | 0.100 | 0.105 |
| B3 | 0.104 | 0.105 | 0.135 | 0.182 | 0.102 | 0.116 |
| B4 | 0.155 | 0.101 | 0.113 | 0.102 | 0.102 | 0.308 |

Table 3: Match probabilities after combining label and position evidence

|    | Q1 | Q2 | Q3 | Q4 | Q5 | Q6 |
|----|----|----|----|----|----|----|
| B1 | 0.728 | 0.844 | 0.255 | 0.003 | 0.009 | 0.000 |
| B2 | 0.010 | 0.001 | 0.313 | 0.913 | 0.022 | 0.407 |
| B3 | 0.002 | 0.015 | 0.275 | 0.022 | 0.917 | 0.238 |
| B4 | 0.000 | 0.000 | 0.087 | 0.659 | 0.033 | 0.741 |

Table 4: Match probabilities using the simple and complex evidencers

|    | Q1 | Q2 | Q3 | Q4 | Q5 | Q6 |
|----|----|----|----|----|----|----|
| B1 | 0.230 | 0.375 | 0.038 | 0.000 | 0.002 | 0.000 |
| B2 | 0.003 | 0.000 | 0.053 | 0.561 | 0.003 | 0.075 |
| B3 | 0.000 | 0.002 | 0.056 | 0.005 | 0.555 | 0.039 |
| B4 | 0.000 | 0.000 | 0.012 | 0.181 | 0.000 | 0.560 |

Table 5: Pair-wise match probabilities based on all evidence

|    | P1 | P2 | P3 | P4 | P5 | P6 |
|----|----|----|----|----|----|----|
| C1 | 104.4 | 189.6 | 5.371 | 0.000 | 0.001 | 0.000 |
| C2 | 0.415 | 0.019 | 30.82 | 787.0 | 2.787 | 0.037 |
| C3 | 0.082 | 0.642 | 79.93 | 5.572 | 783.0 | 0.019 |
| C4 | 0.000 | 0.000 | 1.021 | 0.100 | 0.002 | 786.3 |

# References

1. J. Garland and R. Anthony, Large-Scale Software Architecture, John Wiley and Sons, 2003.
2. R. Youngs, D. Redmond-Pyle, P. Spaas, and E. Kahan, "A Standard for Architecture Description," IBM Systems Journal, vol. 38, no. 1, 1999.
3. Z. Xing and E. Stroulia, "UMLDiff: An Algorithm for Object-Oriented Design Differencing," Proc. 20th IEEE/ACM Int'l Conf. Automated Software Eng., 2005.
4. R. E. Tarjan, Data Structures and Network Algorithms, SIAM, Nov. 1983.
5. B. Ramesh and M. Jarke, "Towards a Reference Model for Requirements Traceability," IEEE Trans. Software Eng., vol. 27, no. 1, Jan. 2001.
6. A. Jossic, M. D. Del Fabro, J.-P. Lerat, J. Bézivin, and F. Jouault, "Model Integration with Model Weaving: A Case Study in System Architecture," Proc. Int'l Conf. Systems Eng. and Modeling, 2007.
7. P. Kruchten, "Architectural Blueprints: The 4+1 View Model of Software Architecture," IEEE Software, vol. 12, no. 6, Nov. 1995.
8. D. Mandelin, D. Kimelman, and D. M. Yellin, "A Bayesian Approach to Diagram Matching with Application to Architectural Models," Proc. 28th Int'l Conf. Software Eng., May 2006.
9. N. Rozanski and E. Woods, Software Systems Architecture: Working with Stakeholders Using Viewpoints and Perspectives, Addison-Wesley, 2005.
10. G. Antoniol, G. Canfora, G. Casazza, and A. De Lucia, "Maintaining Traceability Links during Object-Oriented Software Evolution," Software: Practice and Experience, vol. 31, 2001.
11. Handbook on Architectures of Information Systems, Springer, 2006.
12. IBM Insurance Application Architecture: Executive Summary, 2009.
13. TM Forum, Information Framework (NGOSS Shared Information/Data Model), 2009.
14. WebSVN, Diplomarbeit, ?repname=diplomarbeit&path=%2Ftrunk%2Fdesign%2FMiddlewareUML.xmi&rev=87&sc=1&isdir=0, 2009.
15. S. Abrams, B. Bloom, P. Keyser, D. Kimelman, E. Nelson, W. Neuberger, T. Roth, I. Simmonds, S. Tang, and J. Vlissides, "Architectural Thinking and Modeling with AWB: The Architects Workbench," IBM Systems Journal, vol. 45, no. 3, 2006.