# Introduction

Nowadays, with the huge volume of user data being generated, conventional database management systems cannot effectively sustain data management and analysis in many fields, including meteorology, scientific instruments, social networks, and medical networks. These and other fields need a paradigm shift to address their problems. Capturing, storing, and retrieving information in a timely manner are vital issues in these systems. Reliable and available solutions are needed because the prevalent single-node and parallel approaches are far from offering a timely solution. On the other hand, reliable and available solutions have problems of their own, in particular network bottlenecks, low performance of hardware nodes, and dependence on information held by other nodes.

Social Network is one of the fields that need reliable and data available solutions, because current solutions cannot properly solve this area's problems. One of the most important problems in this area is identifying user's likeness, or ULi, defined as the rate of likeness between two or more users in terms of their likes, interests, personal information, etc. The goal in ULi is to identify those users who have the greatest amount of information in common, in order to use their preferences or recommendations for new users. There are two main issues in ULi: the huge amount of information per user, and the fact that most of this data is unstructured, lacking a predefined record structure that is common among all users. A large number of fields per user may add complexity to ULi problems as well. Given these characteristics, we have to use so-called "big data" solutions. One of the methods that can provide reliable and data available solutions for big data is MapReduce, which has been used to solve Social Network problems.
However, MapReduce and other data available solutions have problems such as data locality, network bottlenecks, and hardware inefficiency. In this paper, we propose RiDaULi, a reliable and data available method for investigating user's likeness. RiDaULi uses a MapReduce-based method to solve ULi problems. Unlike other approaches, it does not use structured or semi-structured methods to store users' information. RiDaULi can use different data sources with different data items; even the same data source can have different data items for two users. Rather than a fixed structure, RiDaULi uses a dynamic method to store users' information, which can be easily distributed over hardware nodes. In the proposed method, hardware nodes can execute their tasks simultaneously, and none of the nodes needs information from other nodes; such dependence is the main problem of MapReduce-based methods.

The structure of this paper is as follows. Section 2 investigates some preliminaries concerning MapReduce and Social Network problems. In Section 3, ULi-related literature is discussed. Section 4 focuses on the proposed method. Section 5 presents the evaluation of the proposed method. Section 6 provides the conclusion.

# Ground Work

In this part, both MapReduce and the relationship between Social Network and big data are explained.

# a) MapReduce

In this section, the literature related to MapReduce design is discussed. A decomposable algorithm, partitionable data, and sufficiently small data partitions are the main characteristics required for effective use of MapReduce. In [23], classic MapReduce was optimized to decrease the data transfer load. In the method described in [23], a shared area for information was considered. This type of design is suitable for solving problems such as k-NN and top-k queries. MPI (Message Passing Interface) was used for message passing in a MapReduce structure. The goal of that paper was to decrease the amount of data transferred in the MapReduce network.
A method was developed for tackling workloads in hierarchical MapReduce architectures. Another approach involving Hadoop uses a deduplication-based snapshot differential algorithm (D-SD) and update propagation. HaLoop is another type of MapReduce structure suitable for iterative problems. iMapReduce also supports iterative processes. In [20], HDFS (Hadoop Distributed File System) was replaced with a concurrency-optimized data storage layer based on the BlobSeer data management service. In [22], a model was presented to estimate the I/O behavior of MapReduce applications. In [21], optimization over the MapReduce structure was divided into five groups. Fig. 1 shows these groups.

# b) Social Network and big data

In this section, Social Network and its relation to big data are investigated. These days, users' information is generated at an exponential rate. This information has different formats and standards. According to [19], there are various standard data sources. As shown in Fig. 2, a huge Volume of information is generated in a Variety of formats with high Velocity; therefore, the three Vs of big data are present in Social Network. With ULi there is an additional challenge, namely Veracity, meaning that for many users we typically have doubtful or uncertain information. Social Network problems exhibit all of the Vs, and therefore it is inevitable that big data solutions will be used to solve them. According to [19], however, existing big data technologies do not effectively deal with the full spectrum of Social Network problems, so it is necessary to customize them for our purposes. Because of the high volume of information in Social Network, big data is necessary for data analysis. Costs are also reduced by using big data analytics in Social Network. A user-centered framework has also been proposed that can personalize Social Network with a big-data-driven approach. In [35], big data is used to solve problems such as the selection of appropriate recommendation paths or the improvement of Social Network systems.
In [37], AITION, a reliable knowledge data discovery platform for big data Social Network, was proposed.

# Literature on ULi

In this section, literature specifically concerned with ULi is investigated. According to [1], solutions for finding ULi can be divided into two categories. Fig. 3 shows this categorization.

The first category consists of solutions that identify ULi relationships with machine learning algorithms [3][4][5]. These solutions are offline, and they require a long time for the machine learning to take place. There are also data mining methods that work on streaming data and can be considered online data mining methods. These methods can only work on part of the data. In other words, they apply techniques such as sliding windows, sampling, and synopses over stream data; therefore, they are not appropriate for the ULi problem, because all data items need to be analyzed [40].

The second category uses information retrieval techniques. Some techniques use simple search [6,7]; however, searching over limited keywords within a predefined structure may have severe limitations. Another information retrieval solution involves using Entity Relationship Graphs (ERG) to investigate similarities between defined entities [8,9]. These types of solutions are expensive, and some are not online [8,9]. Some methods try to improve the ERG solution by unified search [10,11]. In [2], MapReduce is used to solve the problem; the authors tried to reduce algorithm execution time by distributing computation over hardware nodes. PARAMO [36] is a method that uses MapReduce to develop a predictive modeling platform in the Social Network analytics domain. Some methods use Locality-Sensitive Hashing (LSH) [39] for finding similarities [31]. In [31], LSH and MapReduce are used to extract user's likeness. LSH is not suitable for the ULi problem because it works with a predefined data structure, and with ever-changing data sources its accuracy is reduced dramatically.

Fig. 3: Finding user likes

According to our investigation, none of the above-mentioned methods is fully effective for solving ULi problems, because of the following considerations:

- ULi requires a dynamic structure to store users' information. Different users have different data items, and thus require a structure that can store data with different standards and different data formats with no default assumptions.
- In the ULi data retrieval phase, the proposed method has to accept all types of input data items and be able to dynamically create queries over all users' data fields.
- ULi execution time is very important; the method has to run in an acceptable time and with high precision. Offline or long-running query execution is not satisfactory.
- Given the huge volume of data generation, distributed solutions are necessary.

In this paper we introduce RiDaULi, a reliable and data available method that uses a dynamic data structure to store users' data items from data sources with different formats. It can also retrieve data items by dynamic query generation. Owing to the reliable and data available architecture of RiDaULi, an acceptable query execution time is achieved. To the best of our knowledge, RiDaULi is unique in being able to offer a solution to the ULi problem.

# Proposed Method

RiDaULi is a reliable and data available method based on MapReduce. In this method, users' input data is converted to a unified format, as explained below. This conversion has two main advantages. First, variation in the input data does not affect the RiDaULi format; therefore, any data format can be accepted without changing our format. Second, the format is suitable for the MapReduce architecture and helps to distribute data over nodes. Moreover, each node can do its tasks without needing information from other nodes.
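
The conversion to a unified format described above can be illustrated with a minimal Python sketch. It assumes the unified format is a set of (ColumnID, RowID, Value) triples, in the spirit of the column and data tables that the paper calls RiDaULiColumn and RiDaULiFact. The class `ColumnRegistry`, the function `to_facts`, and the choice to create a ColumnID on first sight are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Optional, Tuple

ColumnKey = Tuple[int, str]   # (DataSourceID, ColumnName)
Fact = Tuple[int, int, str]   # (ColumnID, RowID, Value)


class ColumnRegistry:
    """Hypothetical in-memory stand-in for a column metadata table."""

    def __init__(self) -> None:
        self._ids: Dict[ColumnKey, int] = {}

    def get_column_id(self, data_source_id: int, column_name: str) -> int:
        # Assumption: an unknown column gets the next free ColumnID.
        key = (data_source_id, column_name.strip().lower())
        if key not in self._ids:
            self._ids[key] = len(self._ids) + 1
        return self._ids[key]


def to_facts(
    registry: ColumnRegistry,
    data_source_id: int,
    row_id: int,
    record: Dict[str, Optional[object]],
) -> List[Fact]:
    """Turn one user record into triples, skipping fields that are empty."""
    facts: List[Fact] = []
    for name, value in record.items():
        if value is None or value == "":
            continue  # only existing fields are stored
        column_id = registry.get_column_id(data_source_id, name)
        facts.append((column_id, row_id, str(value)))
    return facts


registry = ColumnRegistry()
# Two users from the same source with different sets of fields.
facts_a = to_facts(registry, 1, 1211, {"name": "sai", "age": 20, "gender": "Male"})
facts_b = to_facts(registry, 1, 1212, {"name": "ram", "habits": "Watching Movies", "age": None})
```

In this sketch `facts_a` holds three triples for RowID 1211, while `facts_b` holds two triples for RowID 1212 because its empty `age` field is skipped. Since every triple is self-describing, a list of such triples can be split across nodes without any per-user schema.
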

Global Journal of Computer Science and Technology, Volume XV, Issue VII, Version I, Year 2015 (C)

Because of these two advantages of the unified format, ULi problems can be solved easily over distributed nodes. Users' records in various formats can be stored, and efficiency can be achieved by autonomous calculations.

# a) Data allocation

Because of the unified data format of RiDaULi, data can be distributed over different nodes. The processing power and memory of each hardware node can be important factors in allocating data items to that node.

# b) Query execution

To execute queries over the MapReduce architecture, the queries first have to be converted to an appropriate format for RiDaULi. Then each converted query is sent to the nodes separately for execution, and the RowIDs of the results are returned. Finally, the extracted RowIDs are sent to the Phase 2 Mappers, and users' information is retrieved. As shown in Fig. 4, each Phase 1 Mapper sends its results as triples. In the Phase 1 Reducer, the Score is aggregated by RowID, and the final Score per RowID is calculated. In the Phase 2 Mapper, other fields with corresponding RowIDs are extracted. The resulting formats of the Phase 2 Mappers are as [...]. In the Phase 2 Reducer, the results of the Phase 2 Mappers are aggregated. In addition, the Phase 1 Reducer results are sent directly to the likeness Ranker, which sorts RowIDs according to their scores; then, when a RowID is selected by the user, other related information is extracted.

An Efficient Mapreduce-based System to Find Userlikeness on Social Networks

To identify equal fields in different data sources, it is also necessary to have the RiDaULiEqual table. Table 5 shows RiDaULiEqual. Then all rows that are equal to the extracted ColumnIDs are retrieved from the RiDaULiFact table. The Emit function executes queries and puts the results into the specified table on the specified server. If the specified table does not exist, it creates a table with the specified name. For the Score calculation, many algorithms can be used.
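
The Phase 1 aggregation and the ranking step described in this section can be sketched in Python as follows. This is a minimal single-process illustration, not the authors' implementation: the triple layout (RowID, ColumnID, Score), the function names `phase1_reduce` and `rank`, and the sample values are all assumptions, and the per-item Score values are taken as given.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Assumed triple layout: (RowID, ColumnID, Score).
Triple = Tuple[int, int, int]


def phase1_reduce(triples: Iterable[Triple]) -> Dict[int, int]:
    """Sum the Score of all triples that share a RowID."""
    totals: Dict[int, int] = defaultdict(int)
    for row_id, _column_id, score in triples:
        totals[row_id] += score
    return dict(totals)


def rank(totals: Dict[int, int]) -> List[Tuple[int, int]]:
    """Sort RowIDs by descending total Score; ties are broken by RowID."""
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


# Output of two hypothetical Phase 1 Mappers, each working on its own node.
mapper_a = [(1211, 2, 2), (1212, 2, 0), (1213, 2, 1)]
mapper_b = [(1211, 3, 1), (1212, 3, 2), (1213, 3, 2)]

totals = phase1_reduce(mapper_a + mapper_b)
ranking = rank(totals)
```

Because the aggregation in this sketch is a plain sum, partial sums computed per mapper could be merged later with the same result, which is the property a combiner stage relies on.
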
Here, the Score is computed with a simple algorithm, in which the input user's data items are compared with the same data items of existing users. If the data item value of an existing user is exactly equal to the input user's data item value, the Score is two. Otherwise, if the value is partially similar to the existing user's data item value, the Score is one. If there is no likeness between the input data item value and the existing data item value, the Score is zero.

The data sources contain many misspellings, imprecise terms, colloquial terms, etc. To solve these problems, we use metadata to create associations between columns. In the Query builder phase, we can define column groups, each of which contains the main term together with its colloquial terms, imprecise terms, and prevalent misspellings. When an input column is used in a query, all other group members are considered and their related information is gathered.

Fig. 4: RiDaULi process to execute query

If there is a bottleneck in the Reducer phase, we remove it via combiners. Fig. 5 shows the RiDaULi architecture with combiners.

# Evaluation

In this section we evaluate RiDaULi from two views. First, the execution time of the proposed method is evaluated, and second, the accuracy of RiDaULi is calculated. For illustration, we produce sample expected results.

We used data from different Social Network systems, which in turn have different standards for storing data. Using RiDaULi, we found that we could easily achieve the required results on a reliable and data available structure. As shown in Fig. 4, twenty-one servers were used in Phase 1 and twenty-one in Phase 2. For thirty-seven different queries we achieved an average time of 9.42 seconds. As shown in Fig. 5, we then added five combiner servers with the same specifications to each of the two phases, for a total of 52 servers. The average execution time for the thirty-seven queries improved by about 40%, decreasing from 9.42 to 5.65 seconds. Fig. 6 shows a comparison between the two phases of RiDaULi in the architectures of Figs. 4 and 5.

We also used the LSH algorithm over MapReduce for evaluation. 52 servers with the Table 7 specification were used. For thirty-seven different queries we achieved an average time of 63.11 seconds. Fig. 7 shows the results.

# Conclusion

In this paper, we propose RiDaULi, a reliable and data available method to solve user likeness (ULi) problems over Social Network. Previously, the standard methods were based on Machine Learning (ML) or Information Retrieval (IR). ML methods need a long time to execute, and are offline. Standard IR methods have many limitations for information storing and query processing; they support only a basic user interface, and limit the kinds of queries that can be built. Online data mining methods perform well with predefined data sources but are not suitable for dynamic data sources. There are also methods such as LSH that work properly over distributed environments, but their performance decreases when there are many changes in the input data sources. RiDaULi is an IR method which supports different data formats. All of these formats can be retrieved by data unification. In this method, not all fields need to be completed; for each user, only the existing fields are entered. This feature allows data storage size to be considerably reduced. Our evaluation shows that RiDaULi can solve ULi problems effectively. Because of its reliable and data available nature, RiDaULi can utilize hardware effectively to solve problems involving huge amounts of data.

![Fig. 1: MapReduce Optimization](image-2.png "Fig. 1")

![Fig. 2: Standard data sources in Social Network](image-3.png "Fig. 2")

![And ...". First all ColumnIDs are extracted from the RiDaULiColumn table.](image-4.png)

![Fig. 5: RiDaULi architecture with combiner](image-5.png "Fig. 5")

![Fig. 6: RiDaULi architecture with two combiners](image-6.png "Fig. 6")

![Fig. 7: Total Execution Time](image-7.png "Fig. 7")

Table 1

| Source ID | Source name |
|---|---|
| 1 | Facebook |
| 2 | Twitter |
| 3 | Linkedin |
| ... | ... |

Table 2

| Id | Name | age | Gender | habits | Likes1 | Likes2 |
|---|---|---|---|---|---|---|
| 1211 | sai | 20 | Male | Reading books | Spiritual | fiction |
| 1212 | ram | 40 | Male | Watching Movies | Action | comedy |
| 1213 | seetha | 35 | Female | Listening Music | melody | devotional |

Table 3

| Column Id | Column name | Data Source ID |
|---|---|---|
| 1 | ID | 1 |
| 2 | name | 1 |
| 3 | age | 1 |

Table 4

| Column Id | Row Id | Value |
|---|---|---|
| 1 | 1211 | sai |
| 2 | 1211 | 20 |
| 3 | 1211 | male |

Table 4 has several advantages:

- Dynamic columns definition
- Completion of all fields is not necessary
- Unified data format
- Data storage size reduction

The proposed data format is suitable for the MapReduce structure, and allows us to execute queries simultaneously on different nodes. There are several steps to using RiDaULi:

- ETL (Extract/Transform/Load): First, information from different data sources is gathered, and the metadata table (like Table 3) and data table (like Table 4) are created.

The GetColumnID function retrieves the ColumnID of a specific field from the RiDaULiColumn table. Its input parameters are DataSourceID and ColumnName.

© 2015 Global Journals Inc. (US)

* S. Fortunato, "Community detection in graphs," Phys. Rep., vol. 486, no. 3-5, 2010.
* M. E. J. Newman, "Detecting community structure in networks," Eur. Phys. J. B, Condens. Matter Complex Syst., vol. 38, no. 2, Mar. 2004.
* Facebook Data Tracker.
* M. E. J. Newman and M. Girvan, "Finding and evaluating community structure in networks," Phys. Rev. E, vol. 69, no. 2, Feb. 2004.
* M. E. J. Newman, "Modularity and community structure in networks," Proc. Nat. Acad. Sci. USA, vol. 103, Jun. 2006.
* W. W. Zachary, "An information flow model for conflict and fission in small groups," J. Anthropol. Res., vol. 33, no. 4, 1977.
* M. Girvan and M. E. J. Newman, "Community structure in social and biological networks," Proc. Nat. Acad. Sci. USA, vol. 99, no. 12, Jun. 2002.
* J. Dean and S. Ghemawat, "MapReduce: Simplified data processing on large clusters," Commun. ACM, vol. 51, no. 1, Jan. 2008.
* J. Lin and M. Schatz, "Design patterns for efficient graph algorithms in MapReduce," in Proc. ACM 8th Workshop Mining Learn. Graphs, 2010.
* S. Brin and L. Page, "The anatomy of a large-scale hypertextual Web search engine," Comput. Netw. ISDN Syst., vol. 30, no. 1-7, Apr. 1998.
* A. Mislove, M. Marcon, K. P. Gummadi, P. Druschel, and B. Bhattacharjee, "Measurement and analysis of online social networks," in Proc. 7th ACM SIGCOMM Conf. Internet Meas., San Diego, CA, USA, Oct. 2007.
* C. Wilson, B. Boe, A. Sala, K. P. N. Puttaswamy, and B. Y. Zhao, "User interactions in social networks and their implications," in Proc. 4th ACM Eur. Conf. Comput. Syst., Nuremberg, Germany, Mar. 2009.
* The OSN Data Set.
* S. E. Schaeffer, "Graph clustering," Comput. Sci. Rev., vol. 1, no. 1, Aug. 2007.
* W. Xue, J. Shi, and B. Yang, "X-RIME: Cloud-based large scale social network analysis," in Proc. IEEE Int. Conf. Services Comput., 2010.
* B. W. Kernighan and S. Lin, "An efficient heuristic procedure for partitioning graphs," Bell Syst. Tech. J., vol. 49, no. 1, 1970.
* J. R. Tyler, D. M. Wilkinson, and B. A. Huberman, "E-mail as spectroscopy: Automated discovery of community structure within organizations," Inform. Soc., vol. 21, no. 2, 2005.
* M. J. Rattigan, M. Maier, and D. Jensen, "Using structure indices for efficient approximation of network properties," in Proc. 12th ACM SIGKDD Int. Conf. Knowl. Discov. Data Mining, 2006.
* U. Brandes, D. Delling, M. Gaertler, R. Görke, M. Hoefer, Z. Nikoloski, and D. Wagner, "On modularity clustering," IEEE Trans. Knowl. Data Eng., vol. 20, no. 2, Feb. 2008.
* N. P. Nguyen, T. N. Dinh, Y. Xuan, and M. T. Thai, "Adaptive algorithms for detecting community structure in dynamic social networks," in Proc. IEEE INFOCOM, 2011.