# Introduction

Social networking sites such as Facebook, Orkut, Blog Catalog, and Twitter let people build connections with others who want to make friends and share interests and activities. Through these sites, people interact with each other, sharing and discussing information through media such as photos and videos and taking part in the activities the sites provide. Social media takes many forms, including blogs, user forums, social tagging and bookmarking, social networking, reviews, and online content sharing, and people use each of these for different needs. Social media has become an effective and important part of daily life: it helps maintain relationships of every kind and has opened new ways to communicate, interact, cooperate, and exchange ideas with other people. In a social media network, the connections within the same network are heterogeneous, i.e., they represent different kinds of relations, because people belong to different groups. For example, one user can be connected to both family members and friends [1].

This heterogeneity of connections limits the effectiveness of methods that treat all connections alike. A model based on social dimensions has been proposed to address it. The framework extracts social dimensions from the connectivity of the network to represent the affiliations of its actors. Its advantages over other methods have been studied on several social network data sets. Social media produces large amounts of data, which brings both opportunities and challenges for learning collective behavior at large scale. The first goal of this research is to predict collective behavior in a social media network: given behavior information about some people, how can the behavior of unobserved people in the same network be inferred?

# II.


# Material and Methodology

# a) Bi-connected Components

Earlier work presents bi-connected components (BiComponents) as one approach to finding edge partitions [2]. A bi-connected component of a graph is a maximal subgraph that stays connected when any single vertex is removed; in a component with more than two nodes, any two nodes are joined by at least two vertex-disjoint paths. A cut vertex of a connected graph is a vertex whose removal increases the number of connected components. Bi-connected components are joined to each other at cut vertices. Each bi-connected component is treated as a community and converted into one social dimension [1].

Algorithm to find bi-connected components: the graph is first divided into its connected components, and a depth-first search (DFS) is performed on each of them. As the search reaches new edges, they are pushed onto a stack, and for every vertex a record is kept of its low point, i.e., the earliest-visited vertex that can be reached from it through its DFS descendants and at most one back edge. When the search returns from a child whose low point is not earlier than the current vertex, the current vertex closes a component, and the stack is popped down to the connecting edge to output that bi-connected component. When the stack is empty, the search of the connected component is complete [16].
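The DFS procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: the function name `biconnected_components` is hypothetical, the graph is assumed to be simple and undirected, and recursion is used for brevity, so very deep graphs would need an iterative version.

```python
from collections import defaultdict

def biconnected_components(edges):
    """Return the bi-connected components of a simple undirected graph.

    Hypothetical helper for illustration. Each component is returned as a
    set of edges, and each edge is a frozenset of its two end nodes.
    """
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    depth = {}        # DFS depth at which each vertex was first visited
    low = {}          # low point: smallest depth reachable from the subtree
    stack = []        # edges of components that are still being explored
    components = []

    def dfs(u, parent, d):
        depth[u] = low[u] = d
        for v in adj[u]:
            if v == parent:
                continue                      # assumes no parallel edges
            if v not in depth:                # tree edge to a new vertex
                stack.append((u, v))
                dfs(v, u, d + 1)
                low[u] = min(low[u], low[v])
                if low[v] >= depth[u]:        # u separates the subtree of v
                    comp = set()
                    while True:
                        e = stack.pop()
                        comp.add(frozenset(e))
                        if e == (u, v):
                            break
                    components.append(comp)
            elif depth[v] < depth[u]:         # back edge to an ancestor
                stack.append((u, v))
                low[u] = min(low[u], depth[v])

    for start in list(adj):
        if start not in depth:                # one DFS per connected component
            dfs(start, None, 0)
    return components
```

Because every edge is popped into exactly one component, the result is a partition of the edges; a node that is a cut vertex appears in more than one component.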


# i. Advantages

- BiComponents separates edges into disjoint sets, so sparse social dimensions are obtained.
- BiComponents is efficient and scalable.

# ii. Disadvantage

- BiComponents yields a highly imbalanced community structure.

# b) Node Clustering

In general, social dimensions allow one actor to take part in multiple groups, which are called affiliations. The NodeCluster method covers the special case in which each actor belongs to only one affiliation: social dimensions are constructed from a partition of the nodes of the network. A similar idea is used in the latent group model [5] for efficient inference. In earlier work, k-means clustering is used to partition the nodes (actors) of a network into disjoint sets, which are converted into a set of social dimensions. A Support Vector Machine (SVM) is then used for discriminative learning.


# Advantage

- A Support Vector Machine is used for discriminative learning.


# Disadvantage

- Each actor belongs to only one affiliation, which results in poorer performance than the EdgeCluster technique.


# c) Edge Clustering

As network research has grown, a scalable approach has been developed that handles large-scale networks without excessive memory requirements; this method is called edge clustering. EdgeCluster, an edge-centric clustering method, is used to obtain sparse social dimensions [6]. With this method, the social dimensions can be updated efficiently when new nodes or new edges arrive. In a large network, a large number of social dimensions needs to be extracted. The k-means algorithm is used to divide the edges into disjoint sets, and these sets are used for information extraction. Edge clustering is preferred because each edge falls in exactly one set while a node can still belong to several sets through its different edges; node clustering, by contrast, forces each node into a single group.

It is important to develop a scalable method that handles large-scale networks efficiently without large memory requirements. An extended edge-centric clustering scheme for obtaining sparse social dimensions is explained next. With such a method, the social dimensions can also be updated efficiently when new nodes or new edges arrive. In a huge network, a large number of social dimensions needs to be extracted.
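To make the edge-centric view concrete, the following sketch treats every edge as a data instance whose only two non-zero features are its end nodes and clusters the edges with a plain k-means loop. It is an illustration under assumptions rather than the proposed variant itself: the name `edge_kmeans`, the use of cosine similarity, the sparse dictionary centroids, and the optional `init` argument (indices of the edges used as starting centers) are all choices made for this example.

```python
import random
from collections import defaultdict

def edge_kmeans(edges, k, init=None, iters=20, seed=0):
    """Cluster edges with k-means; returns one cluster index per edge.

    Hypothetical sketch. Centroids are sparse dicts (node -> weight), so
    comparing an edge with a centroid only touches two entries.
    """
    if init is None:
        # Assumption: random starting centers, fixed by a seed for repeatability.
        init = random.Random(seed).sample(range(len(edges)), k)
    centroids = [{edges[i][0]: 1.0, edges[i][1]: 1.0} for i in init]
    assign = [None] * len(edges)

    for _ in range(iters):
        # Centroid norms for cosine similarity (the edge norm is constant).
        norms = [sum(w * w for w in c.values()) ** 0.5 or 1.0 for c in centroids]
        new_assign = []
        for u, v in edges:
            sims = [(c.get(u, 0.0) + c.get(v, 0.0)) / n
                    for c, n in zip(centroids, norms)]
            new_assign.append(max(range(k), key=sims.__getitem__))
        if new_assign == assign:          # assignments stable: converged
            break
        assign = new_assign

        # Recompute each centroid as the mean of the edges assigned to it.
        sums = [defaultdict(float) for _ in range(k)]
        counts = [0] * k
        for (u, v), c in zip(edges, assign):
            sums[c][u] += 1.0
            sums[c][v] += 1.0
            counts[c] += 1
        centroids = [
            {n: w / counts[c] for n, w in sums[c].items()} if counts[c]
            else centroids[c]             # keep an empty cluster's old center
            for c in range(k)
        ]
    return assign
```

Since an edge has only two non-zero features, each similarity costs constant time regardless of the number of nodes, which is the property that makes clustering a large number of edges feasible.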


# III.


# Proposed System

Extending the existing approach to learning collective behavior so that it accounts for heterogeneity is useful for further research. The proposed framework consists of two parts:

1. Accurate social dimension extraction
2. Learning from the extracted dimensions

In earlier work, the k-means clustering algorithm is used to partition the edges of a network into disjoint sets. The proposed efficient k-means variant takes advantage of the sparsity of the edge data, so it can cluster a large number of edges efficiently. A model based on social dimensions is effective in addressing heterogeneity. Previous approaches, however, do not scale to large networks because the social dimensions they extract are dense. A social media network contains a huge number of actors, and with so many actors the dense social dimensions cannot be held in memory, which creates a serious computational problem. Sparsifying the social dimensions can solve this scalability problem. In this work, an extended edge-centric approach to extracting sparse social dimensions is proposed, and with it sparse social dimensions are obtained. The results of the extended edge-centric method for extracting social dimensions are analyzed first, using large social network data sets [1]. In line with the problem stated, the existing edge-centric clustering approach is extended to handle object heterogeneity. The proposed approach improves prediction performance for social networks (multimode networks) [19].

Step 1: Randomly select k centers in the problem space.

Step 2: Partition the data into k clusters by assigning each data point to the closest of the k centers.

Step 3: Use the mean of each cluster as its new center.

Step 4: Update the centroids when a new connection request arrives.

Step 5: Repeat steps 2 and 3 until the centers do not change.

The social dimensions are then given as features to the next algorithm, which carries out learning and prediction. This algorithm is based on a linear SVM [20]. The discriminative learning procedure determines which social dimensions are related to the behavior and assigns labels accordingly. It has been observed that actors with the same interests tend to connect with each other [12]. For instance, it is reasonable to expect people in the same department to interact with each other more frequently. Hence, to infer the latent affiliations of actors, this research aims to find groups of people who interact with each other more frequently than would be expected at random.

Algorithm for Collective Behavior in a network

Input: data sets, labels of some people, social dimensions;



Output: labels of unlabeled people or nodes (actors).

1. Obtain the edge-centric view of the network.
2. Perform the proposed extended edge clustering.
3. Determine the social dimensions from the edge partition.
4. Apply regularization to the social dimensions.
5. Build a classifier based on the social dimensions of the labeled actors.
6. Use this classifier to predict the labels of the unlabeled actors from their social dimensions.

The data sets, social dimensions, and label information are provided as input, and the predicted labels are the output of this algorithm. The efficient k-means variant is used to carry out the extended edge clustering. Regularization and the SVM are applied after the social dimensions are formed. The regularization parameter is used to regularize the communities in the network. Finally, the classifier predicts the labels.
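Steps 3 and 4 of the algorithm can be illustrated as follows. The sketch turns an edge partition into node-level social dimensions and then normalizes the affiliation weights of each node so that they sum to one. The function name `social_dimensions` and this particular normalization are assumptions made for the example; the source does not specify the exact regularization, and the classifier of steps 5 and 6 is left out.

```python
from collections import defaultdict

def social_dimensions(edges, assign):
    """Build sparse social dimensions from an edge partition.

    Hypothetical helper: `assign[i]` is the cluster of `edges[i]`. A node is
    affiliated with every cluster that contains at least one of its edges.
    Returns {node: {cluster: weight}} with weights summing to 1 per node.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for (u, v), c in zip(edges, assign):
        counts[u][c] += 1.0          # number of edges of u in cluster c
        counts[v][c] += 1.0

    dims = {}
    for node, aff in counts.items():
        total = sum(aff.values())
        # Assumed regularization: normalize the affiliations of each node.
        dims[node] = {c: w / total for c, w in aff.items()}
    return dims


# Two triangles joined by the edge (3, 4); the last four edges share a cluster.
example_edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
example_assign = [0, 0, 0, 1, 1, 1, 1]
example_dims = social_dimensions(example_edges, example_assign)
```

In this small example node 3 is affiliated with both clusters, while every other node has a single affiliation; the resulting dictionaries store only non-zero entries, which is what keeps the representation sparse.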

Precision and recall are obtained from the evaluation on the different data sets. Comparative analysis of Tables III, IV, and V shows that the proposed extended edge clustering gives the best evaluation results across the proportions of labeled nodes.

Tables III, IV, and V show that the prediction performance on the social media data is moderate in terms of the F1 measures. One reason is the large number of distinct labels in the data. Another is that only network information is used here. Extended edge-centric clustering is evaluated against EdgeCluster, NodeCluster, and BiComponents on the Blog Catalog, Flickr, and YouTube networks, and the proposed method is the winner most of the time. Clearly, sparse social dimensions can achieve performance as good as dense social dimensions. The NodeCluster scheme, in which each actor is involved in only one group, performs poorly compared with EdgeCluster.

In Tables VI, VII, and VIII, EdgeCluster-x denotes edge-centric clustering used to construct x dimensions. Time denotes the total time (in seconds) to obtain the social dimensions; Space is the memory required for the social dimensions; Density is the proportion of non-zero entries in the dimensions; Upper Bound is the computed upper bound on the density. Max-Aff and Ave-Aff denote the maximum and average number of affiliations of one user in the network. The computation time, the memory usage of the social dimensions, the density, and other related statistics are reported for all three data sets. The computation time of Extended EdgeCluster stays of the same order and does not change much with the number of clusters. This is due to the effectiveness of the proposed efficient k-means variant. As for memory utilization, sparse social dimensions are superior to dense ones. When the number of clusters k is small, the upper bound on the density is not tight. As k increases, the bound becomes tighter.
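The sparsity statistics named above can be computed directly from sparse social dimensions. The sketch below assumes the dimensions are stored as a dictionary `{node: {cluster: weight}}` and that density is the fraction of non-zero entries in the node-by-dimension matrix; the function name `sparsity_stats` and this storage layout are assumptions for illustration, and the upper bound is not reproduced here because its formula is not given in the text.

```python
def sparsity_stats(dims, k):
    """Density, Max-Aff and Ave-Aff of sparse social dimensions.

    Hypothetical helper: `dims` maps each node to its non-zero affiliations
    and `k` is the total number of social dimensions (clusters).
    """
    affiliations = [len(aff) for aff in dims.values()]
    nonzeros = sum(affiliations)
    n_nodes = len(dims)
    return {
        "density": nonzeros / (n_nodes * k),   # share of non-zero entries
        "max_aff": max(affiliations),          # most affiliations of one node
        "ave_aff": nonzeros / n_nodes,         # mean affiliations per node
    }


example = {1: {0: 1.0}, 2: {0: 1.0}, 3: {0: 0.67, 1: 0.33}, 4: {1: 1.0}}
stats = sparsity_stats(example, k=2)
```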


# Global Journal of Computer Science and Technology


# Comparative Analysis and Results


# a) Prediction Performance

Micro-F1 and Macro-F1 measures are computed on the given social media network data over 10 runs.
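For reference, the two measures differ only in where the averaging happens: Macro-F1 averages the per-label F1 scores, while Micro-F1 pools the true positives, false positives, and false negatives of all labels before computing a single F1. A small sketch for multi-label predictions follows; the function name `micro_macro_f1` and the representation of each prediction as a set of labels are assumptions for this example.

```python
def micro_macro_f1(true_sets, pred_sets, labels):
    """Return (micro_f1, macro_f1) for multi-label predictions.

    Hypothetical helper: `true_sets[i]` and `pred_sets[i]` are the sets of
    true and predicted labels of actor i.
    """
    tp = {l: 0 for l in labels}
    fp = {l: 0 for l in labels}
    fn = {l: 0 for l in labels}
    for truth, pred in zip(true_sets, pred_sets):
        for l in labels:
            if l in pred and l in truth:
                tp[l] += 1
            elif l in pred:
                fp[l] += 1
            elif l in truth:
                fn[l] += 1

    def f1(t, p, n):
        denom = 2 * t + p + n
        return 2 * t / denom if denom else 0.0

    macro = sum(f1(tp[l], fp[l], fn[l]) for l in labels) / len(labels)
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return micro, macro


truth = [{"a"}, {"a", "b"}, {"b"}]
pred = [{"a"}, {"a"}, {"a"}]
micro, macro = micro_macro_f1(truth, pred, ["a", "b"])
```

Because Macro-F1 weights every label equally, rare labels that are never predicted pull it down sharply, which is consistent with the gap between the Micro-F1 and Macro-F1 rows in the result tables.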


In the experimental studies, the proposed model performs better than the existing ones. Figures 5, 7, and 9 show the performance graphs on the Blog Catalog, Flickr, and YouTube networks using the Micro-F1 measure, and Figures 6, 8, and 10 show the corresponding graphs using the Macro-F1 measure. The graphs show that the Extended EdgeCluster method is superior to the other methods. In the proposed model of social dimension extraction, the extended edge clustering works efficiently and scales consistently. To compare performance with the previous methods, results are calculated for different percentages of labeled nodes for each extraction technique [17].

NodeCluster and BiComponents perform poorly where scalability is concerned; their approach is limited. EdgeCluster shows average performance. By achieving better results in behavior prediction accuracy, scalability, and object heterogeneity, the proposed Extended EdgeCluster approach gives the best results on the issues discussed earlier. Performance graphs for the Blog Catalog, Flickr, and YouTube networks using the Micro-F1 and Macro-F1 measures are given in the following figures [18].

# V.


# Conclusion and Future Work

# a) Conclusion

In this research, an approach is presented to address object heterogeneity in networks. The idea extends the scalable learning approach to heterogeneity in social networks. Object heterogeneity means that the same user is involved in many activities in the same social network, so multiple modes of operation are carried out by one actor. Previous methods for learning collective behavior did not take this into account. A mathematical model is designed and the expected results are described. The aim of this research is to predict collective behavior given a social network and the behavioral information of some of its people, and to learn collective behavior scalably even when large numbers of individuals are involved in the network. The method follows a learning model based on social dimensions: the social dimensions are extracted to represent the potential affiliations of actors before discriminative learning occurs. To address scalability, an extended edge-centric clustering scheme is proposed to obtain the social dimensions, together with an efficient k-means variant for edge clustering. Each edge is treated as one data instance, and the nodes it connects are its features. The proposed efficient k-means clustering algorithm is then applied to partition the edges into disjoint sets, with each set representing one possible affiliation. With this edge-centric view, the extracted social dimensions are shown to be sparse.


# b) Future Work

In social media, multiple types of actors can be involved in the same network, which is called a multimode network. On YouTube, for example, users, videos, tags, and comments co-exist and are intertwined with each other. Extending the edge-centric clustering scheme to address this object heterogeneity is a useful future direction. Since the proposed EdgeCluster model is sensitive to the number of social dimensions, future research is also needed to determine a suitable dimensionality. It would also be useful to extract other behavioral features (e.g., user activities and spatio-temporal information) from social media and combine them with the social network information to improve prediction performance.
![Global Journal of Computer Science and TechnologyVolume XIV Issue VI Version I](image-2.png "S")
![Figure 1 : Architecture of Scalable Learning of Collective Behavior [1]](image-3.png "Figure 1 :")

Figure 1 shows the proposed system architecture. First, data sets from different social networking sites are entered as input. Various extraction techniques are used to extract social dimensions for supervised learning. After social dimension extraction, discriminative learning and prediction occur. The outcome is the set of predicted labels in the social network.
![Figure 2 : Algorithm efficient k-means variant](image-4.png "Figure 2 :")

![Figure 3 : Connected Nodes in a network](image-5.png "Figure 3 :")

The edge-centric structure of the network data is given below:

Table 1 : Edge View Structure of Network

![Figure 4 : Cluster formation](image-6.png "Figure 4 :")

On the basis of the edge clustering scenario, social dimensions can be obtained as given in Table II below; this is the final output of the algorithm:
![Journals Inc. (US) Extended Edgecluster Based Technique for Social Networking Collective Behavior Learning System](image-8.png "")
![Figure 5 : Blog Catalog network (micro F1)](image-9.png "Figure 5 :Figure 6 :Figure 7 :Figure 8 :")
![Figure 9 : YouTube network (micro F1)](image-10.png "Figure 9 :Figure 10 :")
Table 2 : Desired social dimension(s)

# b) Effective Learning and Prediction

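Table 3

| Proportion of Labeled Nodes | Method | 10% | 20% | 30% | 40% | 50% | 60% | 70% | 80% | 90% |
|---|---|---|---|---|---|---|---|---|---|---|
| Micro-F1 (%) | Extended EdgeCluster | 37.43 | 40.91 | 37.15 | 41.97 | 41.20 | 41.24 | 41.75 | 42.55 | 43.54 |
| Micro-F1 (%) | EdgeCluster | 27.94 | 30.76 | 31.85 | 32.99 | 34.12 | 35.00 | 34.63 | 35.99 | 36.29 |
| Micro-F1 (%) | BiComponents | 16.54 | 16.59 | 16.67 | 16.83 | 17.21 | 17.26 | 17.04 | 17.76 | 17.61 |
| Micro-F1 (%) | NodeCluster | 18.29 | 19.14 | 20.01 | 19.80 | 20.81 | 20.86 | 20.53 | 20.74 | 20.78 |
| Macro-F1 (%) | Extended EdgeCluster | 18.51 | 22.31 | 21.27 | 23.52 | 26.27 | 28.70 | 30.86 | 32.72 | 34.41 |
| Macro-F1 (%) | EdgeCluster | 16.16 | 19.16 | 20.48 | 22.00 | 23.00 | 23.64 | 23.82 | 24.61 | 24.92 |
| Macro-F1 (%) | BiComponents | 2.77 | 2.80 | 2.82 | 3.01 | 3.13 | 3.29 | 3.25 | 3.16 | 3.37 |
| Macro-F1 (%) | NodeCluster | 7.38 | 7.02 | 7.27 | 6.85 | 7.57 | 7.27 | 6.88 | 7.04 | 6.83 |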
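Table 4

| Proportion of Labeled Nodes | Method | 10% | 20% | 30% | 40% | 50% | 60% | 70% | 80% | 90% |
|---|---|---|---|---|---|---|---|---|---|---|
| Micro-F1 (%) | Extended EdgeCluster | 37.43 | 30.78 | 29.83 | 30.21 | 31.09 | 32.24 | 33.55 | 34.96 | 36.42 |
| Micro-F1 (%) | EdgeCluster | 25.75 | 28.53 | 29.14 | 30.31 | 30.85 | 31.53 | 31.75 | 31.76 | 32.84 |
| Micro-F1 (%) | BiComponents | 16.45 | 16.46 | 16.45 | 16.49 | 16.49 | 16.49 | 16.48 | 16.55 | 16.55 |
| Micro-F1 (%) | NodeCluster | 22.94 | 24.09 | 25.42 | 26.53 | 28.18 | 28.32 | 28.58 | 28.70 | 28.93 |
| Macro-F1 (%) | Extended EdgeCluster | 20.00 | 22.31 | 21.27 | 23.52 | 26.27 | 28.70 | 30.86 | 32.72 | 34.41 |
| Macro-F1 (%) | EdgeCluster | 10.52 | 14.10 | 15.91 | 16.72 | 18.01 | 18.54 | 19.54 | 20.18 | 20.78 |
| Macro-F1 (%) | BiComponents | 0.45 | 0.46 | 0.46 | 0.46 | 0.46 | 0.46 | 0.46 | 0.47 | 0.47 |
| Macro-F1 (%) | NodeCluster | 7.90 | 9.99 | 11.42 | 11.10 | 12.33 | 12.29 | 12.58 | 13.26 | 12.79 |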
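Table 5

| Proportion of Labeled Nodes | Method | 10% | 20% | 30% | 40% | 50% | 60% | 70% | 80% | 90% |
|---|---|---|---|---|---|---|---|---|---|---|
| Micro-F1 (%) | Extended EdgeCluster | 36.99 | 40.00 | 44.36 | 42.00 | 46.00 | 46.20 | 45.81 | 46.31 | 47.07 |
| Micro-F1 (%) | EdgeCluster | 23.90 | 31.68 | 35.53 | 36.76 | 37.81 | 38.63 | 38.94 | 39.46 | 39.92 |
| Micro-F1 (%) | BiComponents | 23.09 | 24.51 | 24.80 | 25.39 | 25.20 | 25.42 | 25.24 | 24.44 | 25.62 |
| Micro-F1 (%) | NodeCluster | 20.83 | 24.57 | 26.91 | 28.65 | 29.56 | 30.72 | 31.15 | 31.85 | 31.29 |
| Macro-F1 (%) | Extended EdgeCluster | 20.54 | 26.31 | 29.27 | 30.52 | 32.27 | 35.70 | 36.86 | 36.72 | 38.41 |
| Macro-F1 (%) | EdgeCluster | 19.48 | 25.01 | 28.15 | 29.17 | 29.82 | 30.65 | 30.75 | 31.23 | 31.45 |
| Macro-F1 (%) | BiComponents | 6.80 | 7.05 | 7.19 | 7.44 | 7.48 | 7.58 | 7.61 | 7.63 | 7.76 |
| Macro-F1 (%) | NodeCluster | 17.91 | 21.11 | 22.38 | 23.91 | 24.47 | 25.26 | 25.50 | 26.02 | 26.44 |

# b) Sparsity Comparison

Table 6 : Sparsity comparison on Blog Catalog network

| Methods | Time | Space | Density | Upper Bound | Max-Aff | Ave-Aff |
|---|---|---|---|---|---|---|
| Ext.EdgeCluster-108 | 0.39 | 12 | 0.04314 | 0.04796 | 4 | 0.3259 |
| Ext.EdgeCluster-162 | 0.35 | 12.1 | 0.04222 | 0.04814 | 5 | 0.3555 |
| Ext.EdgeCluster-216 | 0.37 | 12.3 | 0.07277 | 0.05703 | 5 | 0.5833 |
| Ext.EdgeCluster-270 | 0.32 | 12.6 | 0.09203 | 0.06203 | 5 | 0.7000 |
| Ext.EdgeCluster-324 | 0.37 | 12.7 | 0.09166 | 0.06212 | 5 | 0.7000 |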
Table 7 : Sparsity comparison

| Methods | Time | Space | Density | Upper Bound | Max-Aff | Ave-Aff |
|---|---|---|---|---|---|---|
| Ext.EdgeCluster-108 | 0.328 | 12.20 | 0.04314 | 0.04800 | 4 | 0.3240 |
| Ext.EdgeCluster-162 | 0.391 | 12.40 | 0.04259 | 0.04816 | 5 | 0.3556 |
| Ext.EdgeCluster-216 | 0.360 | 12.70 | 0.07277 | 0.05699 | 5 | 0.5851 |
| Ext.EdgeCluster-270 | 0.453 | 12.44 | 0.09203 | 0.06200 | 5 | 0.6981 |
| Ext.EdgeCluster-324 | 0.516 | 12.67 | 0.09300 | 0.06214 | 5 | 0.7000 |

Table 8 : Sparsity comparison on YouTube network

| Methods | Time | Space | Density | Upper Bound | Max-Aff | Ave-Aff |
|---|---|---|---|---|---|---|
| Ext.EdgeCluster-108 | 0.360 | 13 | 0.04259 | 0.04259 | 4 | 0.3243 |
| Ext.EdgeCluster-162 | 0.454 | 13.45 | 0.04277 | 0.04796 | 5 | 0.3333 |
| Ext.EdgeCluster-216 | 0.406 | 13.56 | 0.04185 | 0.04800 | 5 | 0.3314 |
| Ext.EdgeCluster-270 | 0.437 | 13.67 | 0.04111 | 0.04797 | 5 | 0.3351 |
| Ext.EdgeCluster-324 | 0.453 | 13.98 | 0.04285 | 0.07203 | 9 | 0.9333 |

# c) Performance Graph of Blog Catalog, Flickr and YouTube
			© 2014 Global Journals Inc. (US) e-mails : umeshshingote62@gmail.com , dr.setuchaturvedi@gmail.com
		
		
* [1] L. Tang, X. Wang, and H. Liu, "Scalable Learning of Collective Behavior," IEEE Transactions on Knowledge and Data Engineering, vol. 24, no. 6, 2012.
* [2] Y. Y. Ahn, J. P. Bagrow, and S. Lehmann, "Link communities reveal multi-scale complexity in networks," 2009.
* [3] M. McPherson, L. Smith-Lovin, and J. M. Cook, "Birds of a feather: Homophily in social networks," Annual Review of Sociology, vol. 27, 2001.
* [4] S. A. Macskassy and F. Provost, "Classification in networked data: A toolkit and a univariate case study," J. Mach. Learn. Res., vol. 8, 2007.
* [5] J. Neville and D. Jensen, "Leveraging relational autocorrelation with latent group models," in MRDM '05: Proceedings of the 4th international workshop on Multirelational mining, New York, NY, USA: ACM, 2005.
* [6] L. Tang and H. Liu, "Toward predicting collective behavior via social dimension extraction," IEEE Intelligent Systems, vol. 25, 2010.
* [7] L. Getoor and B. Taskar, Introduction to Statistical Relational Learning. The MIT Press, 2007.
* [8] Z. Xu, V. Tresp, S. Yu, and K. Yu, "Nonparametric relational learning for social network analysis," in KDD'2008 Workshop on Social Network Mining and Analysis, 2008.
* [9] "Relational learning via latent social dimensions," in KDD '09: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, New York, NY, USA: ACM, 2009.
* [10] M. Newman and M. Girvan, "Finding and evaluating community structure in networks," Physical Review E, vol. 69, 026113, 2004.
* [11] K. Yu, S. Yu, and V. Tresp, "Soft clustering on graphs," in NIPS, 2005.
* [12] Y. Y. Ahn, J. P. Bagrow, and S. Lehmann, "Link communities reveal multi-scale complexity in networks," 2009.
* [13] "Relational learning via latent social dimensions," in KDD '09: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, New York, NY, USA: ACM, 2009.
* [14] L. Tang, S. Rajan, and V. K. Narayanan, "Large scale multilabel classification via metalabeler," in WWW '09: Proceedings of the 18th international conference on World Wide Web, New York, NY, USA: ACM, 2009.
* [15] L. Tang, H. Liu, J. Zhang, and Z. Nazeri, "Community evolution in dynamic multi-mode networks," in KDD '08: Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, New York, NY, USA: ACM, 2008.
* [16] J. Hopcroft and R. Tarjan, "Algorithm 447: efficient algorithms for graph manipulation," Commun. ACM, vol. 16, no. 6, 1973.
* [17] L. Tang, S. Rajan, and V. K. Narayanan, "Large scale multilabel classification via metalabeler," in WWW: Proceedings of the 18th international conference on World Wide Web, New York, NY, USA: ACM, 2009.
* [18] V. Van Asch, "Macro- and micro-averaged evaluation measures," 2003.
* [19] L. Tang, H. Liu, J. Zhang, and Z. Nazeri, "Community evolution in dynamic multi-mode networks," in KDD '08: Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, 2009.
* [20] M. P. S. Brown, W. N. Grundy, D. Lin, N. Cristianini, C. Sugnet, M. Ares Jr., and D. Haussler, "Support Vector Machine Classification of Microarray Gene Expression Data," 1997.