Figure 1. Example of Change Detection
Social network change detection (SNCD) is a process of monitoring social networks to determine when significant changes to their organizational structure occur and what caused them. The approach combines analytical techniques from social network analysis with those from statistical process control (SPC). In application, statistical process control charts are applied to observable network measures: by taking measures of a network over time, a control chart can signal when a significant change occurs in the network.[1] SNCD may offer executives and military analysts a tool to operate inside the normal decision cycle.
SNCD does not predict change; rather, it detects quickly that a change has occurred and makes an inference about the actual time of change. For example, before a terrorist organization commits an attack, its social network will change as the organization plans and resources the attack. SNCD may allow an analyst to detect this change prior to the successful completion of the attack.[2]
Background
There has been a recent increase in temporal social network data. Unobtrusive tools now exist to extract network data from e-mail servers, from news media, and from written documents within an organization. This allows an analyst to construct multiple network observations of an organization, whether daily, weekly, yearly, or at any other temporal resolution. With the increased availability of observed instances of social networks over time, improved methods of detecting meaningful change are needed. Simply looking for obvious, drastic changes may be insufficient for many applications.[1]
However, methods of change detection in social networks are limited. Hamming distance (Hamming, 1950) is often used in binary networks to measure the distance between two networks. Euclidean distance is similarly used for weighted networks (Wasserman and Faust, 1994). While these methods may be effective at quantifying a difference between static networks, they lack an underlying statistical distribution. This prevents an analyst from distinguishing a statistically significant change from normal, spurious fluctuations in the network. SNCD significantly improves on previous attempts to detect organizational change over time by introducing a statistically sound probability space and uniformly more powerful detection methods.
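The two classical distance measures mentioned above can be sketched directly on adjacency matrices. This is a minimal illustration, not the SNCD method itself; the two small example networks are hypothetical.

```python
def hamming_distance(a, b):
    """Number of edge slots that differ between two binary networks."""
    return sum(
        1
        for row_a, row_b in zip(a, b)
        for x, y in zip(row_a, row_b)
        if x != y
    )

def euclidean_distance(a, b):
    """Euclidean distance between two weighted networks."""
    return sum(
        (x - y) ** 2
        for row_a, row_b in zip(a, b)
        for x, y in zip(row_a, row_b)
    ) ** 0.5

# hypothetical 3-node undirected binary networks
net1 = [[0, 1, 0],
        [1, 0, 1],
        [0, 1, 0]]
net2 = [[0, 1, 1],
        [1, 0, 1],
        [1, 1, 0]]

print(hamming_distance(net1, net2))  # 2 (the new 0-2 tie counted in both directions)
```

As the text notes, these distances quantify difference but carry no statistical distribution, so no threshold on them separates significant change from noise.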
History
SNCD was initially proposed by Major Ian McCulloh, an assistant professor in the U.S. Military Academy's Network Science Center, in 2006. Since then, SNCD has been presented at a variety of venues, from NetSci 2007 in New York City to the International Network for Social Network Analysis annual conference in 2008 and the Military Operations Research Society working group on emerging threats and social networks in 2008.
McCulloh and Carley completed a project in 2008, supported in part by the U.S. Army because of its relevance to terrorism detection. In the project report, McCulloh illustrates the detection methodology on several data sets, including e-mail communications among graduate students and perceived connections among members of al-Qaeda based on open-source data. Results of the project indicate that the approach can detect change even with the high levels of uncertainty inherent in these data.
In 2009, McCulloh further developed the idea, applying the methodology to detect changes in dynamic social networks. The new approach was demonstrated in multi-agent simulation as well as on eight different real-world data sets.
Applications
To provide an estimate of when a change occurred, the CUSUM procedure is used to demonstrate SNCD on two data sets. The optimality constant k is set to 0.5, corresponding to a shift of one standard deviation, and the decision interval h is set to 3.5, corresponding to a 1% false-positive rate. The data, gathered from a survey and from text collected on the internet, are well-established data sets in the social network literature. Their features are listed below:
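The one-sided CUSUM procedure with these parameter choices (k = 0.5, h = 3.5) can be sketched as follows. The inputs are standardized network measures (z-scores); the series of values below is hypothetical, not one of the article's data sets.

```python
def cusum_upper(z_scores, k=0.5, h=3.5):
    """One-sided (upper) CUSUM on standardized observations.

    Returns the running CUSUM statistics and the index of the first
    period at which the chart signals (None if it never exceeds the
    decision interval h).
    """
    c = 0.0
    stats, signal = [], None
    for t, z in enumerate(z_scores):
        c = max(0.0, c + z - k)  # accumulate evidence of an upward shift
        stats.append(c)
        if signal is None and c > h:
            signal = t
    return stats, signal

# hypothetical standardized betweenness values over 10 periods,
# drifting upward in the later periods
z = [0.1, -0.3, 0.2, 0.0, 0.4, 1.2, 1.5, 1.8, 2.0, 2.2]
stats, signal = cusum_upper(z)
print(signal)  # first period at which the chart exceeds h
```

Because this chart only accumulates positive deviations, detecting decreases requires a mirrored chart on the negated observations, as the Fraternity example below discusses.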
Comparison of Real World Data[2]

| Data set   | No. of Nodes | Time Periods | Method of Collection | Type of Relation | Design | Known Change |
|------------|--------------|--------------|----------------------|------------------|--------|--------------|
| Fraternity | 17           | 15           | Survey               | Ranking          | Fixed  | Yes          |
| Al-Qaeda   | 62–260       | 17           | Text                 | Rating           | Free   | Yes          |
Newcomb Fraternity
This data set was gathered from an experiment conducted by Theodore Newcomb (1961) on 17 incoming transfer students at the University of Michigan. The participants, who had no prior acquaintance, were housed together in a fraternity house and asked to rank each other from 1 to 16 by preference, where 1 indicated the person they felt most comfortable with. Data were collected weekly for 15 weeks, except for the 9th week. David Krackhardt (1998) dichotomized the network data by assigning a link for preference ratings of 1-8 and no link for ratings of 9-16. To determine typical behavior, the mean and standard deviation of the density, average betweenness, and average closeness were estimated from the first five networks. The CUSUM statistic was then calculated for all time periods.
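Krackhardt's dichotomization step described above can be sketched directly: preference ranks 1-8 become a link, ranks 9-16 do not. The small ranking matrix below is a hypothetical stand-in for the 17-node Newcomb data (0 on the diagonal marks the absence of a self-ranking).

```python
def dichotomize(rank_matrix, cutoff=8):
    """Convert a matrix of preference ranks into a binary network:
    a link (1) where the rank is between 1 and cutoff, else no link (0)."""
    return [
        [1 if 1 <= r <= cutoff else 0 for r in row]
        for row in rank_matrix
    ]

# hypothetical 3-person ranking matrix (ranks drawn from a 1-16 scale)
ranks = [[0, 1, 9],
         [2, 0, 12],
         [3, 15, 0]]
print(dichotomize(ranks))  # [[0, 1, 0], [1, 0, 0], [1, 0, 0]]
```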
The approach successfully detected significant events in the Fraternity network data.[2] For each social network measure monitored, two control charts must be run, because a one-sided CUSUM detects either increases or decreases in a measure, not both. The betweenness measure is chosen to signal changes here because the closeness measure behaves similarly to betweenness and the density measure is not effective for a fixed network.
In this illustration, the control chart for average betweenness signals at time period 13 that a change may have occurred in the social network of the fraternity members. The most likely time that the change actually occurred is time period 8, the last time period at which the CUSUM statistic C was equal to 0. This time point was the week before a mid-semester break, so it is plausible that social relationships changed over the break as participants possibly vacationed together. Although details of the group are not completely known, the approach still worked well in detecting network change.
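The change-time estimate described above — the last period at which the CUSUM statistic equalled zero before the signal — can be sketched as a simple backward scan. The statistics below are hypothetical values, not the Fraternity chart itself.

```python
def estimate_change_time(stats, signal):
    """Most likely change time for a CUSUM chart that signalled at
    index `signal`: the last period at or before the signal where the
    statistic was zero."""
    for t in range(signal, -1, -1):
        if stats[t] == 0.0:
            return t
    return 0  # the statistic was never zero: change may predate the chart

# hypothetical CUSUM path that signals at index 7
stats = [0.0, 0.0, 0.3, 0.0, 0.7, 1.7, 3.0, 4.5]
print(estimate_change_time(stats, signal=7))  # 3
```

In the Fraternity data this backward scan is what places the estimated change at period 8, five periods before the chart actually signals at period 13.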
Al-Qaeda
The Center for Computational Analysis of Social and Organizational Systems (CASOS) at Carnegie Mellon University created snapshots of the annual communication between members of the al-Qaeda organization from its founding in 1988 until 2004 from open-source data.[3] This open-source data set provides a limited network: who initiated communication with whom is unknown, and the completeness of the network is uncertain.
The betweenness, closeness, and density measures increased from 1988 until 1994, and then leveled off. This could be explained by the varying quality of intelligence gathering on al-Qaeda and by rapid changes, such as development and reorganization, within the organization. Therefore, the CUSUM control chart was applied to the data from 1994 to 2004.
With more nodes observed over a longer time span, major events in al-Qaeda's history should be detectable in this data set. Changes in the average-betweenness CUSUM statistic can be used to identify the point in time when the organization changed and began to plan attacks. According to the statistic, the most likely time that the change occurred is 1997.[2] Examining the events occurring within the al-Qaeda network and in the external environment in 1997 can give a better understanding of the cause of the change.
Caution is always warranted with data collected retrospectively and with data likely to be incomplete. Even so, by applying SNCD to data collected within organizations, an analyst may be able to warn of actions planned by a group or organization.
Sensitivity to Risk of False Positive
Sensitivity to the risk of false positives is an important consideration in detecting change in longitudinal network data. False positives occur when a change detection procedure indicates that a change may have occurred when in fact there is no change.[2] There is a trade-off between false positives and rapid detection, and the balance is determined by the decision interval of the change detection procedure. It is important to determine a desired false-positive risk first, and then monitor longitudinal networks for change.
Specifically, the change detection procedure may miss real changes when a very low false-positive risk is set, and it will signal changes more rapidly when the risk is set to a higher value. The analyst should carefully consider this trade-off when using SNCD.[2]
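The dependence of the false-positive rate on the decision interval h can be explored with a small Monte Carlo sketch: run the upper CUSUM on in-control (standard normal) data and count how often it signals by mistake. The run lengths, replication count, and seed below are hypothetical settings for illustration.

```python
import random

def cusum_signals(z_scores, k=0.5, h=3.5):
    """True if the one-sided CUSUM exceeds h anywhere in the series."""
    c = 0.0
    for z in z_scores:
        c = max(0.0, c + z - k)
        if c > h:
            return True
    return False

def false_positive_rate(h, periods=15, runs=2000, seed=1):
    """Fraction of in-control series (no real change) that still signal."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(runs):
        z = [rng.gauss(0.0, 1.0) for _ in range(periods)]
        if cusum_signals(z, h=h):
            hits += 1
    return hits / runs

# a wider decision interval lowers the false-positive rate,
# at the cost of slower detection of real changes
for h in (2.5, 3.5, 4.5):
    print(h, false_positive_rate(h))
```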
Limitations
The use of the cumulative sum (CUSUM) procedure has the following limitations:
1. The method is limited to normally distributed network measures, and a period of dynamic equilibrium must be assumed to estimate the parameters of the control chart.[1]
2. As it is a statistical approach, it highly depends on the data that is used to do the analysis. Limitations on the data will make it difficult to determine the validity of the results.[1]
3. Because network measures are assumed to be normally distributed, research on their actual distributions is needed. Preliminary work on these distributions suggests that the assumption of normality does not hold for small networks, extremely sparse networks, and certain metrics.[4] If the network measures are not normally distributed, the false-alarm probability will increase; a different control chart must then be used or a new approach to the problem developed.
4. Findings are limited to modeling and detecting changes, but not the causes of the change.[2]
As social network change detection is a relatively new concept and few applications of statistical process control methods have been conducted, further limitations of the approach cannot yet be determined. Future research will provide much greater insight into the limitations of this approach.
Future work
It is important that future work examine the errors associated with this technique, both false positives and false negatives. Because of the dependence on the data obtained, future work should also consider the sensitivity of the approach to missing information, and to the reasons why information is missing. To rectify those shortcomings, future work should focus on near-complete data sets with high resolution, meaning data that cover the communication network with little or no missing information for a large contiguous period. As noted above under Limitations, if the network measures are not normally distributed, a different control chart must be used or a new approach to the problem developed; finding a more general solution will require further work.[1]
It may also be possible to extend change detection to node level measures. Again the distributional assumptions would need to be verified. Node level change detection may help further isolate change in an organization by monitoring the behavior of key individuals, without the noise introduced by less influential agents. More work in this area will be beneficial.[2]
It would also be helpful to examine the sensitivity of the optimality constant k and the control limit values of the CUSUM control chart for network-measure change detection. Using further Monte Carlo simulations, a researcher could determine which parameter values are best for detecting certain types of changes, such as sudden large changes or slow creeping shifts. The use of control charts for comparing models and observations should also be studied to see what specific conclusions can be obtained.[1]
Other features could also increase the usability and utility of these techniques, including auto-identification and visualization of critical features, and improved data extraction and fusion techniques.[5]
References
- Baller, D., McCulloh, I., Carley, K.M., and Johnson, A.N. (2008). Specific Communication Network Measure Distribution Estimation. Sunbelt XXVIII, the annual conference of the International Network for Social Network Analysis, Saint Petersburg, FL, 24 January 2008.
- Carley, K. M. (2003). Dynamic network analysis. In P. Pattison (Ed.), Dynamic social network analysis: Workshop summary and papers: 133–145. Washington, D.C.: The National Academies Press.
- McCulloh, I., Garcia, G., Tardieu, K., MacGibon, J., Dye, H., Moores, K., Graham, J. M., & Horn, D. B. (2007). IkeNet: Social network analysis of e-mail traffic in the Eisenhower Leadership Development Program (Technical Report No. 1218). Arlington, VA: U.S. Army Research Institute for the Behavioral and Social Sciences.
- McCulloh, I., Lospinoso, J., and Carley, K.M. (2007). Social Network Probability Mechanics. Proceedings of the World Scientific Engineering Academy and Society 12th International Conference on Applied Mathematics, Cairo, Egypt, 29–31 December 2007, pp. 319–325.
- McCulloh, I., Webb, M., Carley, K.M. (2007). Social Network Monitoring of Al-Qaeda. Network Science Report, Vol. 1, pp. 25–30.