Chinese Whispers (clustering method)

From Infogalactic: the planetary knowledge core
Jump to: navigation, search

Lua error in package.lua at line 80: module 'strict' not found.

Chinese Whispers is a clustering method used in network science named after the famous whispering game.[1] Clustering methods are basically used to identify communities of nodes or links in a given network. This algorithm was designed by Chris Biemann and Sven Teresinak in 2005.[1] The name comes from the fact that the process can be modeled as a separation of communities where the nodes send the same type of information to each other.[1]

Chinese Whispers is a hard partitioning, randomized, flat clustering (no hierarchical relations between clusters) method.[1] The random property means that running the process on the same network several times can lead to different results, while because of hard partitioning one node can only belong to one cluster at a given moment. The original algorithm is applicable to undirected, weighted and unweighted graphs. Chinese Whispers is time linear which means that it is extremely fast even if the number of nodes and links are very high in the network.[1]

Algorithm

File:Chinese Whispers example cluster.png
An example of how Chinese Whispers works in action. The different colors represent different classes.

The algorithm works in the following way in an undirected unweighted graph:[1]

  1. All nodes are assigned to a random class. The number of initial classes equals the number of nodes.
  2. Then all of the network nodes are selected one by one in a random order. Every node moves to the class which the given node connects with the most links. In the case of equality the cluster is randomly chosen from the equally linked classes.
  3. Step two repeats itself until a predetermined number of iteration or until the process converges. In the end the emerging classes represent the clusters of the network.

The predetermined threshold for the number of the iterations is needed because it is possible, that process does not converge. On the other hand in a network with approximately 10000 nodes the clusters does not change significantly after 40-50 iterations even if there is no convergence.[1]

Strengths and Weaknesses

The main strength of Chinese Whispers lies in its time linear property. Because of the processing time increases linearly with the number of nodes, the algorithm is capable of identifying communities in a network very fast. For this reason Chinese Whispers is a good tool to analyze community structures in graph with a very high number of nodes. The effectiveness of the method increases further if the network has the small world property.[1]

On the other hand because the algorithm is not deterministic in the case of small node number the resulting clusters often significantly differ from each other. The reason for this is that in the case of a small network it matters more from which node the iteration process starts while in large networks the relevance of starting points disappears.[1] For this reason for small graphs other clustering methods are recommended.

Applications

Chinese Whispers is used in many subfield of network science. Most frequently it is mentioned in the context of Natural Language Processing problems.[2][3] On the other hand the algorithm is applicable to any kind of community identification problem which is related to a network framework. Chinese Whispers is available for personal use as an extension package for Gephi[4] which is an open source program designed for network analysis.

References