Abstract
This paper initiates formal analysis of a simple, distributed algorithm for community detection on networks. We analyze an algorithm that we calltextsc {Max-LPA}, both in terms of its convergence time and in terms of the" quality" of the communities detected.textsc {Max-LPA} is an instance of a class of community detection algorithms calledtextit {label propagation} algorithms. As far as we know, most analysis of label propagation algorithms thus far has been empirical in nature and in this paper we seek a theoretical understanding of label propagation algorithms. In our main result, we define a clustered version ofer random graphs with clusters where the probability , of an edge connecting nodes within a cluster is higher than , the probability of an edge connecting nodes in distinct clusters. We show that even with fairly general restrictions on and ( for any , , where is the number of nodes),textsc {Max-LPA} detects the clusters in just two rounds. Based on this and on empirical results, we conjecture thattextsc {Max-LPA} can correctly and quickly identify communities on clustereder graphs even when the clusters are much sparser, ie, with for some .