Syntax:
cc_find nthresh -i in1 -o out1.file out1.mr
Examples:
cc_find 1000 -i mre -o cc.list cc
Description:
This is a named command which finds all the connected components (CCs) in an undirected graph. A connected component is a set of vertices where each is connected to one or more other vertices in the set via an edge. The CCs are found via the MapReduce algorithm of (Cohen) discussed in his paper with extensions described in the paper of (Plimpton) which attempt to load-balance the calculation across processors when one or a few very large components exist in the graph.
See the named command doc page for various ways in which the -i inputs and -o outputs for a named command can be specified.
In1 stores a set of edges, assumed to have no duplicates or self-edges. This means that either (Vi,Vj) or (Vj,Vi) appears, but not both. Also (Vi,Vi) does not appear. The input is unchanged by this command.
Out1 will store the assignment of each vertex in the graph to a component ID. The component ID is the smallest vertex ID of any vertex in the component.
Related commands:
(Cohen) Cohen, "Graph Twiddling in a MapReduce World", Computing in Science and Engineering, 11, 29-41 (2009).
(Plimpton) Plimpton and Devine, "MapReduce in MPI for Large-Scale Graph Algorithms", to appear in Parallel Computing (2011).