In Partial Fulfillment of the Requirements for the Degree of Master of Science
will defend her thesis
PS-MWC: A scalable, distributed maximum weight clique algorithm using Apache Spark
The maximum weight clique problem (MWCP), in which we search for a clique with maximum sum of vertices’ weights, is a long-standing problem in graph theory. It has a wide range of applications in wireless network telecommunications, railroad dispatching, social network analysis and hotspot discovery. The MWCP is a challenging NP-Hard problem and therefore sequential algorithms are not able to solve MWCPs for graphs with millions of nodes and edges. To alleviate this problem, in this thesis, we present a scalable and distributed approach, PS-MWC, to solve the MWCP utilizing Apache Spark, an in-memory cluster computing framework. PS-MWC consists of two phases: a partitioning phase and a solution phase. In the partitioning phase, we partition the original problem into smaller sub-problems in parallel, such that each sub-problem is small enough to be efficiently solved using a sequential MWC algorithm. In the solution phase, we solve the sub-problems in parallel. We apply efficient pruning strategies in both phases, to discard sub-problems if they have no chance to beat the best solution found so far. Finally, we output the best solution found. We evaluate the performance of PS-MWC on a benchmark of MWCPs. Experiments show that PS-MWC finds the same solutions as state-of-the-art algorithms such as FastWClq and WLMC. We also examine the benefits of our partitioning strategy, the relationship between the number of vertices and edges of graph and the runtime, as well as the pruning efficiency. The results show that after partitioning, the size of sub-problems is much less than that of original graph. Moreover, there is an almost linear relationship between the size of original graph and the runtime. High efficiency of the pruning strategy is reflected in the results that more than 99.80% of sub-problems have been pruned. For a graph with more than 3 million vertices and more than 117 million edges, PS-MWC takes 1349 and 654 seconds on an EMR cluster with 64 and 128 cores respectively.
Date: Thursday, June 28, 2018
Time: 3:00 PM
Place: PGH 362
Advisor: Dr. Christoph F. Eick
Faculty, students, and the general public are invited.