Guest Speaker: Dr. Charalampos Tsourakakis
480 Dreese Labs
2015 Neil Ave
Columbus, OH 43210
Scalable Large Near-Clique Detection in Large-Scale Networks
Finding large near-(bi)cliques in massive networks is a notoriously hard problem of great importance to many applications, including anomaly detection and security, and community detection in social networks, and the Web graph. But can we exploit idiosyncrasies of real-world networks to attack this NP-hard problem successfully?
In this talk I will answer this question in the affirmative. Specifically, I will present a significant advance in routines with rigorous theoretical guarantees for scalable extraction of large near-cliques from large-scale networks, the k-clique densest subgraph problem. I will present exact, approximation, and MapReduce algorithms that allow us to scale our computations to large-scale networks. At the heart of our scalable algorithm lies a densest subgraph sparsifier theorem. Our approach allows us to extract large near-(bi)cliques from a diverse collection of networks with millions of edges within few seconds. Finally, I will present interesting graph mining findings such as anomaly detection in real-world networks. I will conclude my talk with some interesting open problems.
Dr. Charalampos Tsourakakis is currently a CRCS Postdoctoral Fellow in the School of Engineering and Applied Sciences (SEAS) at Harvard University. He received his Ph.D. from the Algorithms, Combinatorics and Optimization (ACO) program at Carnegie Mellon University (CMU). He also holds a Master of Science from the Machine Learning Department at CMU. He did his undergraduate studies in the School of Electrical and Computer Engineering (ECE) at the National Technical University of Athens (NTUA). He is the recipient of a best paper award in IEEE Data Mining and has designed two graph mining libraries for tera-scale graphs. The former has been officially included in Windows Azure, and the latter was a research highlight of Microsoft Research. His main research interest lies in designing scalable and data-aware algorithms for massive networks that recover hidden structure and solve complex data-driven problems.