Un séminaire LIG sur le graph clustering par le Pr. Parthasarathy, un des leaders de la communité américaine de Data Mining, exceptionnellement de passage au LIG.
Dr. Srinivasan Parthasarathy (PhD, University of Rochester), is currently a Full Professor in the Computer Science and Engineering Department at the Ohio State University (OSU). His research interests are broadly in the areas of Data Mining, Databases, Bioinformatics and High Performance Computing. He is a recipient of an NSF CAREER award in 2003, a DOE Early Career Award in 2004, an Ameritech Faculty fellowship in 2001, multiple IBM Faculty Awards (2007,2010) and a Google Research Award in 2009. His papers have received five best paper awards from among nine nominations in leading conferences in the field, including ones at SIAM international conference on data mining (SDM), IEEE international conference on data mining (ICDM), the Very Large Databases Conference (VLDB) and most recently at ACM Knowledge Discovery and Data Mining (SIGKDD). He is on the steering committee of SIAM Data Mining and the organizational committees of other leading conferences in the fields of data mining, databases, and high performance computing. He currently serves on the editorial boards of several journals including the Data Mining and Knowledge Discovery Journal (DMKDJ), ACM Transactions on Knowledge Discovery and Data Mining and the pVLDB review board.
Many real world problems (biological, social, web) can be effectively modeled as networks or graphs where nodes represent entities of interest and edges mimic the interactions or relationships among them. The study of such complex relationship networks, recently referred to as \it network science, can provide insight into their structure, properties and emergent behavior. Of particular interest here are rigorous methods for uncovering and understanding important network structures and motifs (communities) at multiple topological and temporal scales. Achieving this objective is challenging due to the presence of noise (false or missing interactions), topological (scale-free)) properties and scalability. Given the importance of the graph clustering problem, a number of solutions ranging from hierarchical methods to spectral methods have been designed and developed. To this mix of graph clustering algorithms one can also include a recent entrant — Markov Clustering (MCL) — a graph clustering algorithm based on stochastic flow simulation, and the focus of this work.
MCL has several advantages over competing techniques including an inherent simplicity, empirically confirmed robustness to noise effects and relatively non-parametric nature. However, in spite of its popularity in the bioinformatics community MCL has drawn limited attention from the broader social network, web and data mining communities primarily due to its limited scalability. In this talk we present two orthogonal strategies to address this problem - the first relies on developing a new multi-level regularized variant of MCL called MLR-MCL. The second approach focuses on data reduction via graph sparsification, in a manner analogous to the use of sampling, for improving the scalability of such algorithms. Empirical results demonstrate both qualitative as well as quantitive improvements over existing approaches on a wide range of domains.
Acknowledgements : Based on work presented at KDD 2009 and SIGMOD 2011 Joint work with Venu Satuluri and Yiye Ruan.