4th MultiClust Workshop on Multiple Clusterings, Multi-view Data, and Multi-source Knowledge-driven Clustering

Overview by Organizers

Multiple views and data sources require clustering techniques capable of providing several distinct analyses of the data. The cross-disciplinary research topic on multiple clustering has thus received significant attention in recent years. However, since it is relatively young, important research challenges remain. Specifically, we observe an emerging interest in discovering multiple clustering solutions from very high dimensional and complex databases. Detecting alternatives while avoiding redundancy is a key challenge for multiple clustering solutions. Toward this goal, important research issues include: how to define redundancy among clusterings; whether existing algorithms can be modified to accommodate the finding of multiple solutions; how many solutions should be extracted; how to select among far too many possible solutions; how to evaluate and visualize results; how to most effectively help the data analysts in finding what they are looking for. Recent work tackles this problem by looking for non-redundant, alternative, disparate or orthogonal clusterings. Research in this area benefits from well-established related areas, such as ensemble clustering, constraint-based clustering, frequent pattern mining, theory on result summarization, consensus mining, and general techniques coping with complex and high dimensional databases. At the same time, the topic of multiple clustering solutions has opened novel challenges in these research fields.

Overall, this cross-disciplinary research endeavor has recently received significant attention from multiple communities. In this workshop, we plan to bring together researchers from the above research areas to discuss issues in multiple clustering discovery. We solicit approaches for solving emerging issues in the areas of clustering ensembles, semi-supervised clustering, subspace/projected clustering, co-clustering, and multi-view clustering. Of particular interest will be papers that draw new and insightful connections between these areas, and papers that contribute to the achievement of a unified framework that combines two or more of these problems.

Schedule

Workshop Schedule at a Glance
Sunday, August 11, 2013 (2-5 p.m.)
2:00-2:30	Opening ceremony
2:00-2:30	Invited Talk: A theoretical approach to the clustering selection problem (Shai Ben-David)
2:30-3:30	Session 1
	Stochastic Subspace Search for Top-K Multi-View Clustering (Geng Li, Stephan Günnemann, Mohammed J. Zaki)
	Probabilistic Non-linear Distance Metric Learning For Constrained Clustering (Behnam Babagholami-Mohamadabadi, Ali Zarghami, Hojjat Abdollahi, Mohammad T. Manzuri-Shalmani)
	Variational Bayes Co-clustering with Auxiliary Information (Motoki Shiga, Hiroshi Mamitsuka)
3:30-4:00	Coffee break
4:00-4:25	Invited Talk: Parallel Universes (Michael Berthold)
4:25-4:55	Session 2
	Absolute and Relative Clustering (Toshihiro Kamishima, Shotaro Akaho)
	Spectral Graph Multisection Through Orthogonality (Huanyang Zheng, Jie Wu)
4:55-5:00	Wrap-up

Invited Speaker

Shai Ben-David, University of Waterloo, Canada

A theoretical approach to the clustering selection problem

Abstract: Clustering is a basic data mining task with a wide variety of applications. Not surprisingly, there exist many clustering algorithms. However, clustering is an ill-defined problem: given a data set, it is not clear what a "correct" clustering for that set is. Indeed, different algorithms may yield dramatically different outputs for the same input sets. In contrast with other common learning tasks, like classification prediction, clustering does not have a well-defined ground truth. Faced with a concrete clustering task, a user needs to choose an appropriate clustering algorithm (as well as a concrete setting for the tunable parameters of the chosen algorithm). Currently, such decisions are often made in a very ad hoc, if not completely random, manner. Given the crucial effect of the choice of a clustering algorithm on the resulting clustering, this state of affairs is truly regrettable. Can the research community develop effective tools for helping users make informed decisions when they come to pick a clustering tool for their data? How can we help data analysts in finding the cluster structures they are looking for?
Several paradigms have been proposed to answer that challenge. These include semi-supervised clustering (in which the user specifies partial information about the desired clustering solution in the form of link/don't-link examples) and multiple clusterings, as well as tools for visualization of clusterings. In this work, we propose a high-level approach to this challenge. The basic premise of our work is that prior domain knowledge is an indispensable component of any successful clustering paradigm. In light of this, a major research objective is the development of tools for communicating relevant prior knowledge between the data analyst, who has some understanding or intuition about the task at hand, and the tools for choosing clusterings (or clustering algorithms).
We address this objective by proposing two approaches. First, we address the choice of clustering algorithm. Our paradigm is to distill abstract properties of the input-output behaviors of different clustering paradigms. The goal is to come up with a list of such properties so that these properties can capture some of the domain knowledge that users have about their tasks, while being strong enough to distinguish between different clustering algorithms. We introduce several abstract properties of clustering functions and use them to taxonomize clustering algorithmic paradigms. Second, we consider requirements for defining the quality of given clusterings. Such clustering quality measures can be viewed as another way of expressing prior domain knowledge.

Invited Speaker

Michael Berthold, University of Konstanz, Germany

Parallel Universes

Table of Contents

Full Papers

Stochastic Subspace Search for Top-K Multi-View Clustering
Geng Li (Rensselaer Polytechnic Institute, USA)
Stephan Günnemann (Carnegie Mellon University, USA)
Mohammed J. Zaki (Rensselaer Polytechnic Institute, USA)

Probabilistic Non-linear Distance Metric Learning For Constrained Clustering
Behnam Babagholami-Mohamadabadi (Sharif University of Technology, Iran)
Ali Zarghami (Sharif University of Technology, Iran)
Hojjat Abdollahi (Sharif University of Technology, Iran)
Mohammad T. Manzuri-Shalmani (Sharif University of Technology, Iran)

Variational Bayes Co-clustering with Auxiliary Information
Motoki Shiga (Toyohashi University of Technology, Japan)
Hiroshi Mamitsuka (Kyoto University, Japan)

Absolute and Relative Clustering
Toshihiro Kamishima (National Institute of Advanced Industrial Science and Technology, Japan)
Shotaro Akaho (National Institute of Advanced Industrial Science and Technology, Japan)

Short Paper

Spectral Graph Multisection Through Orthogonality
Huanyang Zheng (Temple University, USA)
Jie Wu (Temple University, USA)

Organizers

Organizing Committee

Ira Assent (Aarhus University, Denmark)
Carlotta Domeniconi (George Mason University, USA)
Francesco Gullo (Yahoo! Research, Spain)
Andrea Tagarelli (University of Calabria, Italy)
Arthur Zimek (Ludwig-Maximilians-Universität München, Germany)

Program Committee

James Bailey (University of Melbourne, Australia)
Ricardo J. G. B. Campello (University of São Paulo, Brazil)
Xuan-Hong Dang (Aarhus University, Denmark)
Ines Färber (RWTH Aachen University, Germany)
Wei Fan (IBM T. J. Watson Research Center and IBM CRL, USA)
Ana Fred (Technical University of Lisbon, Portugal)
Stephan Günnemann (Carnegie Mellon University, USA)
Dimitrios Gunopulos (University of Athens, Greece)
Michael E. Houle (National Institute of Informatics, Japan)
Emmanuel Müller (Karlsruhe Institute of Technology, Germany)
Erich Schubert (LMU Munich, Germany)
Thomas Seidl (RWTH Aachen University, Germany)
Grigorios Tsoumakas (Aristotle University of Thessaloniki, Greece)
Giorgio Valentini (University of Milan, Italy)
Jilles Vreeken (University of Antwerp, Belgium)

Sponsors