The Interactive Data Exploration and Analytics (IDEA) workshop addresses the development of data mining techniques that allow users to interactively explore their data. We focus on interactivity and on the effective integration of techniques from data mining, visualization, and human-computer interaction (HCI). In other words, we explore how the best of these different but related domains can be combined such that the sum is greater than the parts.


Program & Attending IDEA

IDEA will be a full-day workshop on Sunday, Aug 11, at ACM SIGKDD 2013 in Chicago.

9:00 Welcome
9:10 Keynote 1
Prof. Haesun Park
Georgia Tech
School of Computational Science & Engineering
Interactive Visual Analytics for High Dimensional Data

Prof. Haesun Park received her B.S. degree in Mathematics from Seoul National University, Seoul, Korea, in 1981, summa cum laude and with the University President's Medal for the top graduate, and her M.S. and Ph.D. degrees in Computer Science from Cornell University, Ithaca, NY, in 1985 and 1987, respectively. She has been a professor in the School of Computational Science and Engineering at the Georgia Institute of Technology, Atlanta, Georgia, since 2005. Before joining Georgia Tech, she was on the faculty at the University of Minnesota, Twin Cities, and a program director at the National Science Foundation, Arlington, VA. She has published extensively in areas including numerical algorithms, data analysis, visual analytics, text mining, and parallel computing. She has been the director of the NSF/DHS FODAVA-Lead (Foundations of Data and Visual Analytics) center and executive director of the Center for Data Analytics at Georgia Tech. She has served on numerous editorial boards, including IEEE Transactions on Pattern Analysis and Machine Intelligence, SIAM Journal on Matrix Analysis and Applications, and SIAM Journal on Scientific Computing, and has served as a conference co-chair for the SIAM International Conference on Data Mining in 2008 and 2009. In 2013, she was elected a SIAM Fellow.
Many modern data sets can be represented in high-dimensional vector spaces and have benefited from computational methods that use advanced techniques from numerical linear algebra and optimization. Visual analytics approaches have contributed greatly to data understanding and analysis by combining automated algorithms with humans' quick visual perception and interaction. However, visual analytics for high-dimensional, large-scale data has been challenging because the low-dimensional screen space offers only a limited number of pixels to represent the data. Among the computational techniques supporting visual analytics, dimension reduction and clustering have played essential roles by reducing the dimension and volume of data to visually manageable scales.

In this talk, we present some of the key foundational methods: supervised dimension reduction such as linear discriminant analysis (LDA); dimension reduction and clustering/topic discovery by nonnegative matrix factorization (NMF); and visual spatial alignment for effective fusion and comparison by Orthogonal Procrustes. We demonstrate how these methods can effectively support interactive visual analytic tasks that involve large-scale document and image data sets.
10:00 Coffee
10:30 Talks (time allocation in minutes: 20, 20, 20, 15, 15)
Zips: Mining Compressing Sequential Patterns in Streams
Hoang Thanh Lam, Toon Calders, Jie Yang, Fabian Moerchen and Dmitriy Fradkin
Methods for Exploring and Mining Tables on Wikipedia
Chandra Sekhar Bhagavatula, Thanapon Noraset and Doug Downey
One Click Mining—Interactive Local Pattern Discovery through Implicit Preference and Performance Learning
Mario Boley, Bo Kang, Pavel Tokmakov, Michael Mampaey and Stefan Wrobel
Online Spatial Data Analysis and Visualization System
Yun Lu, Mingjin Zhang, Tao Li, Yudong Guang and Naphtali Rishe
Building Blocks for Exploratory Data Analysis Tools
Sara Alspaugh, Archana Ganapathi, Marti Hearst and Randy Katz
12:00 Lunch
2:00 Re-welcome
2:10 Keynote 2
Prof. Marti Hearst
UC Berkeley
School of Information

3:00 Talks (time allocation in minutes: 15, 15)
A Process-Centric Data Mining and Visual Analytic Tool for Exploring Complex Social Networks
Denis Dimitrov, Lisa Singh and Janet Mann
Lytic: Synthesizing High-Dimensional Algorithmic Analysis with Domain-Agnostic, Faceted Visual Analytics
Edward Clarkson, Jaegul Choo, John Turgeson, Ray Decuir and Haesun Park
3:30 Coffee
4:00 Talks (time allocation in minutes: 15, 15, 15, 15)
Towards Anytime Active Learning: Interrupting Experts to Reduce Annotation Costs
Maria Ramirez, Aron Culotta and Mustafa Bilgic
Randomly Sampling Maximal Itemsets
Sandy Moens and Bart Goethals
Augmenting MATLAB with Semantic Objects for an Interactive Visual Environment
Changhyun Lee, Jaegul Choo, Haesun Park and Duen Horng Chau
Storygraphs: Extracting patterns from spatio-temporal data
Ayush Shrestha, Ying Zhu, Ben Miller and Yi Zhao
5:00 Closing

Keynotes

Prof. Marti Hearst
UC Berkeley
School of Information
Prof. Haesun Park
Georgia Tech
School of Computational Science & Engineering
Interactive Visual Analytics for High Dimensional Data

Organizers

Polo Chau
Georgia Tech
Jilles Vreeken
Universiteit Antwerpen
Matthijs van Leeuwen
KU Leuven
Christos Faloutsos
Carnegie Mellon
Contact us at:
idea13kdd (at) gmail.com

Program Committee

Adam Perer (IBM, USA)
Albert Bifet (Yahoo! Labs, Barcelona, Spain)
Aris Gionis (Aalto University, Finland)
Arno Knobbe (U Leiden)
Chris Johnson (University of Utah, USA)
Cody Dunne (UMD, USA)
David Gotz (IBM, USA)
Geoff Webb (Monash University, Australia)
George Forman (HP Labs)
Hanghang Tong (City University of New York)
Jaakko Hollmen (Aalto University, Finland)
Jacob Eisenstein (Georgia Tech)
Jaegul Choo (Georgia Tech)
Jiawei Han (University of Illinois at Urbana-Champaign)
Jimeng Sun (IBM, USA)
John Stasko (Georgia Tech)
Kai Puolamäki (Finnish Institute of Occupational Health, Finland)
Katharina Morik (TU Dortmund)
Kayur Patel (Google)
Leman Akoglu (Stony Brook University)
Mario Boley (Fraunhofer IAIS, University of Bonn)
Marti Hearst (UC Berkeley, USA)
Martin Theobald (University of Antwerp, Belgium)
Nan Cao (IBM, USA)
Naren Ramakrishnan (Virginia Tech, USA)
Nikolaj Tatti (Aalto University, Finland)
Parikshit Ram (Georgia Tech, USA)
Pauli Miettinen (Max Planck Institute for Informatics, Germany)
Saleema Amershi (Microsoft Research)
Tijl De Bie (University of Bristol, UK)
Tim (Jia-Yu) Pan (Google)
Tina Eliassi-Rad (Rutgers)
Tino Weinkauf (Max Planck Institute for Informatics, Germany)
Toon Calders (Université Libre de Bruxelles, Belgium)
Zhicheng 'Leo' Liu (Stanford)

What's the IDEA?

We have entered the era of big data. Massive datasets, surpassing terabytes and petabytes, are now commonplace. They arise in numerous settings in science, government, and enterprises. Today, technology exists by which we can collect and store such massive amounts of information. Yet, making sense of these data remains a fundamental challenge. We lack the means to analyze databases of this scale in an exploratory way. Currently, few technologies allow us to freely "wander" around the data and make discoveries by following our intuition or by serendipity. While standard data mining aims at finding highly interesting results, it is typically computationally demanding and time consuming, and thus may not be well suited for interactive exploration of large datasets.

Interactive data mining techniques that aptly integrate human intuition, by means of visualization and intuitive human-computer interaction (HCI) techniques, with machine computation have been shown to help people gain significant insights into a wide range of problems. However, as datasets are generated in larger volumes, at higher velocity, and in greater variety, creating effective interactive data mining techniques becomes a much harder task.

Our focus is on interactivity and on the effective integration of techniques from data mining, visualization, and human-computer interaction. In other words, we intend to explore how the best of these different but related domains can be combined such that the sum is greater than the parts.