BigMine-13:
2nd International Workshop on Big Data, Streams and Heterogeneous Source
Mining: Algorithms, Systems, Programming Models and Applications
Overview
Recent years have witnessed a dramatic increase
in our ability to collect data from various sensors and devices, in different
formats, from independent or connected applications. This data flood has
outpaced our capability to process, analyze, store and understand these
datasets. Consider Internet data. The web pages indexed by Google numbered
around one million in 1998, but quickly reached 1 billion in 2000 and had
already exceeded 1 trillion by 2008. This rapid expansion is accelerated by
the dramatic increase in acceptance of social networking applications, such
as Facebook, Twitter and Weibo, which allow
users to create content freely and amplify the already huge Web volume.
Furthermore, with mobile phones becoming the sensory gateway to
real-time data on many aspects of people’s lives, the vast amount of data
that mobile carriers can potentially process to improve our daily life has
significantly outpaced the past CDR (call data record)-based processing done for
billing purposes only. Internet of Things (IoT) applications can be expected to raise the scale of data to an
unprecedented level. People and devices (from home coffee machines to cars,
to buses, railway stations and airports) are all loosely connected.
Trillions of such connected components will generate a huge data ocean, and
valuable information must be discovered from the data to help improve
quality of life and make our world a better place. For example, to
optimize our commute to work after we get up every morning, and to
complete the optimization before we arrive at the office, the system needs to
process information ranging from traffic, weather, construction and police
activities to our calendar schedules, and perform deep optimization under
tight time constraints. In all these applications, we are facing
significant challenges in leveraging the vast amount of data, including
challenges in (1) system capabilities, (2) algorithmic design and (3) business
models.
This workshop aims to bring together people
from both academia and industry to present their most recent work related
to these big-data issues, and to exchange ideas in order to
address this big-data challenge, which has been considered one of the
most exciting opportunities of the past 10 years.
Wei Fan, Albert Bifet, Qiang Yang and Philip Yu
BigMine 2013 Program co-Chairs
http://bigdata-mining.org/
Invited Speakers
Christos Faloutsos, Professor at Carnegie Mellon University.
Title: Large Graph Mining – Patterns, tools and cascade analysis
Abstract: What do graphs look like? How do they
evolve over time? How do influence, news and viruses propagate over time? We
present a long list of static and temporal laws, and some recent
observations on real graphs. For tools, we present an overview of the
PEGASUS system, which is designed for handling
billion-node graphs and runs on top of the Hadoop
system. Finally, for cascades and propagation, we show how to measure the
connectivity of a graph, and how to achieve near-optimal immunization, to
slow down virus propagation.
Jiawei Han, Abel Bliss Professor of Computer Science, University of Illinois at Urbana-Champaign.
Title: Challenging Problems for Scalable Mining of Heterogeneous Social and Information Networks
Abstract: In today’s real world,
social and informational entities are interconnected, forming gigantic,
integrated social and information networks. By structuring
these data objects into multiple types, such networks become semi-structured
heterogeneous social and information networks. Most real world applications
that handle big data, including interconnected social media and social
networks, medical information systems, online e-commerce systems, or
database systems, can be structured into typed, heterogeneous social and
information networks. For example, in a medical care network, objects of
multiple types, such as patients, doctors, diseases and medications, and links
such as visits, diagnoses and treatments are intertwined, providing
rich information and forming heterogeneous information networks. Effective
analysis of large-scale heterogeneous social and information networks poses
an interesting but critical challenge.
In this talk, we present a set of data mining scenarios in heterogeneous
social and information networks and show that mining typed, heterogeneous networks is a new and promising
frontier in data mining research. However, such mining may raise
serious scalability challenges. We identify a set
of problems in scalable computation and call for serious study of such
problems. These include how to compute efficiently (1) meta
path-based similarity search, (2) rank-based clustering, (3) rank-based
classification, (4) meta path-based link/relationship prediction, and (5)
topical hierarchies from heterogeneous information networks. We introduce
some recent efforts, discuss the trade-offs between query-independent
pre-computation vs. query-dependent online computation, and point out some
promising research directions.
Hong Cheng, Assistant Professor at the Chinese University of Hong Kong.
Title: Processing Reachability Queries with Realistic Constraints on Massive Networks
Abstract: Massive graphs are ubiquitous in various
application domains, such as social networks, road networks, communication
networks, biological networks, RDF graphs, and so on. Such graphs are
massive (for example, with hundreds of millions of nodes and edges or even
more) and contain rich information (for example, node/edge weights, labels
and textual contents). In such massive graphs, an important class of
problems is to process various graph structure related queries. Graph
reachability, as an example, asks whether a node can reach another in a
graph. However, the large graph scale presents new challenges for efficient
query processing.
In this talk, I will introduce two new yet important types of graph
reachability queries: weight-constraint reachability, which imposes an edge
weight constraint on the answer path, and k-hop reachability, which imposes a
length constraint on the answer path. With such realistic constraints, we
can find more meaningful and practically feasible answers. These two reachability queries have wide applications in many
real-world problems, such as QoS routing and trip planning.
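The two query types can be illustrated on a small in-memory graph. The following is only a minimal sketch using plain breadth-first search; it is not the indexing approach presented in the talk and would not scale to massive graphs. The function names, the adjacency-list representation, and the reading of the weight constraint as a predicate that every edge on the answer path must satisfy are assumptions made for this example.

```python
from collections import deque


def k_hop_reachable(adj, source, target, k):
    """Return True if target can be reached from source using at most k edges."""
    if source == target:
        return True
    seen = {source}
    frontier = deque([(source, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == k:
            continue  # hop budget used up, do not expand this node
        for nxt, _weight in adj.get(node, []):
            if nxt == target:
                return True
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return False


def weight_constrained_reachable(adj, source, target, allowed):
    """Return True if some path exists whose every edge weight satisfies allowed(w)."""
    if source == target:
        return True
    seen = {source}
    frontier = deque([source])
    while frontier:
        node = frontier.popleft()
        for nxt, weight in adj.get(node, []):
            if not allowed(weight) or nxt in seen:
                continue  # skip edges that violate the constraint
            if nxt == target:
                return True
            seen.add(nxt)
            frontier.append(nxt)
    return False


# Hypothetical toy directed graph: node -> list of (neighbor, edge weight).
graph = {
    "a": [("b", 5), ("c", 1)],
    "b": [("d", 5)],
    "c": [("d", 9)],
    "d": [("e", 2)],
}

print(k_hop_reachable(graph, "a", "e", 2))
print(k_hop_reachable(graph, "a", "e", 3))
print(weight_constrained_reachable(graph, "a", "e", lambda w: w <= 5))
print(weight_constrained_reachable(graph, "a", "e", lambda w: w <= 4))
```

In this toy graph the shortest route from "a" to "e" has three edges, so the hop budget decides the first two answers, while the weight predicate decides which edges may be used in the last two.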
Xavier Amatriain, Director of Personalization Science and Engineering, Netflix.
Title: Big & Personal: the data and the models behind Netflix
Abstract: Since the Netflix $1 million Prize, announced in 2006,
our company has been known for having personalization at the core of our
product. Even at that point in time, the dataset that we released was
considered “large”, and we stirred innovation in the (Big) Data Mining
research field. Our current product offering is now focused around instant
video streaming, and our data is now many orders of magnitude larger. Not only
do we have many more users in many more countries, but we also receive many
more streams of data. Besides the ratings, we now also use information such
as what our members play, browse, or search.
In this talk I
will discuss the different approaches we follow to deal with these large
streams of data in order to extract information for personalizing our
service. I will describe some of the machine learning models used, as well
as the architectures that allow us to combine complex offline batch
processes with real-time data streams.
Table of Contents
Full Papers
- Invited Talk: Big & Personal: data and models behind Netflix recommendations
Xavier Amatriain, Netflix
- Soft-CsGDT: Soft Cost-sensitive Gaussian Decision Tree for Cost-sensitive Classification of Data Streams
Ning Guo, Yanhua Yu, Meina Song, Junde Song and Yu Fu.
- Searching time series with Hadoop in an electric power company
Alice Bérard and Georges Hebrail.
- Long-memory time series ensembles for concept shift detection
Marcelo Mendoza, Felipe Bravo-Márquez, Bárbara Poblete and Daniel Gayo-Avello.
- Estimating Building Simulation Parameters via Bayesian Structure Learning
Richard Edwards, Joshua New and Lynne Parker.
- Solving Combinatorial Optimization Problems using Relaxed Linear Programming: A High Performance Computing Perspective
Chen Jin, Qiang Fu, Huahua Wang, Ankit Agrawal, William Hendrix, Wei-Keng Liao, Mostofa Patwary, Arindam Banerjee and Alok Choudhary.
- CAPRI: A Tool for Mining Complex Line Patterns in Large Log Data
Farhana Zulkernine, Patrick Martin, Wendy Powley, Sima Soltani, Serge Mankovksi and Mark Addleman.
- Direct Out-of-Memory Distributed Parallel Frequent Pattern Mining
Zheyi Rong and Jeroen De Knijf.
- TV Predictor: Personalized Program Recommendations to be displayed on SmartTVs
Christopher Krauss, Lars George and Stefan Arbanowski.
- Data-driven Study of Urban Infrastructure to Enable City-wide Ubiquitous Computing
Gautam S. Thakur, Pan Hui and Ahmed Helmy.
- Pushing Constraints into Data Streams
Andreia Silva and Claudia Antunes.
- Forecasting Building Occupancy Using Sensor Network Data
James Howard and William Hoff.
- Maintaining connected components for infinite graph streams
Jonathan Berry, Matthew Oster, Cynthia Phillips, Steven Plimpton and Timothy Shead.
- An Architecture for Detecting Events in Real-Time using Massive Heterogeneous Data Sources
George Valkanas, Dimitrios Gunopulos, Ioannis Boutsis and Vana Kalogeraki.
Organizers
Workshop Chairs
Wei Fan
Huawei Noah’s Ark Lab
E-mail: wei.fan at gmail.com
Albert Bifet
Yahoo! Research Barcelona
E-mail: abifet at cs.waikato.ac.nz
Qiang Yang
Huawei Noah’s Ark Lab
E-mail: qiang.yang at huawei.com
Philip Yu
University of Illinois at Chicago
E-mail: psyu at cs.uic.edu
Organizers
- Albert Bifet, Yahoo! Research Barcelona
- Wei Fan, Huawei Noah’s Ark Lab
- Jing Gao, University at Buffalo
- Le Gruenwald, National Science Foundation, University of Oklahoma
- Dimitrios Gunopulos, University of Athens
- Geoff Holmes, University of Waikato
- Latifur Khan, University of Texas at Dallas
- Dekang Lin, Google
- Deepak Turaga, IBM T.J. Watson Research
- Qiang Yang, Huawei Noah’s Ark Lab
- Philip Yu, University of Illinois at Chicago
- Kun Zhang, Xavier University of Louisiana
- Xiatian Zhang, Tencent
- Yuanchun Zhou, Chinese Academy of Sciences
Poster and Reception Chairs
- Xiatian Zhang, Tencent
- Xian Wu, Microsoft
Publicity Chairs
- Albert Bifet, Yahoo! Research Barcelona
- Erheng Zhong, Hong Kong University of Science and Technology
Treasury
- Xiaoxiao Shi, University of Illinois at Chicago
- Jing Gao, SUNY Buffalo
Program Committee
- Vassilis Athitsos, University of Texas at Arlington
- Roberto Bayardo, Google
- Francesco Bonchi, Yahoo! Research Barcelona
- Liangliang Cao, IBM
- Hong Cheng, The Chinese University of Hong Kong
- Alfredo Cuzzocrea, ICAR-CNR & University of Calabria
- Ian Davidson, SUNY
- Nan Du, Georgia Institute of Technology
- Joao Gama, University of Porto
- Fosca Giannotti, ISTI-CNR
- Aristides Gionis, Yahoo! Research Barcelona
- Bart Goethals, University of Antwerp
- Jiawei Han, University of Illinois at Urbana-Champaign
- Marwan Hassani, Aachen University
- Steven C.H. Hoi, Nanyang Technological University
- Siddhartha Jonnalagadda, Mayo Clinic
- Murat Kantarcioglu, University of Texas at Dallas
- George Karypis, University of Minnesota
- Steve Ko, SUNY at Buffalo
- Vipin Kumar, University of Minnesota, Twin Cities
- Jianhui Li, Computer Network Information Center, Chinese Academy of Sciences
- Cindy Xide Lin, University of Illinois at Urbana-Champaign
- Shou-De Lin, National Taiwan University
- Michael May, Fraunhofer IAIS
- Themis Palpanas, University of Trento
- Bernhard Pfahringer, University of Waikato
- Jesse Read, Universidad Carlos III
- Chandan K. Reddy, Wayne State University
- Cyrus Shahabi, USC
- Ashok Srivastava, NASA
- Jian-Tao Sun, Microsoft Research Asia
- Jie Tang, Tsinghua University
- Hanghang Tong, Carnegie Mellon University
- Haifeng Wang, Baidu
- Bo Wang, Nanjing University of Aeronautics & Astronautics
- Yi Wang, Tencent
- Xian Wu, Microsoft
- Tian Wu, Baidu
- Zhenghua Xue, Chinese Academy of Sciences
- Gui-Rong Xue, Shanghai Jiao Tong University
- Xifeng Yan, University of California at Santa Barbara
- Rong Yan, Facebook
- Aden Yuen, Tencent
- Demetris Zeinalipour, University of Cyprus
- Xingquan Zhu, University of Technology, Sydney