BigMine-13: 2nd International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications

Overview

Recent years have witnessed a dramatic increase in our ability to collect data from various sensors and devices, in different formats, from independent or connected applications. This data flood has outpaced our capability to process, analyze, store and understand these datasets. Consider Internet data. The web pages indexed by Google numbered around one million in 1998, quickly reached 1 billion in 2000, and exceeded 1 trillion in 2008. This rapid expansion is accelerated by the dramatic increase in acceptance of social networking applications, such as Facebook, Twitter and Weibo, that allow users to create content freely and amplify the already huge Web volume. Furthermore, with mobile phones becoming the sensory gateway for real-time data on many aspects of people’s lives, the amount of data that mobile carriers can potentially process to improve our daily life has significantly outpaced the past practice of processing CDRs (call data records) for billing purposes only. It can be foreseen that Internet of Things (IoT) applications will raise the scale of data to an unprecedented level. People and devices (from home coffee machines to cars, buses, railway stations and airports) are all loosely connected. Trillions of such connected components will generate a huge data ocean, and valuable information must be discovered from the data to help improve quality of life and make our world a better place. For example, to optimize our commute after we get up each morning, and to finish the optimization before we arrive at the office, a system needs to process information ranging from traffic, weather, construction and police activities to our calendar schedules, and perform deep optimization under tight time constraints. In all these applications, we face significant challenges in leveraging the vast amount of data, including challenges in (1) system capabilities, (2) algorithmic design, and (3) business models.

This workshop aims to bring together people from both academia and industry to present their most recent work related to these big-data issues, and to exchange ideas and thoughts in order to advance work on this big-data challenge, which has been considered one of the most exciting opportunities of the past 10 years.

Wei Fan, Albert Bifet, Qiang Yang and Philip Yu

BigMine 2013 Program co-Chairs

http://bigdata-mining.org/


Invited Speakers


 

Christos Faloutsos, Professor at Carnegie Mellon University.

Title: Large Graph Mining – Patterns, tools and cascade analysis

Abstract: What do graphs look like? How do they evolve over time? How do influence, news and viruses propagate over time? We present a long list of static and temporal laws, and some recent observations on real graphs. For tools, we present an overview of the PEGASUS system, which is designed for handling billion-node graphs and runs on top of Hadoop. Finally, for cascades and propagation, we show how to measure the connectivity of a graph, and how to achieve near-optimal immunization to slow down virus propagation.

 

Jiawei Han, Abel Bliss Professor of Computer Science, University of Illinois at Urbana-Champaign.

Title: Challenging Problems for Scalable Mining of Heterogeneous Social and Information Networks

Abstract: In today’s interconnected world, social and informational entities are linked together, forming gigantic, integrated social and information networks. By structuring these data objects into multiple types, such networks become semi-structured heterogeneous social and information networks. Most real-world applications that handle big data, including interconnected social media and social networks, medical information systems, online e-commerce systems, or database systems, can be structured into typed, heterogeneous social and information networks. For example, in a medical care network, objects of multiple types, such as patients, doctors, diseases and medications, and links, such as visits, diagnoses and treatments, are intertwined, providing rich information and forming heterogeneous information networks. Effective analysis of large-scale heterogeneous social and information networks poses an interesting but critical challenge.

 

In this talk, we present a set of data mining scenarios in heterogeneous social and information networks and show that mining typed, heterogeneous networks is a new and promising research frontier in data mining. However, such mining may raise serious challenges for scalable computation. We identify a set of problems in scalable computation and call for serious studies of them. These include how to efficiently compute (1) meta path-based similarity search, (2) rank-based clustering, (3) rank-based classification, (4) meta path-based link/relationship prediction, and (5) topical hierarchies from heterogeneous information networks. We introduce some recent efforts, discuss the trade-offs between query-independent pre-computation and query-dependent online computation, and point out some promising research directions.

 

Hong Cheng, Assistant Professor at the Chinese University of Hong Kong.

Title: Processing Reachability Queries with Realistic Constraints on Massive Networks

Abstract: Massive graphs are ubiquitous in various application domains, such as social networks, road networks, communication networks, biological networks, RDF graphs, and so on. Such graphs are massive (for example, with hundreds of millions of nodes and edges or even more) and contain rich information (for example, node/edge weights, labels and textual contents). In such massive graphs, an important class of problems is to process various graph structure related queries. Graph reachability, as an example, asks whether a node can reach another in a graph. However, the large graph scale presents new challenges for efficient query processing.

 

In this talk, I will introduce two new yet important types of graph reachability queries: weight constraint reachability, which imposes an edge weight constraint on the answer path, and k-hop reachability, which imposes a length constraint on the answer path. With such realistic constraints, we can find more meaningful and practically feasible answers. These two reachability queries have wide applications in many real-world problems, such as QoS routing and trip planning.
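To make the two query types concrete, here is a minimal Python sketch that answers each one with a plain breadth-first search. It is only a naive baseline for illustration and is not the method presented in the talk, which targets massive graphs where an unindexed search per query would be too slow. The function names, the adjacency-list layout and the toy ROADS graph are all assumptions made for this example.

```python
from collections import deque


def k_hop_reachable(adj, source, target, k):
    """Return True if target can be reached from source using at most k edges."""
    if source == target:
        return True
    visited = {source}
    frontier = deque([(source, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == k:
            # The hop budget is used up on this branch; do not expand further.
            continue
        for nxt, _weight in adj.get(node, []):
            if nxt == target:
                return True
            if nxt not in visited:
                visited.add(nxt)
                frontier.append((nxt, dist + 1))
    return False


def weight_constrained_reachable(adj, source, target, allowed):
    """Return True if some path exists on which every edge weight passes allowed()."""
    if source == target:
        return True
    visited = {source}
    frontier = deque([source])
    while frontier:
        node = frontier.popleft()
        for nxt, weight in adj.get(node, []):
            # Skip edges that violate the constraint and nodes already seen.
            if not allowed(weight) or nxt in visited:
                continue
            if nxt == target:
                return True
            visited.add(nxt)
            frontier.append(nxt)
    return False


# Hypothetical toy graph: node -> list of (neighbor, edge weight).
ROADS = {
    "a": [("b", 5), ("c", 1)],
    "b": [("d", 5)],
    "c": [("d", 9)],
    "d": [("e", 2)],
}

within_two_hops = k_hop_reachable(ROADS, "a", "e", 2)
within_three_hops = k_hop_reachable(ROADS, "a", "e", 3)
light_edges_only = weight_constrained_reachable(ROADS, "a", "e", lambda w: w <= 4)
```

In the toy graph, node "e" is three edges away from "a", so the 2-hop query fails and the 3-hop query succeeds. Every route from "a" to "e" crosses an edge of weight 5 or more, so the query restricted to weights of at most 4 also fails.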

 

Xavier Amatriain, Director of Personalization Science and Engineering, Netflix.


Title: Big & Personal: the data and the models behind Netflix  

Abstract: Since the Netflix $1 million Prize, announced in 2006, our company has been known for having personalization at the core of our product. Even at that point in time, the dataset that we released was considered “large”, and we stirred innovation in the (Big) Data Mining research field. Our current product offering is now focused around instant video streaming, and our data is now many orders of magnitude larger. Not only do we have many more users in many more countries, but we also receive many more streams of data. Besides the ratings, we now also use information such as what our members play, browse, or search.

In this talk I will discuss the different approaches we follow to deal with these large streams of data in order to extract information for personalizing our service. I will describe some of the machine learning models used, as well as the architectures that allow us to combine complex offline batch processes with real-time data streams.


Table of Contents


Full Papers


Organizers


Workshop Chairs

 

Wei Fan
Huawei Noah’s Ark Lab
E-mail: wei.fan at gmail.com

 

Albert Bifet
Yahoo! Research Barcelona
E-mail: abifet at cs.waikato.ac.nz

 

Qiang Yang
Huawei Noah’s Ark Lab
E-mail: qiang.yang at huawei.com

 

Philip Yu
University of Illinois at Chicago
E-mail: psyu at cs.uic.edu

 

Organizers

  • Albert Bifet, Yahoo! Research Barcelona
  • Wei Fan, Huawei Noah’s Ark Lab
  • Jing Gao, University at Buffalo
  • Le Gruenwald, National Science Foundation, University of Oklahoma
  • Dimitrios Gunopulos, University of Athens
  • Geoff Holmes, University of Waikato
  • Latifur Khan, University of Texas at Dallas
  • Dekang Lin, Google
  • Deepak Turaga, IBM T.J. Watson Research
  • Qiang Yang, Huawei Noah’s Ark Lab
  • Philip Yu, University of Illinois at Chicago
  • Kun Zhang, Xavier University of Louisiana
  • Xiatian Zhang, Tencent
  • Yuanchun Zhou, Chinese Academy of Sciences

 

Poster and Reception Chairs

  • Xiatian Zhang, Tencent
  • Xian Wu, Microsoft

 

Publicity Chairs

  • Albert Bifet, Yahoo! Research Barcelona
  • Erheng Zhong, Hong Kong University of Science and Technology

 

Treasury

  • Xiaoxiao Shi, University of Illinois at Chicago
  • Jing Gao, SUNY Buffalo

 

Program Committee

  • Vassilis Athitsos, University of Texas at Arlington
  • Roberto Bayardo, Google
  • Francesco Bonchi, Yahoo! Research Barcelona
  • Liangliang Cao, IBM
  • Hong Cheng, The Chinese University of Hong Kong
  • Alfredo Cuzzocrea, ICAR-CNR & University of Calabria
  • Ian Davidson, SUNY
  • Nan Du, Georgia Institute of Technology
  • Joao Gama, University of Porto
  • Fosca Giannotti, ISTI-CNR
  • Aristides Gionis, Yahoo! Research Barcelona
  • Bart Goethals, University of Antwerp
  • Jiawei Han, University of Illinois at Urbana-Champaign
  • Marwan Hassani, Aachen University
  • Steven C.H. Hoi, Nanyang Technological University
  • Siddhartha Jonnalagadda, Mayo Clinic
  • Murat Kantarcioglu, University of Texas at Dallas
  • George Karypis, University of Minnesota
  • Steve Ko, SUNY at Buffalo
  • Vipin Kumar, University of Minnesota, Twin Cities
  • Jianhui Li, Computer Network Information Center, Chinese Academy of Sciences
  • Cindy Xide Lin, University of Illinois at Urbana-Champaign
  • Shou-De Lin, National Taiwan University
  • Michael May, Fraunhofer IAIS
  • Themis Palpanas, University of Trento
  • Bernhard Pfahringer, University of Waikato
  • Jesse Read, Universidad Carlos III
  • Chandan K. Reddy, Wayne State University
  • Cyrus Shahabi, USC
  • Ashok Srivastava, NASA
  • Jian-Tao Sun, Microsoft Research Asia
  • Jie Tang, Tsinghua University
  • Hanghang Tong, Carnegie Mellon University
  • Haifeng Wang, Baidu
  • Bo Wang, Nanjing University of Aeronautics & Astronautics
  • Yi Wang, Tencent
  • Xian Wu, Microsoft
  • Tian Wu, Baidu
  • Zhenghua Xue, Chinese Academy of Sciences
  • Gui-Rong Xue, Shanghai Jiao Tong University
  • Xifeng Yan, University of California at Santa Barbara
  • Rong Yan, Facebook
  • Aden Yuen, Tencent
  • Demetris Zeinalipour, University of Cyprus
  • Xingquan Zhu, University of Technology, Sydney

 

 

 

 


Sponsors


 

Netflix