BIGMINE 2013 Workshop Proceedings


BigMine-13: 2nd International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications

Overview

Recent years have witnessed a dramatic increase in our ability to collect data from various sensors and devices, in different formats, from independent or connected applications. This data flood has outpaced our capability to process, analyze, store and understand these datasets. Consider Internet data: the web pages indexed by Google numbered around one million in 1998, quickly reached 1 billion in 2000, and had already exceeded 1 trillion by 2008. This rapid expansion is accelerated by the dramatic increase in acceptance of social networking applications, such as Facebook, Twitter and Weibo, that allow users to create content freely and amplify the already huge Web volume. Furthermore, with mobile phones becoming the sensory gateway for real-time data on many aspects of people’s lives, the vast amount of data that mobile carriers can potentially process to improve our daily life has significantly outpaced our past CDR (call data record)-based processing for billing purposes only. It can be foreseen that Internet of Things (IoT) applications will raise the scale of data to an unprecedented level. People and devices (from home coffee machines to cars, buses, railway stations and airports) are all loosely connected. Trillions of such connected components will generate a huge data ocean, and valuable information must be discovered from the data to help improve quality of life and make our world a better place. For example, to optimize our commute to work after we get up each morning, and to complete the optimization before we arrive at the office, a system needs to process information ranging from traffic, weather, construction and police activity to our calendar schedules, and perform deep optimization under tight time constraints. In all these applications, we face significant challenges in leveraging the vast amount of data, including challenges in (1) system capabilities, (2) algorithmic design, and (3) business models.
This workshop aims to bring together people from both academia and industry to present their most recent work on these big-data issues, and to exchange ideas and thoughts in order to make progress on the big-data challenge, which has been considered one of the most exciting opportunities of the past 10 years.

Wei Fan, Albert Bifet, Qiang Yang and Philip Yu
BigMine 2013 Program co-Chairs
http://bigdata-mining.org/

Invited Speakers

Christos Faloutsos, Professor at Carnegie Mellon University.
Title: Large Graph Mining – Patterns, tools and cascade analysis
Abstract: What do graphs look like? How do they evolve over time? How do influence, news and viruses propagate over time? We present a long list of static and temporal laws, and some recent observations on real graphs. For tools, we present an overview of the PEGASUS system, which is designed for handling billion-node graphs and runs on top of the “hadoop” system. Finally, for cascades and propagation, we show how to measure the connectivity of a graph, and how to achieve near-optimal immunization to slow down virus propagation.

Jiawei Han, Abel Bliss Professor of Computer Science, University of Illinois at Urbana-Champaign.
Title: Challenging Problems for Scalable Mining of Heterogeneous Social and Information Networks
Abstract: In today’s real world, social and informational entities are interconnected, forming gigantic, integrated social and information networks. By structuring these data objects into multiple types, such networks become semi-structured heterogeneous social and information networks. Most real-world applications that handle big data, including interconnected social media and social networks, medical information systems, online e-commerce systems, or database systems, can be structured into typed, heterogeneous social and information networks. For example, in a medical care network, objects of multiple types, such as patients, doctors, diseases and medication, and links such as visits, diagnoses and treatments are intertwined, providing rich information and forming heterogeneous information networks. Effective analysis of large-scale heterogeneous social and information networks poses an interesting but critical challenge.
In this talk, we present a set of data mining scenarios in heterogeneous social and information networks and show that mining typed, heterogeneous networks is a new and promising research frontier in data mining research. However, such mining raises serious challenges for scalable computation. We identify a set of problems in scalable computation and call for serious study of them. These include how to efficiently compute (1) meta path-based similarity search, (2) rank-based clustering, (3) rank-based classification, (4) meta path-based link/relationship prediction, and (5) topical hierarchies from heterogeneous information networks. We introduce some recent efforts, discuss the trade-offs between query-independent pre-computation and query-dependent online computation, and point out some promising research directions.

Hong Cheng, Assistant Professor at the Chinese University of Hong Kong.
Title: Processing Reachability Queries with Realistic Constraints on Massive Networks
Abstract: Massive graphs are ubiquitous in various application domains, such as social networks, road networks, communication networks, biological networks, RDF graphs, and so on. Such graphs are massive (for example, with hundreds of millions of nodes and edges or even more) and contain rich information (for example, node/edge weights, labels and textual contents). In such massive graphs, an important class of problems is to process various queries related to graph structure. Graph reachability, as an example, asks whether a node can reach another in a graph. However, the large graph scale presents new challenges for efficient query processing. In this talk, I will introduce two new yet important types of graph reachability queries: weight-constraint reachability, which imposes an edge-weight constraint on the answer path, and k-hop reachability, which imposes a length constraint on the answer path.
With such realistic constraints, we can find more meaningful and practically feasible answers. These two reachability queries have wide applications in many real-world problems, such as QoS routing and trip planning.

Xavier Amatriain, Director of Personalization Science and Engineering, Netflix.
Title: Big & Personal: the data and the models behind Netflix
Abstract: Since the Netflix $1 million Prize, announced in 2006, our company has been known for having personalization at the core of our product. Even at that point in time, the dataset that we released was considered “large”, and we stirred innovation in the (Big) Data Mining research field. Our current product offering is now focused around instant video streaming, and our data is now many orders of magnitude larger. Not only do we have many more users in many more countries, but we also receive many more streams of data. Besides the ratings, we now also use information such as what our members play, browse, or search. In this talk I will discuss the different approaches we follow to deal with these large streams of data in order to extract information for personalizing our service. I will describe some of the machine learning models used, as well as the architectures that allow us to combine complex offline batch processes with real-time data streams.

Table of Contents

Full Papers

Invited Talk: Big & Personal: data and models behind Netflix recommendations
Xavier Amatriain, Netflix

Soft-CsGDT: Soft Cost-sensitive Gaussian Decision Tree for Cost-sensitive Classification of Data Streams
Ning Guo, Yanhua Yu, Meina Song, Junde Song and Yu Fu

Searching time series with Hadoop in an electric power company
Alice Bérard and Georges Hebrail

Long-memory time series ensembles for concept shift detection
Marcelo Mendoza, Felipe Bravo-Márquez, Bárbara Poblete and Daniel Gayo-Avello

Estimating Building Simulation Parameters via Bayesian Structure Learning
Richard Edwards, Joshua New and Lynne Parker

Solving Combinatorial Optimization Problems using Relaxed Linear Programming: A High Performance Computing Perspective
Chen Jin, Qiang Fu, Huahua Wang, Ankit Agrawal, William Hendrix, Wei-Keng Liao, Mostofa Patwary, Arindam Banerjee and Alok Choudhary

CAPRI: A Tool for Mining Complex Line Patterns in Large Log Data
Farhana Zulkernine, Patrick Martin, Wendy Powley, Sima Soltani, Serge Mankovksi and Mark Addleman

Direct Out-of-Memory Distributed Parallel Frequent Pattern Mining
Zheyi Rong and Jeroen De Knijf

TV Predictor: Personalized Program Recommendations to be displayed on SmartTVs
Christopher Krauss, Lars George and Stefan Arbanowski

Data-driven Study of Urban Infrastructure to Enable City-wide Ubiquitous Computing
Gautam S. Thakur, Pan Hui and Ahmed Helmy

Pushing Constraints into Data Streams
Andreia Silva and Claudia Antunes

Forecasting Building Occupancy Using Sensor Network Data
James Howard and William Hoff

Maintaining connected components for infinite graph streams
Jonathan Berry, Matthew Oster, Cynthia Phillips, Steven Plimpton and Timothy Shead

An Architecture for Detecting Events in Real-Time using Massive Heterogeneous Data Sources
George Valkanas, Dimitrios Gunopulos, Ioannis Boutsis and Vana Kalogeraki
Organizers

Workshop Chairs
Wei Fan, Huawei Noah’s Ark Lab. E-mail: wei.fan at gmail.com
Albert Bifet, Yahoo! Research Barcelona. E-mail: abifet at cs.waikato.ac.nz
Qiang Yang, Huawei Noah’s Ark Lab. E-mail: qiang.yang at huawei.com
Philip Yu, University of Illinois at Chicago. E-mail: psyu at cs.uic.edu

Organizers
Albert Bifet, Yahoo! Research Barcelona
Wei Fan, Huawei Noah’s Ark Lab
Jing Gao, University at Buffalo
Le Gruenwald, National Science Foundation, University of Oklahoma
Dimitrios Gunopulos, University of Athens
Geoff Holmes, University of Waikato
Latifur Khan, University of Texas at Dallas
Dekang Lin, Google
Deepak Turaga, IBM T.J. Watson Research
Qiang Yang, Huawei Noah’s Ark Lab
Philip Yu, University of Illinois at Chicago
Kun Zhang, Xavier University of Louisiana
Xiatian Zhang, Tencent
Yuanchun Zhou, Chinese Academy of Sciences

Poster and Reception Chairs
Xia Tian Zhang, Tencent
Xian Wu, Microsoft

Publicity Chairs
Albert Bifet, Yahoo! Research Barcelona
Erheng Zhong, Hong Kong University of Science and Technology

Treasury
Xiaoxiao Shi, University of Illinois at Chicago
Jing Gao, SUNY Buffalo

Program Committee
Vassilis Athitsos, University of Texas at Arlington
Roberto Bayardo, Google
Francesco Bonchi, Yahoo! Research Barcelona
Liangliang Cao, IBM
Hong Cheng, The Chinese University of Hong Kong
Alfredo Cuzzocrea, ICAR-CNR & University of Calabria
Ian Davidson, SUNY
Nan Du, Georgia Institute of Technology
Joao Gama, University of Porto
Fosca Giannotti, ISTI-CNR
Aristides Gionis, Yahoo! Research Barcelona
Bart Goethals, University of Antwerp
Jiawei Han, University of Illinois at Urbana-Champaign
Marwan Hassani, Aachen University
Steven C.H. Hoi, Nanyang Technological University
Siddhartha Jonnalagadda, Mayo Clinic
Murat Kantarcioglu, University of Texas at Dallas
George Karypis, University of Minnesota
Steve Ko, SUNY at Buffalo
Vipin Kumar, University of Minnesota, Twin Cities
Jianhui Li, Computer Network Information Center, Chinese Academy of Sciences
Cindy Xide Lin, University of Illinois at Urbana-Champaign
Shou-De Lin, National Taiwan University
Michael May, Fraunhofer IAIS
Themis Palpanas, University of Trento
Bernhard Pfahringer, University of Waikato
Jesse Read, Universidad Carlos III
Chandan K. Reddy, Wayne State University
Cyrus Shahabi, USC
Ashok Srivastava, NASA
Jian-Tao Sun, Microsoft Research Asia
Jie Tang, Tsinghua University
Hanghang Tong, Carnegie Mellon University
Haifeng Wang, Baidu
Bo Wang, Nanjing University of Aeronautics & Astronautics
Yi Wang, Tencent
Xian Wu, Microsoft
Tian Wu, Baidu
Zhenghua Xue, Chinese Academy of Sciences
Gui-Rong Xue, Shanghai Jiao Tong University
Xifeng Yan, University of California at Santa Barbara
Rong Yan, Facebook
Aden Yuen, Tencent
Demetris Zeinalipour, University of Cyprus
Xingquan Zhu, University of Technology, Sydney

Sponsors